Paper Title

Establishing a stronger baseline for lightweight contrastive models

Paper Authors

Wenye Lin, Yifeng Ding, Zhixiong Cao, Hai-Tao Zheng

Paper Abstract

Recent research has reported a performance degradation in self-supervised contrastive learning for specially designed efficient networks, such as MobileNet and EfficientNet. A common practice to address this problem is to introduce a pretrained contrastive teacher model and train the lightweight networks with distillation signals generated by the teacher. However, pretraining a teacher model is time- and resource-consuming when one is not available. In this work, we aim to establish a stronger baseline for lightweight contrastive models without using a pretrained teacher model. Specifically, we show that the optimal recipe for efficient models differs from that of larger models, and that reusing the training settings of ResNet50, as previous research does, is inappropriate. Additionally, we observe a common issue in contrastive learning where either the positive or negative views can be noisy, and propose a smoothed version of the InfoNCE loss to alleviate this problem. As a result, we improve the linear evaluation results from 36.3\% to 62.3\% for MobileNet-V3-Large and from 42.2\% to 65.8\% for EfficientNet-B0 on ImageNet, closing the accuracy gap to ResNet50 while using $5\times$ fewer parameters. We hope our research will facilitate the usage of lightweight contrastive models.
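
The abstract does not spell out the smoothed InfoNCE formulation, but the idea of softening the one-hot positive target can be sketched with standard label smoothing over the contrastive logits. The PyTorch snippet below is a minimal illustration under that assumption, not the paper's exact loss; the function name `smoothed_infonce`, the temperature of 0.2, and the smoothing factor of 0.1 are illustrative choices.

```python
import torch
import torch.nn.functional as F

def smoothed_infonce(query, key, temperature=0.2, smoothing=0.1):
    """InfoNCE with label smoothing over the contrastive logits (a sketch).

    query, key: L2-normalized embeddings of two augmented views of the same
    batch, shape (batch, dim). Matching rows are positives (the diagonal of
    the similarity matrix); all other entries act as negatives.
    """
    logits = query @ key.t() / temperature            # (batch, batch) similarities
    targets = torch.arange(logits.size(0), device=logits.device)
    # label_smoothing moves `smoothing` probability mass from the positive
    # onto the negatives, softening the penalty when views are noisy.
    return F.cross_entropy(logits, targets, label_smoothing=smoothing)

# Usage with random features standing in for encoder outputs:
q = F.normalize(torch.randn(256, 128), dim=1)
k = F.normalize(torch.randn(256, 128), dim=1)
loss = smoothed_infonce(q, k)
```

The intuition is that a hard one-hot target asserts the positive view is always correct; spreading a small amount of target mass onto the negatives reduces the gradient penalty when either the positive or a negative view is mislabeled by the augmentation process.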
