Title
Towards Inadequately Pre-trained Models in Transfer Learning
Authors
Abstract
Pre-training has been a popular learning paradigm in the deep learning era, especially in annotation-insufficient scenarios. Previous research has demonstrated, from the perspective of architecture, that better ImageNet pre-trained models have better transferability to downstream tasks. However, in this paper, we find that during the same pre-training process, models at intermediate epochs, which are inadequately pre-trained, can outperform fully trained models when used as feature extractors (FE), while the fine-tuning (FT) performance still grows with the source performance. This reveals that there is no solid positive correlation between top-1 accuracy on ImageNet and the transfer results on target data. Based on this contradictory phenomenon between FE and FT, in which a better feature extractor fails to be fine-tuned better accordingly, we conduct comprehensive analyses of the features before the softmax layer to provide insightful explanations. Our findings suggest that, during pre-training, models tend to first learn the spectral components corresponding to large singular values, while the residual components contribute more when fine-tuning.
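To make the spectral analysis mentioned in the abstract concrete, below is a minimal sketch (not the paper's actual code) of how a matrix of pre-softmax features can be split, via singular value decomposition, into the components corresponding to the largest singular values and the residual components. The feature matrix `features`, its dimensions, and the cutoff `k` are all hypothetical placeholders.

```python
import numpy as np

# Hypothetical feature matrix: n samples x d dimensions, standing in for
# penultimate-layer (pre-softmax) features extracted by a pre-trained model.
rng = np.random.default_rng(0)
features = rng.standard_normal((1000, 512))

# Singular value decomposition of the feature matrix.
U, S, Vt = np.linalg.svd(features, full_matrices=False)

# Split into the components with the k largest singular values (the
# "dominant" spectral components, which the paper suggests are learned
# first during pre-training) and the remaining "residual" components.
k = 50  # hypothetical cutoff
dominant = U[:, :k] @ np.diag(S[:k]) @ Vt[:k, :]
residual = features - dominant

# Sanity check: the two parts exactly reconstruct the original features.
assert np.allclose(dominant + residual, features)
```

Under this decomposition, the paper's finding can be read as: the `dominant` part dominates early pre-training and feature-extractor transfer, while the `residual` part carries more of the gains observed during fine-tuning.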