Paper Title

Few-Shot Learning of Compact Models via Task-Specific Meta Distillation

Paper Authors

Yong Wu, Shekhor Chanda, Mehrdad Hosseinzadeh, Zhi Liu, Yang Wang

Paper Abstract

We consider a new problem of few-shot learning of compact models. Meta-learning is a popular approach for few-shot learning. Previous work in meta-learning typically assumes that the model architecture during meta-training is the same as the model architecture used for final deployment. In this paper, we challenge this basic assumption. For final deployment, we often need the model to be small. But small models usually do not have enough capacity to effectively adapt to new tasks. In the meantime, we often have access to large datasets and extensive computing power during meta-training, since meta-training is typically performed on a server. In this paper, we propose task-specific meta distillation that simultaneously learns two models in meta-learning: a large teacher model and a small student model. These two models are jointly learned during meta-training. Given a new task during meta-testing, the teacher model is first adapted to this task, then the adapted teacher model is used to guide the adaptation of the student model. The adapted student model is used for final deployment. We demonstrate the effectiveness of our approach in few-shot image classification using model-agnostic meta-learning (MAML). Our proposed method outperforms other alternatives on several benchmark datasets.
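
The abstract describes a two-stage procedure at meta-test time: the large teacher is first adapted to the new task on its support set, and the adapted teacher then guides the adaptation of the small student, which is what gets deployed. The sketch below is a minimal, illustrative PyTorch rendering of that idea under common assumptions (MAML-style gradient steps on the support set, a softened KL distillation term); the function name, hyperparameters, and exact loss weighting are placeholders and not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def adapt_to_task(teacher, student, support_x, support_y,
                  inner_steps=5, inner_lr=0.01,
                  distill_weight=1.0, temperature=2.0):
    """Illustrative meta-test adaptation: adapt the teacher, then
    distill the adapted teacher into the student. Hyperparameters
    are assumptions, not values from the paper."""
    # Stage 1: adapt the large teacher to the new task with a few
    # gradient steps on the support set (MAML-style fine-tuning).
    teacher_opt = torch.optim.SGD(teacher.parameters(), lr=inner_lr)
    for _ in range(inner_steps):
        teacher_opt.zero_grad()
        F.cross_entropy(teacher(support_x), support_y).backward()
        teacher_opt.step()

    # Stage 2: adapt the small student, guided by the adapted teacher
    # through softened targets (standard knowledge-distillation loss).
    with torch.no_grad():
        soft_targets = F.softmax(teacher(support_x) / temperature, dim=-1)
    student_opt = torch.optim.SGD(student.parameters(), lr=inner_lr)
    for _ in range(inner_steps):
        student_opt.zero_grad()
        logits = student(support_x)
        ce = F.cross_entropy(logits, support_y)
        kd = F.kl_div(F.log_softmax(logits / temperature, dim=-1),
                      soft_targets, reduction="batchmean") * temperature ** 2
        (ce + distill_weight * kd).backward()
        student_opt.step()

    # The adapted student is what gets deployed.
    return student
```

Note that this sketch covers only meta-test adaptation; during meta-training the paper learns the teacher and student jointly in an outer loop so that this few-step adaptation works well, and that outer loop is not shown here.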
