Paper Title
Continuous-Time Meta-Learning with Forward Mode Differentiation
Paper Authors
Paper Abstract
Drawing inspiration from gradient-based meta-learning methods with infinitely small gradient steps, we introduce Continuous-Time Meta-Learning (COMLN), a meta-learning algorithm where adaptation follows the dynamics of a gradient vector field. Specifically, representations of the inputs are meta-learned such that a task-specific linear classifier is obtained as a solution of an ordinary differential equation (ODE). Treating the learning process as an ODE offers the notable advantage that the length of the trajectory is now continuous, as opposed to a fixed and discrete number of gradient steps. As a consequence, we can optimize the amount of adaptation necessary to solve a new task using stochastic gradient descent, in addition to learning the initial conditions as is standard practice in gradient-based meta-learning. Importantly, in order to compute the exact meta-gradients required for the outer-loop updates, we devise an efficient algorithm based on forward mode differentiation, whose memory requirements do not scale with the length of the learning trajectory, thus allowing longer adaptation in constant memory. We provide analytical guarantees for the stability of COMLN, we show empirically its efficiency in terms of runtime and memory usage, and we illustrate its effectiveness on a range of few-shot image classification problems.
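To make the two core ideas concrete, here is a minimal JAX sketch: the inner loop integrates the gradient flow dw/dt = -∇_w L(w) for a linear classifier over fixed features, and the meta-gradient with respect to the continuous trajectory length T is obtained with forward-mode differentiation (`jax.jvp`), whose memory cost does not grow with the number of integration steps. This is an illustrative approximation under stated assumptions, not the paper's exact algorithm: it uses a simple Euler discretization rather than COMLN's closed-form forward-mode equations, differentiates only with respect to T (the paper also meta-learns the embedding network and initialization), and all names and toy data (`adapt`, `outer_loss`, the random features) are hypothetical.

```python
import jax
import jax.numpy as jnp

def inner_loss(w, feats, labels):
    """Cross-entropy loss of a linear classifier on (meta-learned) features."""
    log_probs = jax.nn.log_softmax(feats @ w)
    return -jnp.mean(jnp.take_along_axis(log_probs, labels[:, None], axis=1))

def adapt(w0, feats, labels, T, n_steps=100):
    """Euler discretization of the gradient vector field dw/dt = -grad_w L(w).

    T is the continuous (learnable) trajectory length; n_steps is a fixed
    discretization of the ODE, standing in for the paper's exact dynamics.
    """
    dt = T / n_steps
    def step(w, _):
        return w - dt * jax.grad(inner_loss)(w, feats, labels), None
    w_T, _ = jax.lax.scan(step, w0, None, length=n_steps)
    return w_T

def outer_loss(T, w0, sup_feats, sup_labels, qry_feats, qry_labels):
    """Query-set loss after adapting the linear head for 'time' T on the support set."""
    w_T = adapt(w0, sup_feats, sup_labels, T)
    return inner_loss(w_T, qry_feats, qry_labels)

# Toy 5-way task with random "features" standing in for a meta-learned embedding.
k1, k2 = jax.random.split(jax.random.PRNGKey(0))
sup_feats = jax.random.normal(k1, (25, 64))
sup_labels = jnp.tile(jnp.arange(5), 5)
qry_feats = jax.random.normal(k2, (50, 64))
qry_labels = jnp.tile(jnp.arange(5), 10)
w0 = jnp.zeros((64, 5))
T = jnp.array(2.0)

# Forward-mode differentiation: jax.jvp carries a tangent alongside the primal
# integration, so memory stays constant in n_steps (reverse-mode would have to
# store the whole trajectory). Seeding the tangent of T with 1.0 yields dL/dT,
# which an outer-loop SGD step could then use to update the trajectory length.
loss, dloss_dT = jax.jvp(
    lambda t: outer_loss(t, w0, sup_feats, sup_labels, qry_feats, qry_labels),
    (T,), (jnp.array(1.0),),
)
```

The design point the sketch illustrates is the one the abstract emphasizes: because the tangent is propagated forward together with the state, the cost of the exact meta-gradient is constant in memory regardless of how long the adaptation runs, which is what makes learning the amount of adaptation by gradient descent practical.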