Paper Title
Resolving learning rates adaptively by locating Stochastic Non-Negative Associated Gradient Projection Points using line searches
Paper Authors
Paper Abstract
Learning rates in stochastic neural network training are currently determined a priori to training, using expensive manual or automated iterative tuning. This study proposes gradient-only line searches to resolve the learning rate for neural network training algorithms. Stochastic sub-sampling during training decreases computational cost and allows the optimization algorithms to progress over local minima. However, it also results in discontinuous cost functions. Minimization line searches are not effective in this context, as they rely on a vanishing derivative (a first-order optimality condition), which often does not exist in a discontinuous cost function, and therefore they converge to discontinuities rather than to the minima indicated by the data trends. Instead, we identify candidate solutions along a search direction purely from gradient information, in particular by a directional derivative sign change from negative to positive (a Non-Negative Associated Gradient Projection Point, NN-GPP). A sign change from negative to positive always indicates a minimum, so NN-GPPs contain second-order information. Conversely, a vanishing gradient is purely a first-order condition, which may indicate a minimum, maximum, or saddle point. This insight allows the learning rate of an algorithm to be reliably resolved as the step size along a search direction, improving convergence performance and eliminating an otherwise expensive hyperparameter.
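To make the NN-GPP criterion concrete, the sketch below (not taken from the paper) illustrates one way a gradient-only line search could bracket and bisect on the sign of the directional derivative along a descent direction, returning the step size where the sign changes from negative to positive. The function names, the bracketing-plus-bisection scheme, and the quadratic test problem are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def directional_derivative(grad_fn, x, d, alpha):
    """Directional derivative g(x + alpha*d) . d of the (possibly stochastic) loss."""
    return np.dot(grad_fn(x + alpha * d), d)


def gradient_only_line_search(grad_fn, x, d, alpha_max=2.0, tol=1e-6, max_iter=50):
    """Locate a step size where the directional derivative changes sign from
    negative to positive (an NN-GPP candidate), using only gradient information.

    Assumes d is a descent direction, i.e. the directional derivative at alpha = 0
    is negative.  Hypothetical sketch: bracket the sign change, then bisect.
    """
    lo, hi = 0.0, alpha_max
    # Grow the bracket until the directional derivative becomes non-negative.
    while directional_derivative(grad_fn, x, d, hi) < 0:
        lo, hi = hi, 2.0 * hi
        if hi > 1e6:  # no sign change found within a reasonable range
            return hi
    # Bisect: keep lo on the negative side and hi on the non-negative side.
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if directional_derivative(grad_fn, x, d, mid) < 0:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return hi  # step size at the bracketed sign change (NN-GPP candidate)


if __name__ == "__main__":
    # Toy problem: f(x) = 0.5*||x||^2, whose gradient is x.
    grad_fn = lambda x: x
    x0 = np.array([3.0, -4.0])
    d = -grad_fn(x0)  # steepest-descent direction
    alpha = gradient_only_line_search(grad_fn, x0, d)
    print("resolved step size:", alpha)   # ~1.0 for this quadratic
    print("new point:", x0 + alpha * d)   # ~[0, 0]
```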