Paper Title
Transformer-Based Learned Optimization
Paper Authors
Paper Abstract
We propose a new approach to learned optimization in which we represent the computation of an optimizer's update step using a neural network. The parameters of the optimizer are then learned by training on a set of optimization tasks with the objective of performing minimization efficiently. Our innovation is a new neural network architecture, Optimus, for the learned optimizer, inspired by the classic BFGS algorithm. As in BFGS, we estimate a preconditioning matrix as a sum of rank-one updates, but use a Transformer-based neural network to predict these updates jointly with the step length and direction. In contrast to several recent learned optimization approaches, our formulation allows for conditioning across the dimensions of the parameter space of the target problem while remaining applicable to optimization tasks of variable dimensionality without retraining. We demonstrate the advantages of our approach on a benchmark composed of objective functions traditionally used to evaluate optimization algorithms, as well as on the real-world task of physics-based visual reconstruction of articulated 3D human motion.
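For intuition, the following is a minimal sketch of the kind of update rule the abstract describes: a preconditioning matrix accumulated as a sum of rank-one terms, with the terms, step direction, and step length all produced by a learned model. This is not the paper's implementation; the function names (`predict_update`, `optimizer_step`), the number of rank-one terms, and the random placeholder standing in for the Transformer are all assumptions made for illustration.

```python
import numpy as np

def predict_update(history):
    """Hypothetical stand-in for the Optimus Transformer: maps an
    optimization-state history to rank-one factors, their weights,
    a step direction, and a step length. Random placeholders here."""
    dim = history.shape[-1]
    K = 4                              # number of rank-one terms (assumption)
    V = np.random.randn(K, dim)        # rank-one factors v_k
    beta = np.random.rand(K)           # non-negative weights for each term
    direction = np.random.randn(dim)   # predicted step direction
    step_length = 0.1                  # predicted step length
    return V, beta, direction, step_length

def optimizer_step(theta, history):
    """One learned-optimizer step: build a preconditioner as a sum of
    rank-one updates (in the spirit of BFGS-style methods) and apply it."""
    V, beta, direction, step_length = predict_update(history)
    P = np.eye(theta.shape[0])
    for b, v in zip(beta, V):
        P += b * np.outer(v, v)        # rank-one update; P stays symmetric
    return theta - step_length * P @ direction

# Toy usage: one step on a 5-dimensional problem with a dummy history.
theta = np.zeros(5)
history = np.zeros((1, 5))
theta = optimizer_step(theta, history)
```

Note that because `predict_update` consumes a per-dimension history and emits per-dimension factors, a model of this shape can in principle be applied to problems of varying dimensionality, which is the property the abstract highlights.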