Paper Title
Communication Acceleration of Local Gradient Methods via an Accelerated Primal-Dual Algorithm with Inexact Prox
Paper Authors
Paper Abstract
Inspired by a recent breakthrough of Mishchenko et al. (2022), who for the first time showed that local gradient steps can lead to provable communication acceleration, we propose an alternative algorithm which obtains the same communication acceleration as their method (ProxSkip). Our approach is very different, however: it is based on the celebrated method of Chambolle and Pock (2011), with several nontrivial modifications: i) we allow for an inexact computation of the prox operator of a certain smooth strongly convex function via a suitable gradient-based method (e.g., GD, Fast GD or FSFOM), ii) we perform a careful modification of the dual update step in order to retain linear convergence. Our general results offer new state-of-the-art rates for the class of strongly convex-concave saddle-point problems with bilinear coupling characterized by the absence of smoothness in the dual function. When applied to federated learning, we obtain a theoretically better alternative to ProxSkip: our method requires fewer local steps ($O(\kappa^{1/3})$ or $O(\kappa^{1/4})$, compared to $O(\kappa^{1/2})$ for ProxSkip), and the number of local steps it performs is deterministic rather than random. Like ProxSkip, our method can be applied to optimization over a connected network, and we obtain theoretical improvements here as well.
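For context, a minimal sketch of the classical Chambolle and Pock (2011) primal-dual iteration for a bilinearly coupled saddle-point problem $\min_x \max_y \; g(x) + \langle Kx, y \rangle - f^*(y)$ is given below. The stepsizes $\tau$, $\sigma$ and extrapolation parameter $\theta$ are the standard ones from that paper; this is not the modified scheme proposed here, which additionally replaces the exact prox with a few gradient-type steps and carefully adjusts the dual update:

$$
\begin{aligned}
y^{k+1} &= \operatorname{prox}_{\sigma f^*}\!\left(y^{k} + \sigma K \bar{x}^{k}\right), \\
x^{k+1} &= \operatorname{prox}_{\tau g}\!\left(x^{k} - \tau K^{\top} y^{k+1}\right), \\
\bar{x}^{k+1} &= x^{k+1} + \theta \left(x^{k+1} - x^{k}\right).
\end{aligned}
$$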