Paper Title

Training invariances and the low-rank phenomenon: beyond linear networks

Paper Authors

Thien Le, Stefanie Jegelka

Abstract

The implicit bias induced by the training of neural networks has become a topic of rigorous study. In the limit of gradient flow and gradient descent with appropriate step size, it has been shown that when one trains a deep linear network with logistic or exponential loss on linearly separable data, the weights converge to rank-1 matrices. In this paper, we extend this theoretical result to the last few linear layers of the much wider class of nonlinear ReLU-activated feedforward networks containing fully-connected layers and skip connections. Similar to the linear case, the proof relies on specific local training invariances, sometimes referred to as alignment, which we show to hold for submatrices where neurons are stably-activated in all training examples, and it reflects empirical results in the literature. We also show this is not true in general for the full matrix of ReLU fully-connected layers. Our proof relies on a specific decomposition of the network into a multilinear function and another ReLU network whose weights are constant under a certain parameter directional convergence.
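The rank-1 convergence described in the abstract can be sketched numerically. The toy script below (not from the paper; a minimal illustration under assumed hyperparameters) trains a two-layer deep linear network with exponential loss on linearly separable data by plain gradient descent, then inspects the singular values of the first weight matrix: the ratio of the second to the top singular value shrinks as the matrix approaches rank 1.

```python
import numpy as np

rng = np.random.default_rng(0)

# Linearly separable toy data: label = sign of the first coordinate.
n, d = 40, 5
X = rng.normal(size=(n, d))
y = np.sign(X[:, 0])

# Two-layer deep linear network f(x) = x @ W1 @ w2 (no activations),
# with small random initialization.
W1 = rng.normal(size=(d, d)) * 0.1
w2 = rng.normal(size=(d, 1)) * 0.1

lr = 0.1
for _ in range(5000):
    margins = y[:, None] * (X @ W1 @ w2)   # per-example margins y_i f(x_i)
    grad_out = -y[:, None] * np.exp(-margins) / n  # d(exp-loss)/d f
    gW1 = X.T @ grad_out @ w2.T
    gw2 = (X @ W1).T @ grad_out
    W1 -= lr * gW1
    w2 -= lr * gw2

# On separable data with exponential loss, W1 should approach rank 1:
s = np.linalg.svd(W1, compute_uv=False)
print("sigma_2 / sigma_1 =", s[1] / s[0])  # small ratio = nearly rank-1
```

The choice of exponential loss and separable data matches the setting of the theoretical result; the finite step count only gives an approximation of the asymptotic rank-1 limit, so the ratio is small but not exactly zero.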
