Paper Title

On skip connections and normalisation layers in deep optimisation

Authors

MacDonald, Lachlan Ewen, Valmadre, Jack, Saratchandran, Hemanth, Lucey, Simon

Abstract

We introduce a general theoretical framework, designed for the study of gradient optimisation of deep neural networks, that encompasses ubiquitous architecture choices including batch normalisation, weight normalisation and skip connections. Our framework determines the curvature and regularity properties of multilayer loss landscapes in terms of their constituent layers, thereby elucidating the roles played by normalisation layers and skip connections in globalising these properties. We then demonstrate the utility of this framework in two respects. First, we give the only proof of which we are aware that a class of deep neural networks can be trained using gradient descent to global optima even when such optima only exist at infinity, as is the case for the cross-entropy cost. Second, we identify a novel causal mechanism by which skip connections accelerate training, which we verify predictively with ResNets on MNIST, CIFAR10, CIFAR100 and ImageNet.
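The abstract's central architectural ingredients are skip connections and normalisation layers trained under a cross-entropy cost. Below is a minimal, illustrative sketch (not the authors' code) of a residual block combining these pieces in PyTorch; the layer widths, block count, and learning rate are assumptions chosen only to make the example runnable.

```python
# Hypothetical sketch of the architecture choices named in the abstract:
# a skip connection wrapped around a normalised, fully connected layer,
# trained by gradient descent on a cross-entropy cost.
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, width: int = 64):
        super().__init__()
        self.linear = nn.Linear(width, width)
        self.norm = nn.BatchNorm1d(width)  # normalisation layer
        self.act = nn.ReLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Skip connection: output = input + learned residual.
        return x + self.act(self.norm(self.linear(x)))

# Example usage: a small stack of residual blocks with plain gradient descent.
model = nn.Sequential(*[ResidualBlock(64) for _ in range(4)], nn.Linear(64, 10))
optimiser = torch.optim.SGD(model.parameters(), lr=0.1)
x, y = torch.randn(32, 64), torch.randint(0, 10, (32,))
loss = nn.CrossEntropyLoss()(model(x), y)  # cross-entropy cost, as in the abstract
loss.backward()
optimiser.step()
```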
