Paper Title

Increasing Depth Leads to U-Shaped Test Risk in Over-parameterized Convolutional Networks

Paper Authors

Eshaan Nichani, Adityanarayanan Radhakrishnan, Caroline Uhler

Abstract

Recent works have demonstrated that increasing model capacity through width in over-parameterized neural networks leads to a decrease in test risk. For neural networks, however, model capacity can also be increased through depth, yet understanding the impact of increasing depth on test risk remains an open question. In this work, we demonstrate that the test risk of over-parameterized convolutional networks is a U-shaped curve (i.e. monotonically decreasing, then increasing) with increasing depth. We first provide empirical evidence for this phenomenon via image classification experiments using both ResNets and the convolutional neural tangent kernel (CNTK). We then present a novel linear regression framework for characterizing the impact of depth on test risk, and show that increasing depth leads to a U-shaped test risk for the linear CNTK. In particular, we prove that the linear CNTK corresponds to a depth-dependent linear transformation on the original space and characterize properties of this transformation. We then analyze over-parameterized linear regression under arbitrary linear transformations and, in simplified settings, provably identify the depths which minimize each of the bias and variance terms of the test risk.
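To make the regression framework concrete, below is a minimal NumPy sketch of over-parameterized (minimum-norm) linear regression under a depth-dependent linear transformation, with test risk evaluated as depth is swept. The diagonal map `depth_transform` is a hypothetical stand-in for the transformation the paper associates with the linear CNTK, not the paper's actual construction, and whether the resulting risk curve is U-shaped depends on the chosen spectrum, noise level, and dimensions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Over-parameterized regime: more features d than samples n.
n, d, sigma = 50, 200, 0.1

# Ground-truth linear signal and noisy training data.
beta_star = rng.normal(size=d) / np.sqrt(d)
X = rng.normal(size=(n, d))
y = X @ beta_star + sigma * rng.normal(size=n)

def depth_transform(depth, decay=0.9):
    """Hypothetical depth-dependent diagonal transformation M_L.

    A stand-in for the transformation induced by the linear CNTK in
    the paper; raising a fixed spectrum to the power `depth` makes the
    map increasingly close to low-rank as depth grows.
    """
    return np.diag(np.linspace(1.0, decay, d) ** depth)

def test_risk(depth, n_test=5000):
    M = depth_transform(depth)
    # Minimum-norm interpolating solution on the transformed features,
    # the implicit bias of over-parameterized linear regression.
    beta_hat = np.linalg.pinv(X @ M) @ y
    # Noise-free test risk of the induced predictor x -> x @ M @ beta_hat.
    X_test = rng.normal(size=(n_test, d))
    resid = X_test @ (M @ beta_hat - beta_star)
    return np.mean(resid ** 2)

for L in [1, 2, 4, 8, 16, 32, 64]:
    print(f"depth {L:3d}: test risk {test_risk(L):.4f}")
```

Sweeping depth in this sketch trades the two terms the paper analyzes against each other: small depths leave many effective directions (low bias, more noise fit into the interpolating solution), while large depths shrink most of the spectrum (less variance, but growing bias along the suppressed directions).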
