Paper Title
It's All In the Teacher: Zero-Shot Quantization Brought Closer to the Teacher
Paper Authors
Paper Abstract
Model quantization is considered a promising method for greatly reducing the resource requirements of deep neural networks. To deal with the performance drop induced by quantization errors, a popular approach is to fine-tune the quantized network on the training data. In real-world environments, however, this approach is frequently infeasible because the training data is unavailable due to security, privacy, or confidentiality concerns. Zero-shot quantization addresses such problems, usually by drawing information from the weights of a full-precision teacher network to compensate for the performance drop of the quantized network. In this paper, we first analyze the loss surface of state-of-the-art zero-shot quantization techniques and report several findings. In contrast to the usual knowledge distillation setting, zero-shot quantization often suffers from 1) the difficulty of jointly optimizing multiple loss terms, and 2) poor generalization caused by the use of synthetic samples. Furthermore, we observe that many weights fail to cross the rounding threshold while the quantized network is trained, even when doing so is necessary for better performance. Based on these observations, we propose AIT, a simple yet powerful technique for zero-shot quantization that addresses the two problems above as follows: AIT i) uses a KL distance loss only, without a cross-entropy loss, and ii) manipulates gradients to guarantee that a certain portion of the weights are properly updated after crossing the rounding thresholds. Experiments show that AIT outperforms many existing methods by a large margin, taking over the overall state-of-the-art position in the field.
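The abstract describes AIT's two ingredients only at a high level. The minimal PyTorch sketch below illustrates what they could look like in practice; the function names, the temperature handling, and the exact gradient-scaling rule are assumptions made for illustration, not the paper's actual implementation.

```python
import torch
import torch.nn.functional as F

def kd_kl_loss(student_logits, teacher_logits, temperature=1.0):
    """KL divergence between teacher and student soft predictions.

    Per the abstract, AIT trains the quantized (student) network with a
    KL distance loss alone, dropping the usual cross-entropy term on
    the (synthetic) labels.
    """
    t = temperature
    log_p_student = F.log_softmax(student_logits / t, dim=1)
    p_teacher = F.softmax(teacher_logits / t, dim=1)
    # "batchmean" matches the mathematical definition of KL divergence;
    # the t*t factor is the standard distillation gradient correction.
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * (t * t)

def scale_grads_to_cross(latent_weight, grad, lr, target_ratio=0.01):
    """Hypothetical sketch of AIT-style gradient manipulation.

    Scales the gradient so that at least `target_ratio` of the latent
    weights would cross their nearest rounding threshold after one SGD
    step. The scaling rule here is only an assumed illustration of the
    idea of forcing threshold crossings.
    """
    # Distance from each latent weight to its nearest rounding threshold
    # (thresholds sit halfway between integer grid points).
    frac = latent_weight - torch.floor(latent_weight)
    dist_to_threshold = torch.abs(frac - 0.5)

    # Step each weight would take under plain SGD.
    step = lr * grad.abs()

    # Fraction of weights that would already cross without manipulation.
    crossing = (step > dist_to_threshold).float().mean()
    if crossing >= target_ratio:
        return grad  # enough weights cross; leave the gradient as-is

    # Smallest scale s such that a target_ratio portion of the weights
    # satisfies s * step > dist_to_threshold (kth-smallest ratio, nudged
    # up slightly so the inequality is strict).
    ratio = dist_to_threshold / step.clamp_min(1e-12)
    k = max(1, int(target_ratio * ratio.numel()))
    s = (torch.kthvalue(ratio.flatten(), k).values * 1.001).clamp_min(1.0)
    return grad * s
```

In a training loop, one would presumably compute `kd_kl_loss` between the quantized student's and the full-precision teacher's logits on synthetic samples, backpropagate, and then rescale each latent weight tensor's gradient with something like `scale_grads_to_cross` before the optimizer step.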