Paper Title

Hyperspherical Quantization: Toward Smaller and More Accurate Models

Paper Authors

Dan Liu, Xi Chen, Chen Ma, Xue Liu

Paper Abstract

Model quantization enables the deployment of deep neural networks under resource-constrained devices. Vector quantization aims at reducing the model size by indexing model weights with full-precision embeddings, i.e., codewords, while the index needs to be restored to 32-bit during computation. Binary and other low-precision quantization methods can reduce the model size up to 32$\times$, however, at the cost of a considerable accuracy drop. In this paper, we propose an efficient framework for ternary quantization to produce smaller and more accurate compressed models. By integrating hyperspherical learning, pruning and reinitialization, our proposed Hyperspherical Quantization (HQ) method reduces the cosine distance between the full-precision and ternary weights, thus reducing the bias of the straight-through gradient estimator during ternary quantization. Compared with existing work at similar compression levels ($\sim$30$\times$, $\sim$40$\times$), our method significantly improves the test accuracy and reduces the model size.
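The abstract's core quantity is the cosine distance between full-precision and ternary weights, which HQ seeks to reduce so that the straight-through gradient estimator is less biased. As a minimal sketch (not the paper's HQ method), the snippet below shows plain threshold-based ternary quantization to {-delta, 0, +delta} and measures that cosine gap; the `sparsity` parameter and helper names are illustrative assumptions.

```python
import numpy as np

def ternarize(w, sparsity=0.5):
    """Map weights to {-delta, 0, +delta}; magnitudes below a
    sparsity-quantile threshold are zeroed (illustrative scheme,
    not the paper's HQ procedure)."""
    thresh = np.quantile(np.abs(w), sparsity)
    mask = (np.abs(w) > thresh).astype(w.dtype)
    # Common choice of scale: mean magnitude of the surviving weights.
    delta = np.abs(w[mask > 0]).mean() if mask.any() else 0.0
    return delta * np.sign(w) * mask

def cosine_distance(a, b):
    """1 - cosine similarity between flattened weight tensors --
    the quantity HQ aims to shrink before quantization."""
    a, b = a.ravel(), b.ravel()
    return 1.0 - (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

rng = np.random.default_rng(0)
w = rng.standard_normal(10_000)   # stand-in for a full-precision layer
w_t = ternarize(w)

print(sorted(set(np.sign(w_t))))          # ternary support: [-1.0, 0.0, 1.0]
print(cosine_distance(w, w_t))            # nonzero gap that biases the STE
```

During training, a straight-through estimator would use `w_t` in the forward pass but propagate gradients to `w` as if the ternarization were the identity; the smaller the cosine gap above, the less biased that approximation.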
