Paper Title
CAP: Correlation-Aware Pruning for Highly-Accurate Sparse Vision Models
Paper Authors
Paper Abstract
Driven by significant improvements in architectural design and training pipelines, computer vision has recently experienced dramatic progress in terms of accuracy on classic benchmarks such as ImageNet. These highly-accurate models are challenging to deploy, as they appear harder to compress using standard techniques such as pruning. We address this issue by introducing the Correlation Aware Pruner (CAP), a new unstructured pruning framework which significantly pushes the compressibility limits for state-of-the-art architectures. Our method is based on two technical advancements: a new theoretically-justified pruner, which can handle complex weight correlations accurately and efficiently during the pruning process itself, and an efficient finetuning procedure for post-compression recovery. We validate our approach via extensive experiments on several modern vision models such as Vision Transformers (ViT), modern CNNs, and ViT-CNN hybrids, showing for the first time that these can be pruned to high sparsity levels (e.g. $\geq 75$%) with low impact on accuracy ($\leq 1$% relative drop). Our approach is also compatible with structured pruning and quantization, and can lead to practical speedups of 1.5 to 2.4x without accuracy loss. To further showcase CAP's accuracy and scalability, we use it to show for the first time that extremely-accurate large vision models, trained via self-supervised techniques, can also be pruned to moderate sparsities, with negligible accuracy loss.
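To give a concrete sense of what "handling weight correlations during pruning" generally involves, the sketch below illustrates a classic Optimal-Brain-Surgeon-style criterion: each weight is scored by the loss increase its removal would cause under a second-order (inverse-Hessian) approximation, and the remaining weights receive a compensating update. This is only a minimal illustration of the general correlation-aware idea, not the CAP algorithm itself; the function names and the synthetic Hessian proxy are assumptions made for demonstration.

```python
# Illustrative sketch of OBS-style, correlation-aware pruning (not the CAP implementation).
import numpy as np

def obs_prune_one(w, H_inv):
    """Remove the single weight whose deletion increases the loss the least
    under a quadratic approximation, and compensate the remaining weights."""
    diag = np.diag(H_inv)
    saliency = w ** 2 / (2.0 * diag)           # correlation-aware importance score
    i = int(np.argmin(saliency))                # cheapest weight to remove
    delta = -(w[i] / diag[i]) * H_inv[:, i]     # compensating update for the other weights
    w_new = w + delta
    w_new[i] = 0.0                              # weight i is pruned exactly
    return w_new, i

# Toy example with a synthetic, well-conditioned curvature proxy (assumption for the demo).
rng = np.random.default_rng(0)
w = rng.normal(size=4)
A = rng.normal(size=(4, 4))
H = A @ A.T + 1e-2 * np.eye(4)                  # symmetric positive-definite Hessian proxy
w_pruned, removed = obs_prune_one(w, np.linalg.inv(H))
print(f"removed weight {removed}: {w} -> {w_pruned}")
```

In a magnitude-based pruner, the score would simply be $|w_i|$; the correlation-aware score $w_i^2 / (2 [H^{-1}]_{ii})$ and the compensating update are what allow accurate pruning at high sparsity, which is the regime the abstract targets.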