Paper Title
Towards Practical Certifiable Patch Defense with Vision Transformer
Paper Authors
Paper Abstract
Patch attacks, among the most threatening forms of physical attack with adversarial examples, can induce misclassification by arbitrarily modifying pixels within a contiguous region. A certifiable patch defense can guarantee that the classifier's prediction is provably unaffected by patch attacks. Existing certifiable patch defenses sacrifice the clean accuracy of the classifier and only obtain low certified accuracy on toy datasets. Moreover, the clean and certified accuracy of these methods remains significantly lower than that of normal classification networks, which limits their application in practice. To move towards a practical certifiable patch defense, we introduce Vision Transformer (ViT) into the framework of Derandomized Smoothing (DS). Specifically, we propose a progressive smoothed image modeling task to train the Vision Transformer, which captures more discriminative local context of an image while preserving its global semantic information. For efficient inference and deployment in the real world, we restructure the global self-attention of the original ViT into isolated band-unit self-attention. On ImageNet, under patch attacks covering 2% of the image area, our method achieves 41.70% certified accuracy, a nearly 1-fold increase over the previous best method (26.00%). At the same time, our method achieves 78.58% clean accuracy, which is very close to that of a normal ResNet-101. Extensive experiments show that our method obtains state-of-the-art clean and certified accuracy on CIFAR-10 and ImageNet while inferring efficiently.
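For context on the Derandomized Smoothing framework the abstract builds on, below is a minimal sketch of column-band smoothing and its certification check. The names `base_classifier`, `band_width`, and `patch_width` are illustrative assumptions, and the ablation here simply zeroes out columns outside the retained band; this follows the standard DS recipe rather than the paper's exact implementation.

```python
# A minimal sketch (not the paper's exact implementation) of column-band
# derandomized smoothing. `base_classifier` is assumed to map an ablated
# image tensor of shape (1, C, H, W) to class logits.
import torch

def ablate_to_band(x, start, band_width):
    """Keep a vertical band of columns [start, start + band_width); zero the rest.
    The band wraps around the right edge, as in standard column smoothing."""
    ablated = torch.zeros_like(x)
    W = x.shape[-1]
    cols = [(start + i) % W for i in range(band_width)]
    ablated[..., cols] = x[..., cols]
    return ablated

@torch.no_grad()
def smoothed_predict(base_classifier, x, band_width=19, patch_width=31, num_classes=1000):
    """Vote over all band positions, then test the DS certification margin."""
    W = x.shape[-1]
    counts = torch.zeros(num_classes, dtype=torch.long)
    for start in range(W):
        logits = base_classifier(ablate_to_band(x, start, band_width).unsqueeze(0))
        counts[logits.argmax(dim=-1)] += 1
    top2 = counts.topk(2)
    top_votes, runner_up_votes = top2.values[0].item(), top2.values[1].item()
    # A patch of width `patch_width` pixels can intersect at most
    # delta = patch_width + band_width - 1 band positions, so the prediction
    # is certified when the vote margin exceeds 2 * delta.
    delta = patch_width + band_width - 1
    certified = top_votes - runner_up_votes > 2 * delta
    return top2.indices[0].item(), certified
```

In this sketch, the efficiency bottleneck is running the base classifier once per band position; restricting attention to within-band tokens, as the abstract's isolated band-unit self-attention suggests, is one way to reduce that cost.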