Improving Adversarial Robustness via Joint Classification and Multiple Explicit Detection Classes

Title

Improving Adversarial Robustness via Joint Classification and Multiple Explicit Detection Classes

Authors

Baharlouei, Sina; Sheikholeslami, Fatemeh; Razaviyayn, Meisam; Kolter, Zico

Abstract

This work concerns the development of deep networks that are certifiably robust to adversarial attacks. Joint robust classification-detection was recently introduced as a certified defense mechanism in which adversarial examples are either correctly classified or assigned to the "abstain" class. In this work, we show that such a provable framework can benefit from extension to networks with multiple explicit abstain classes, to which adversarial examples are adaptively assigned. We show that naively adding multiple abstain classes can lead to "model degeneracy", and we then propose a regularization approach and a training method that counter this degeneracy by promoting full use of the multiple abstain classes. Our experiments demonstrate that the proposed approach consistently achieves favorable trade-offs between standard and robust verified accuracy, outperforming state-of-the-art algorithms for various choices of the number of abstain classes.
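As an illustration of the decision rule described in the abstract, the following minimal Python sketch assumes a network that outputs K + M logits, the first K for ordinary classes and the last M for abstain classes; a prediction that lands on any abstain class is treated as abstention. The function names (`predict_with_abstain`, `is_correct_or_abstain`, `abstain_usage`) are hypothetical. The code evaluates single inputs only: it does not perform the certification over a perturbation set that verified accuracy requires, and it is not the paper's regularizer or training method. `abstain_usage` is a hypothetical diagnostic for one plausible reading of "model degeneracy", namely abstentions collapsing onto a few of the M abstain classes instead of using all of them.

```python
from collections import Counter
from typing import List, Optional, Sequence


def _argmax(values: Sequence[float]) -> int:
    # Index of the largest value; ties go to the lowest index.
    return max(range(len(values)), key=lambda i: values[i])


def predict_with_abstain(logits: Sequence[float], num_classes: int) -> Optional[int]:
    """Return a class index in [0, num_classes), or None if the top logit
    belongs to any of the abstain classes (indices >= num_classes)."""
    best = _argmax(logits)
    return best if best < num_classes else None


def is_correct_or_abstain(pred: Optional[int], label: int) -> bool:
    """Acceptable outcome for an adversarial input under joint
    classification-detection: correct label or abstention."""
    return pred is None or pred == label


def abstain_usage(batch_logits: Sequence[Sequence[float]],
                  num_classes: int, num_abstain: int) -> List[float]:
    """Hypothetical diagnostic: fraction of abstentions routed to each
    abstain class. A distribution concentrated on one entry suggests the
    extra abstain classes are going unused."""
    counts: Counter = Counter()
    for logits in batch_logits:
        best = _argmax(logits)
        if best >= num_classes:
            counts[best - num_classes] += 1
    total = sum(counts.values())
    if total == 0:
        return [0.0] * num_abstain
    return [counts[j] / total for j in range(num_abstain)]
```

For example, with K = 3 ordinary classes and M = 2 abstain classes, a logit vector `[2.0, 0.1, 0.3, 0.5, 0.2]` is classified as class 0, while `[0.1, 0.2, 0.3, 1.5, 0.4]` abstains via the first abstain class. Treating every abstain index identically at prediction time keeps the final output space equal to "one of K labels or abstain", regardless of how many abstain classes the network has.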
