Paper Title
Query-Efficient Black-box Adversarial Attacks Guided by a Transfer-based Prior
Paper Authors
Abstract
Adversarial attacks have been extensively studied in recent years since they can identify the vulnerability of deep learning models before deployment. In this paper, we consider the black-box adversarial setting, where the adversary needs to craft adversarial examples without access to the gradients of a target model. Previous methods attempted to approximate the true gradient either by using the transfer gradient of a surrogate white-box model or based on the feedback of model queries. However, the existing methods inevitably suffer from low attack success rates or poor query efficiency, since it is difficult to estimate the gradient in a high-dimensional input space with limited information. To address these problems and improve black-box attacks, we propose two prior-guided random gradient-free (PRGF) algorithms, based on biased sampling and gradient averaging, respectively. Our methods can take advantage of a transfer-based prior given by the gradient of a surrogate model and the query information simultaneously. Through theoretical analyses, the transfer-based prior is appropriately integrated with model queries via an optimal coefficient in each method. Extensive experiments demonstrate that, in comparison with alternative state-of-the-art methods, both of our methods require far fewer queries to attack black-box models with higher success rates.
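To make the biased-sampling idea concrete, below is a minimal illustrative sketch, not the paper's exact algorithm. It shows random gradient-free (RGF) estimation in which each random query direction is biased toward a transfer-based prior (the surrogate model's gradient). The function name `prgf_biased_estimate` and the fixed mixing weight `lam` are assumptions for illustration; the paper instead derives an optimal coefficient theoretically.

```python
import numpy as np

def prgf_biased_estimate(loss_fn, x, prior, num_queries=20, sigma=1e-4, lam=0.5):
    """Sketch of prior-guided RGF gradient estimation (illustrative only).

    loss_fn     : black-box loss, queried by value only (no gradients)
    x           : current input, a flat numpy array
    prior       : transfer-based prior direction, e.g. a surrogate gradient
    num_queries : number of model queries spent on the estimate
    sigma       : finite-difference step size
    lam         : hypothetical fixed weight on the prior (the paper uses an
                  optimal coefficient derived from theory instead)
    """
    prior = prior / (np.linalg.norm(prior) + 1e-12)
    grad_est = np.zeros_like(x)
    base = loss_fn(x)
    for _ in range(num_queries):
        # draw a random unit direction
        u = np.random.randn(*x.shape)
        u /= np.linalg.norm(u)
        # bias the direction toward the prior, then renormalize
        v = np.sqrt(lam) * prior + np.sqrt(1.0 - lam) * u
        v /= np.linalg.norm(v)
        # finite-difference estimate of the directional derivative along v
        d = (loss_fn(x + sigma * v) - base) / sigma
        grad_est += d * v
    return grad_est / num_queries
```

With an accurate prior, the biased directions concentrate near the true gradient, so fewer queries are needed for a useful estimate; with a poor prior, the residual random component still lets the estimator recover gradient information from queries alone.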