Paper Title
Origins of Low-dimensional Adversarial Perturbations
Paper Authors
Paper Abstract
In this paper, we initiate a rigorous study of the phenomenon of low-dimensional adversarial perturbations (LDAPs) in classification. Unlike the classical setting, these perturbations are limited to a subspace of dimension $k$ which is much smaller than the dimension $d$ of the feature space. The case $k=1$ corresponds to so-called universal adversarial perturbations (UAPs; Moosavi-Dezfooli et al., 2017). First, we consider binary classifiers under generic regularity conditions (including ReLU networks) and compute analytical lower bounds for the fooling rate of any subspace. These bounds explicitly highlight the dependence of the fooling rate on the pointwise margin of the model (i.e., the ratio of the model's output to the $L_2$ norm of its gradient at a test point), and on the alignment of the given subspace with the gradients of the model w.r.t. the inputs. Our results provide a rigorous explanation for the recent success of heuristic methods for efficiently generating low-dimensional adversarial perturbations. Finally, we show that if a decision region is compact, then it admits a universal adversarial perturbation with $L_2$ norm that is $\sqrt{d}$ times smaller than the typical $L_2$ norm of a data point. Our theoretical results are confirmed by experiments on both synthetic and real data.
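The two quantities the abstract's bounds depend on can be illustrated numerically. The following sketch (all names hypothetical; a random linear model stands in for a trained classifier, since the paper's actual models are not specified here) computes the pointwise margin $|f(x)| / \|\nabla f(x)\|_2$ at each test point, and the alignment of a $k$-dimensional subspace with the model's input gradients, where the subspace is chosen via the top-$k$ singular vectors of the gradient matrix in the spirit of the heuristic methods the abstract refers to.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, k = 50, 200, 3  # feature dimension, number of test points, subspace dimension

# Stand-in binary classifier f(x) = w.x + b. Its gradient is the constant
# vector w; for a ReLU network the gradient would vary with the input.
w = rng.normal(size=d)
b = 0.1
X = rng.normal(size=(n, d))

outputs = X @ w + b           # f(x) for each test point
grads = np.tile(w, (n, 1))    # grad f(x) at each point (constant for a linear model)

# Pointwise margin: |f(x)| / ||grad f(x)||_2
margins = np.abs(outputs) / np.linalg.norm(grads, axis=1)

# A k-dimensional subspace aligned with the gradients: span of the
# top-k right singular vectors of the n x d gradient matrix.
_, _, Vt = np.linalg.svd(grads, full_matrices=False)
V = Vt[:k].T                  # d x k orthonormal basis

# Alignment of the subspace with each gradient: ||P_V grad|| / ||grad||,
# where P_V is the orthogonal projection onto the subspace.
alignment = np.linalg.norm(grads @ V, axis=1) / np.linalg.norm(grads, axis=1)

print(f"mean pointwise margin: {margins.mean():.3f}")
print(f"mean subspace alignment: {alignment.mean():.3f}")
```

For this linear stand-in every gradient equals $w$, so the top singular direction is $w/\|w\|$ and the alignment is exactly 1; for a nonlinear model the alignment would measure how well a single $k$-dimensional subspace captures the varying gradient directions across test points.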