Paper title
Assessing Robustness of Text Classification through Maximal Safe Radius Computation
Authors
Abstract
Neural network NLP models are vulnerable to small modifications of the input that maintain the original meaning but result in a different prediction. In this paper, we focus on robustness of text classification against word substitutions, aiming to provide guarantees that the model prediction does not change if a word is replaced with a plausible alternative, such as a synonym. As a measure of robustness, we adopt the notion of the maximal safe radius for a given input text, which is the minimum distance in the embedding space to the decision boundary. Since computing the exact maximal safe radius is not feasible in practice, we instead approximate it by computing a lower and an upper bound. For the upper bound computation, we employ Monte Carlo Tree Search in conjunction with syntactic filtering to analyse the effect of single and multiple word substitutions. The lower bound computation is achieved through an adaptation of the linear bounding techniques implemented in the tools CNN-Cert and POPQORN, for convolutional and recurrent network models respectively. We evaluate the methods on sentiment analysis and news classification models for four datasets (IMDB, SST, AG News and NEWS) and a range of embeddings, and provide an analysis of robustness trends. We also apply our framework to interpretability analysis and compare it with LIME.
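The relationship between the maximal safe radius and its upper bound can be illustrated with a minimal sketch. It is not the paper's method: it assumes a toy linear classifier over concatenated word embeddings, for which the exact radius has a closed form, and it replaces Monte Carlo Tree Search and syntactic filtering with a brute-force search over single-word substitutions. The embeddings `EMB`, the weights `W` and `B`, and the function names are all hypothetical.

```python
import math

# Hypothetical 2-d word embeddings and classifier weights, invented for illustration.
EMB = {
    "good": (1.0, 0.5),
    "fine": (0.6, 0.4),
    "bad": (-1.0, -0.5),
    "movie": (0.0, 0.2),
    "film": (0.1, 0.2),
}
W = (1.0, 1.0)
B = -0.5


def score(words):
    """Linear score over the concatenated word embeddings (W is reused per position)."""
    return sum(W[0] * EMB[w][0] + W[1] * EMB[w][1] for w in words) + B


def exact_radius(words):
    """L2 distance from the concatenated embedding to the hyperplane score = 0.

    This closed form exists only because the toy classifier is linear.
    """
    weight_norm = math.hypot(*W) * math.sqrt(len(words))
    return abs(score(words)) / weight_norm


def upper_bound(words):
    """Smallest L2 perturbation among single-word substitutions that flip the label.

    Brute force over the toy vocabulary (an assumption standing in for a guided search).
    Returns math.inf when no substitution changes the prediction.
    """
    original_positive = score(words) > 0
    best = math.inf
    for i, old in enumerate(words):
        for new in EMB:
            if new == old:
                continue
            candidate = words[:i] + [new] + words[i + 1:]
            if (score(candidate) > 0) != original_positive:
                # Only position i changes, so the distance between the two
                # concatenated inputs equals the distance between the two words.
                best = min(best, math.dist(EMB[old], EMB[new]))
    return best


if __name__ == "__main__":
    sentence = ["good", "movie"]
    print("exact radius:", exact_radius(sentence))
    print("upper bound :", upper_bound(sentence))
```

Any substitution that changes the prediction lies on the far side of the decision boundary, so its distance from the original input can never be smaller than the maximal safe radius. A better search tightens this upper bound, while a certified lower bound approaches the radius from below.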