最近的邻居分类器的可认证鲁棒性

论文标题

最近的邻居分类器的可认证鲁棒性

Certifiable Robustness for Nearest Neighbor Classifiers

论文作者

Fan, Austen Z., Koutris, Paraschos

论文摘要

通常使用高质量的大型数据集对ML模型进行培训。但是，培训数据集通常包含不一致或不完整的数据。为了解决此问题，一种解决方案是开发算法，可以检查模型的预测是否明显稳健。给定一种学习算法会产生分类器并在测试时间给出一个示例，如果通过在所有可能的世界（不确定）数据集的所有可能的世界（维修）中训练的每个模型（不确定）数据集进行了分类结果。这种鲁棒性的概念自然地属于某些答案的框架。在本文中，我们研究了一个简单但广泛部署的分类算法的鲁棒性的复杂性，$ k $ - neart的邻居（$ k $ -nn）。当完整性约束是功能依赖性（FDS）时，我们的主要重点是不一致的数据集。在这种情况下，我们建立了证明鲁棒性W.R.T.的复杂性的二分法。 FDS集合：问题要么接受多项式时间算法，要么是局限性。此外，我们针对该问题的计数版本展示了类似的二分法，其目标是计算预测某个标签的可能世界的数量。作为我们研究的副产品，我们还建立了与寻找可能具有独立感兴趣的最佳子集修复有关的问题的复杂性。

ML models are typically trained using large datasets of high quality. However, training datasets often contain inconsistent or incomplete data. To tackle this issue, one solution is to develop algorithms that can check whether a prediction of a model is certifiably robust. Given a learning algorithm that produces a classifier and given an example at test time, a classification outcome is certifiably robust if it is predicted by every model trained across all possible worlds (repairs) of the uncertain (inconsistent) dataset. This notion of robustness falls naturally under the framework of certain answers. In this paper, we study the complexity of certifying robustness for a simple but widely deployed classification algorithm, $k$-Nearest Neighbors ($k$-NN). Our main focus is on inconsistent datasets when the integrity constraints are functional dependencies (FDs). For this setting, we establish a dichotomy in the complexity of certifying robustness w.r.t. the set of FDs: the problem either admits a polynomial time algorithm, or it is coNP-hard. Additionally, we exhibit a similar dichotomy for the counting version of the problem, where the goal is to count the number of possible worlds that predict a certain label. As a byproduct of our study, we also establish the complexity of a problem related to finding an optimal subset repair that may be of independent interest.

下载PDF全文

下载文献需遵守相关版权规定

论文标题