Selective Classification Can Magnify Disparities Across Groups

Paper title

Selective Classification Can Magnify Disparities Across Groups

Authors

Erik Jones, Shiori Sagawa, Pang Wei Koh, Ananya Kumar, Percy Liang

Abstract

Selective classification, in which models can abstain on uncertain predictions, is a natural approach to improving accuracy in settings where errors are costly but abstentions are manageable. In this paper, we find that while selective classification can improve average accuracies, it can simultaneously magnify existing accuracy disparities between various groups within a population, especially in the presence of spurious correlations. We observe this behavior consistently across five vision and NLP datasets. Surprisingly, increasing abstentions can even decrease accuracies on some groups. To better understand this phenomenon, we study the margin distribution, which captures the model's confidences over all predictions. For symmetric margin distributions, we prove that whether selective classification monotonically improves or worsens accuracy is fully determined by the accuracy at full coverage (i.e., without any abstentions) and whether the distribution satisfies a property we call left-log-concavity. Our analysis also shows that selective classification tends to magnify full-coverage accuracy disparities. Motivated by our analysis, we train distributionally-robust models that achieve similar full-coverage accuracies across groups and show that selective classification uniformly improves each group on these models. Altogether, our results suggest that selective classification should be used with care and underscore the importance of training models to perform equally well across groups at full coverage.
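The effect described in the abstract can be illustrated with a minimal sketch. It assumes a signed-margin convention (the sign says whether the prediction is correct, the magnitude is the model's confidence) and abstains whenever the confidence falls below a threshold `tau`. The function name `selective_accuracy` and the toy margins for the two groups are hypothetical, made up for illustration, and are not taken from the paper's experiments.

```python
def selective_accuracy(margins, tau):
    """Accuracy among predictions whose confidence |margin| is at least tau.

    Assumed convention: a positive margin is a correct prediction,
    a negative margin is an error. Returns None if everything is abstained on.
    """
    kept = [m for m in margins if abs(m) >= tau]
    if not kept:
        return None
    return sum(1 for m in kept if m > 0) / len(kept)


# Hypothetical toy margins for two groups (not real data).
group_a = [2.0, 1.5, 1.0, 0.5, -0.2]   # mostly correct, confident when correct
group_b = [0.4, 0.3, -0.6, -1.2, 0.2]  # most confident predictions are wrong

for tau in (0.0, 0.5, 1.0):
    acc_a = selective_accuracy(group_a, tau)
    acc_b = selective_accuracy(group_b, tau)
    print(f"tau={tau}: group A {acc_a}, group B {acc_b}")
```

In this toy setup, raising `tau` from 0.0 to 0.5 lifts group A from 0.8 to 1.0 while group B drops from 0.6 to 0.0, so abstaining more widens the gap that already existed at full coverage.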
