Concept Bottleneck Models

Paper title

Concept Bottleneck Models

Authors

Pang Wei Koh, Thao Nguyen, Yew Siang Tang, Stephen Mussmann, Emma Pierson, Been Kim, Percy Liang

Abstract

We seek to learn models that we can interact with using high-level concepts: if the model did not think there was a bone spur in the x-ray, would it still predict severe arthritis? State-of-the-art models today do not typically support the manipulation of concepts like "the existence of bone spurs", as they are trained end-to-end to go directly from raw input (e.g., pixels) to output (e.g., arthritis severity). We revisit the classic idea of first predicting concepts that are provided at training time, and then using these concepts to predict the label. By construction, we can intervene on these concept bottleneck models by editing their predicted concept values and propagating these changes to the final prediction. On x-ray grading and bird identification, concept bottleneck models achieve competitive accuracy with standard end-to-end models, while enabling interpretation in terms of high-level clinical concepts ("bone spurs") or bird attributes ("wing color"). These models also allow for richer human-model interaction: accuracy improves significantly if we can correct model mistakes on concepts at test time.
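The two-stage structure described in the abstract (input to concepts, concepts to label, with optional test-time edits of the concepts) can be illustrated with a toy sketch. This is not the authors' implementation: the weights are hand-picked rather than trained, both stages are single linear layers, and the names `predict_concepts`, `predict_label`, `predict`, and the second concept `joint_narrowing` are hypothetical choices made only for illustration.

```python
import numpy as np

# Hypothetical concept and class names, for illustration only.
CONCEPTS = ["bone_spur", "joint_narrowing"]
CLASSES = ["not_severe", "severe"]

# Hand-picked (untrained) weights. Assumption: each stage is one linear layer.
W_XC = np.array([[4.0, 0.0],
                 [0.0, 4.0]])   # input features -> concept logits
W_CY = np.array([[0.0, 2.0],
                 [0.0, 1.0]])   # concept values -> class scores
B_Y = np.array([1.0, 0.0])      # class-score bias


def predict_concepts(x):
    """Stage 1: map raw input features to concept probabilities in (0, 1)."""
    logits = np.asarray(x, dtype=float) @ W_XC
    return 1.0 / (1.0 + np.exp(-logits))


def predict_label(concepts):
    """Stage 2: map concept values to a class index."""
    scores = np.asarray(concepts, dtype=float) @ W_CY + B_Y
    return int(np.argmax(scores))


def predict(x, interventions=None):
    """Run both stages; optionally overwrite predicted concepts first.

    `interventions` maps a concept index to the corrected value that a
    human supplies at test time.
    """
    concepts = predict_concepts(x)
    for index, value in (interventions or {}).items():
        concepts[index] = value
    return predict_label(concepts), concepts


x = [1.0, -1.0]  # toy input: strong evidence for the first concept only
before, concepts_before = predict(x)
# Intervention: tell the model there is no bone spur (concept 0 set to 0).
after, concepts_after = predict(x, interventions={0: 0.0})

print(CLASSES[before], dict(zip(CONCEPTS, concepts_before.round(3))))
print(CLASSES[after], dict(zip(CONCEPTS, concepts_after.round(3))))
```

Because the label depends on the input only through the concept vector, overwriting one concept value is enough to change the final prediction, which is the intervention mechanism the abstract refers to. In this toy setup the prediction moves from `severe` to `not_severe` once the bone-spur concept is set to zero.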
