Paper Title

SODAPOP: Open-Ended Discovery of Social Biases in Social Commonsense Reasoning Models

Authors

Haozhe An, Zongxia Li, Jieyu Zhao, Rachel Rudinger

Abstract

A common limitation of diagnostic tests for detecting social biases in NLP models is that they may only detect stereotypic associations that are pre-specified by the designer of the test. Since enumerating all possible problematic associations is infeasible, it is likely these tests fail to detect biases that are present in a model but not pre-specified by the designer. To address this limitation, we propose SODAPOP (SOcial bias Discovery from Answers about PeOPle) in social commonsense question-answering. Our pipeline generates modified instances from the Social IQa dataset (Sap et al., 2019) by (1) substituting names associated with different demographic groups, and (2) generating many distractor answers from a masked language model. By using a social commonsense model to score the generated distractors, we are able to uncover the model's stereotypic associations between demographic groups and an open set of words. We also test SODAPOP on debiased models and show the limitations of multiple state-of-the-art debiasing algorithms.
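The abstract describes the SODAPOP pipeline in prose; the minimal Python sketch below illustrates the general idea of steps (1) name substitution and (2) distractor generation with a masked language model. It is an assumption-laden illustration, not the authors' implementation: the name lists, templates, choice of RoBERTa as the masked LM, and the omitted scoring model are all placeholders.

```python
# Illustrative sketch only -- NOT the SODAPOP authors' code.
# Assumes the Hugging Face `transformers` library; templates and name lists are hypothetical.
from transformers import pipeline

# Masked LM used to propose distractor answer candidates (step 2 of the pipeline).
fill_mask = pipeline("fill-mask", model="roberta-base")

# Hypothetical name lists associated with different demographic groups (step 1).
NAME_GROUPS = {
    "group_A": ["Emily", "Katie"],
    "group_B": ["Darnell", "Latoya"],
}

# Placeholder Social IQa-style context and answer templates.
context_template = "{name} went out of their way to help a friend move."
answer_template = "{name} is <mask>."

for group, names in NAME_GROUPS.items():
    for name in names:
        context = context_template.format(name=name)
        # Generate many single-token distractor completions from the masked LM.
        candidates = fill_mask(answer_template.format(name=name), top_k=20)
        for cand in candidates:
            distractor = cand["sequence"]
            # A social commonsense QA model (not shown) would score
            # (context, question, distractor) here; comparing how scores shift
            # across name groups surfaces stereotypic word associations.
            print(group, name, distractor)
```

In the paper's actual pipeline, the generated distractors are scored by a Social IQa-trained model, and differences in acceptance rates across demographic name groups expose the model's open-ended stereotypic associations.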
