Six Attributes of Unhealthy Conversation

Paper title

Six Attributes of Unhealthy Conversation

Authors

Price, Ilan; Gifford-Moore, Jordan; Fleming, Jory; Musker, Saul; Roichman, Maayan; Sylvain, Guillaume; Thain, Nithum; Dixon, Lucas; Sorensen, Jeffrey

Abstract

We present a new dataset of approximately 44000 comments labelled by crowdworkers. Each comment is labelled as either 'healthy' or 'unhealthy', in addition to binary labels for the presence of six potentially 'unhealthy' sub-attributes: (1) hostile; (2) antagonistic, insulting, provocative or trolling; (3) dismissive; (4) condescending or patronising; (5) sarcastic; and/or (6) an unfair generalisation. Each label also has an associated confidence score. We argue that there is a need for datasets which enable research based on a broad notion of 'unhealthy online conversation'. We build this typology to encompass a substantial proportion of the individual comments which contribute to unhealthy online conversation. For some of these attributes, this is the first publicly available dataset of this scale. We explore the quality of the dataset, present some summary statistics and initial models to illustrate the utility of this data, and highlight limitations and directions for further research.
