Paper Title

CHECKED: Chinese COVID-19 Fake News Dataset

Authors

Chen Yang, Xinyi Zhou, Reza Zafarani

Abstract

COVID-19 has impacted all lives. To maintain social distancing and avoid exposure, work and life have gradually moved online. Under this trend, the use of social media to obtain COVID-19 news has increased, and misinformation about COVID-19 is frequently spread on social media as well. In this work, we develop CHECKED, the first Chinese dataset on COVID-19 misinformation. CHECKED provides a total of 2,104 verified microblogs related to COVID-19 from December 2019 to August 2020, identified by using a specific list of keywords. Correspondingly, CHECKED includes 1,868,175 reposts, 1,185,702 comments, and 56,852,736 likes that reveal how these verified microblogs are spread and reacted to on Weibo. The dataset contains a rich set of multimedia information for each microblog, including ground-truth label, textual, visual, temporal, and network information. Extensive experiments have been conducted to analyze the CHECKED data and to provide benchmark results for well-established methods when predicting fake news using CHECKED. We hope that CHECKED can facilitate studies that target misinformation on coronavirus. The dataset is available at https://github.com/cyang03/CHECKED.
