Building and Evaluating Universal Named-Entity Recognition English corpus

Paper title

Building and Evaluating Universal Named-Entity Recognition English corpus

Authors

Diego Alves, Gaurish Thakkar, Marko Tadić

Abstract

This article presents the application of the Universal Named Entity framework to generate automatically annotated corpora. By using a workflow that extracts Wikipedia data and meta-data and DBpedia information, we generated an English dataset which is described and evaluated. Furthermore, we conducted a set of experiments to improve the annotations in terms of precision, recall, and F1-measure. The final dataset is available and the established workflow can be applied to any language with existing Wikipedia and DBpedia. As part of future research, we intend to continue improving the annotation process and extend it to other languages.
