Paper Title


Learning Cross-Context Entity Representations from Text

Authors

Jeffrey Ling, Nicholas FitzGerald, Zifei Shan, Livio Baldini Soares, Thibault Févry, David Weiss, Tom Kwiatkowski

Abstract


Language modeling tasks, in which words, or word-pieces, are predicted on the basis of a local context, have been very effective for learning word embeddings and context-dependent representations of phrases. Motivated by the observation that efforts to code world knowledge into machine-readable knowledge bases or human-readable encyclopedias tend to be entity-centric, we investigate the use of a fill-in-the-blank task to learn context-independent representations of entities from the text contexts in which those entities were mentioned. We show that large-scale training of neural models allows us to learn high-quality entity representations, and we demonstrate successful results on four domains: (1) existing entity-level typing benchmarks, including a 64% error reduction over previous work on TypeNet (Murty et al., 2018); (2) a novel few-shot category reconstruction task; (3) existing entity linking benchmarks, where we match the state-of-the-art on CoNLL-Aida without linking-specific features and obtain a score of 89.8% on TAC-KBP 2010 without using any alias table, external knowledge base, or in-domain training data; and (4) answering trivia questions, which uniquely identify entities. Our global entity representations encode fine-grained type categories, such as Scottish footballers, and can answer trivia questions such as: Who was the last inmate of Spandau jail in Berlin?
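The abstract does not include code; the following is a minimal sketch of the kind of fill-in-the-blank objective it describes, in which a context with a blanked-out mention is encoded to a vector and scored against a table of context-independent entity embeddings. All names (`E`, `encode_context`, `fill_in_the_blank_loss`) and the toy sizes are illustrative assumptions, not the paper's implementation; the real model uses a large neural encoder and millions of entities.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes; the paper's setting has millions of entities.
num_entities, dim = 5, 8

# Context-independent entity embeddings: one learned vector per entity
# (hypothetical table standing in for the paper's entity representations).
E = rng.normal(size=(num_entities, dim))

def encode_context(context_vec):
    """Stand-in for a neural encoder that maps a text context with a
    blanked-out entity mention to a single vector (here: just normalize)."""
    return context_vec / np.linalg.norm(context_vec)

def fill_in_the_blank_loss(context_vec, gold_entity):
    """Softmax cross-entropy over all entities: the model is trained to
    predict which entity was blanked out of the context."""
    h = encode_context(context_vec)
    scores = E @ h                      # dot-product score for every entity
    scores = scores - scores.max()      # numerical stability
    probs = np.exp(scores) / np.exp(scores).sum()
    return -np.log(probs[gold_entity])

# A toy "context" whose encoding roughly aligns with entity 2's embedding.
ctx = E[2] + 0.1 * rng.normal(size=dim)
loss = fill_in_the_blank_loss(ctx, gold_entity=2)
```

Training the entity table and encoder jointly on this objective is what yields embeddings that cluster by fine-grained type (e.g. Scottish footballers) and can be scored against question text for trivia answering.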
