Paper Title

RELiC: Retrieving Evidence for Literary Claims

Authors

Katherine Thai, Yapei Chang, Kalpesh Krishna, Mohit Iyyer

Abstract

Humanities scholars commonly provide evidence for claims that they make about a work of literature (e.g., a novel) in the form of quotations from the work. We collect a large-scale dataset (RELiC) of 78K literary quotations and surrounding critical analysis and use it to formulate the novel task of literary evidence retrieval, in which models are given an excerpt of literary analysis surrounding a masked quotation and asked to retrieve the quoted passage from the set of all passages in the work. Solving this retrieval task requires a deep understanding of complex literary and linguistic phenomena, which proves challenging to methods that overwhelmingly rely on lexical and semantic similarity matching. We implement a RoBERTa-based dense passage retriever for this task that outperforms existing pretrained information retrieval baselines; however, experiments and analysis by human domain experts indicate that there is substantial room for improvement over our dense retriever.
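To make the task formulation concrete, here is a minimal sketch of the retrieval setup in the abstract. It masks the quotation inside a piece of critical analysis, ranks every candidate passage of the work by dot-product similarity to the query, and checks whether the quoted passage lands in the top k. This is not the authors' RoBERTa-based retriever. The embeddings are hypothetical toy vectors standing in for encoder outputs. The mask token, the function names (mask_quotation, rank_passages, recall_at_k), and the use of recall@k as the metric are illustrative assumptions.

```python
from typing import List, Sequence


def mask_quotation(analysis: str, quotation: str, mask: str = "<mask>") -> str:
    """Replace the quoted passage in the critic's analysis with a mask token.

    Hypothetical helper: the paper's exact query construction may differ.
    """
    return analysis.replace(quotation, mask)


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Inner product used as the relevance score between two embeddings."""
    return sum(a * b for a, b in zip(u, v))


def rank_passages(query_vec: Sequence[float],
                  passage_vecs: List[Sequence[float]]) -> List[int]:
    """Return passage indices sorted from most to least similar to the query."""
    scores = [dot(query_vec, p) for p in passage_vecs]
    return sorted(range(len(passage_vecs)), key=lambda i: scores[i], reverse=True)


def recall_at_k(ranked: List[int], gold_index: int, k: int) -> float:
    """1.0 if the gold (actually quoted) passage is among the top-k candidates, else 0.0."""
    return 1.0 if gold_index in ranked[:k] else 0.0


# Query text: literary analysis with the quotation masked out.
query_text = mask_quotation('He writes, "It was a dark night," and pauses.',
                            '"It was a dark night,"')
print(query_text)  # → He writes, <mask> and pauses.

# Hypothetical toy embeddings; a real system would encode query_text and
# every passage of the novel with a trained dense encoder.
query_vec = [0.9, 0.1, 0.0]
passage_vecs = [
    [0.1, 0.9, 0.0],  # passage 0
    [0.8, 0.2, 0.1],  # passage 1: the quoted passage in this toy example
    [0.0, 0.0, 1.0],  # passage 2
]
ranking = rank_passages(query_vec, passage_vecs)
print(ranking)                                  # → [1, 0, 2]
print(recall_at_k(ranking, gold_index=1, k=1))  # → 1.0
```

Because the candidate set is every passage in the work, a retriever that matches only surface wording can be misled when the critic paraphrases or interprets the quotation rather than echoing it. That is the difficulty the abstract highlights.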
