Enhanced vectors for top-k document retrieval in Question Answering

Paper title

Enhanced vectors for top-k document retrieval in Question Answering

Author

Hammad, Mohammed

Abstract

Modern applications, especially information retrieval web apps built around "search", are gradually moving towards "answering" modules. Conversational chatbots, which have proved more engaging to users, use Question Answering as their core. Since precise answering is computationally expensive, several approaches have been developed to prefetch the most relevant documents/passages that contain the answer from the database. We propose a different approach that retrieves the evidence documents efficiently and accurately, making sure that the relevant document for a given user query is not missed. We do so by assigning each document (or passage, in our case) a unique identifier and using these identifiers to create dense vectors which can be efficiently indexed. More precisely, we use the identifier to predict randomly sampled context-window words of the relevant question corresponding to the passage, along with the words of the passage itself. This naturally embeds the passage identifier into the vector space in such a way that the embedding is closer to the question without compromising the information content. This approach enables efficient creation of real-time query vectors in ~4 milliseconds.
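The idea in the abstract can be illustrated with a minimal sketch. This is not the author's implementation: it assumes a paragraph-vector-style objective in which each passage identifier owns a dense vector that is trained to predict a randomly placed window of words from both the passage and its related question. A full softmax over a toy vocabulary is used for brevity, and all names (`train`, `infer_query`, `top_k`, the toy passages and questions) are hypothetical.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def sgd_step(vec, W, target, lr, update_words=True):
    """One softmax step in which `vec` is trained to predict word `target`."""
    grad = softmax(W @ vec)
    grad[target] -= 1.0                  # d(loss)/d(logits) = p - onehot
    grad_vec = W.T @ grad                # computed before W is modified
    if update_words:
        W -= lr * np.outer(grad, vec)
    vec -= lr * grad_vec                 # in-place update of the caller's array

def train(passages, questions, dim=16, epochs=200, window=3, lr=0.05, seed=0):
    rng = np.random.default_rng(seed)
    words = sorted({w for t in passages + questions for w in t.lower().split()})
    vocab = {w: i for i, w in enumerate(words)}
    D = rng.normal(0.0, 0.1, (len(passages), dim))  # one vector per passage identifier
    W = np.zeros((len(vocab), dim))                 # output word weights
    for _ in range(epochs):
        for pid in rng.permutation(len(passages)).tolist():
            # The identifier predicts words of the passage AND of its question.
            for text in (passages[pid], questions[pid]):
                ids = [vocab[w] for w in text.lower().split()]
                start = int(rng.integers(0, max(1, len(ids) - window + 1)))
                for target in ids[start:start + window]:
                    sgd_step(D[pid], W, target, lr)
    return D, W, vocab

def infer_query(query, W, vocab, steps=50, lr=0.05, seed=0):
    """Fit a vector for an unseen query while keeping the word weights frozen."""
    rng = np.random.default_rng(seed)
    q = rng.normal(0.0, 0.1, W.shape[1])
    ids = [vocab[w] for w in query.lower().split() if w in vocab]
    for _ in range(steps):
        for target in ids:
            sgd_step(q, W, target, lr, update_words=False)
    return q

def top_k(q, D, k=2):
    """Indices of the k passage vectors most cosine-similar to q."""
    sims = (D @ q) / (np.linalg.norm(D, axis=1) * np.linalg.norm(q) + 1e-12)
    return np.argsort(-sims)[:k].tolist()

# Toy data, purely illustrative.
passages = [
    "the eiffel tower is located in paris france",
    "python is a programming language created by guido van rossum",
    "the pacific ocean is the largest ocean on earth",
]
questions = [
    "where is the eiffel tower",
    "who created the python language",
    "which ocean is the largest",
]
D, W, vocab = train(passages, questions)
query_vec = infer_query("who created python", W, vocab)
hits = top_k(query_vec, D, k=2)
print(hits)
```

In a real system the passage vectors in `D` would be stored in an approximate nearest-neighbour index instead of being scanned exhaustively, and the softmax would typically be replaced by negative sampling or a hierarchical softmax. Both are assumptions about a practical setup, not details taken from the abstract.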
