Paper Title
Global memory transformer for processing long documents
Paper Authors
Paper Abstract
Transformer variants dominate the state of the art in different natural language processing tasks such as translation, reading comprehension, and summarization. Our paper focuses on general memory slots added to the input and studies the effect of adding these slots. It is a follow-up study of the role of the general memory slots that were added to the input of the model proposed in previous work. We have two main tasks: 1) a pretraining task using masked language modeling and 2) a fine-tuning task using HotpotQA. This study aims to verify the ability of the proposed model to handle chunks as if they were one chunk, compared with the base model. As a baseline we used the T5 transformer. We studied the role of the memory slots augmented to each input chunk and studied the model performance without a selector. We found that adding memory to input chunks helped the proposed model to outperform the baseline on the masked language modeling task with specific training parameters. An ablation study reveals the ability to use compressed input chunks, with a degradation in performance.
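The abstract describes augmenting each input chunk of a long document with general memory slots before feeding it to a T5-style encoder. The sketch below is an illustrative assumption, not the authors' code: the chunk size, the number of memory slots, and the reserved memory token id are all hypothetical placeholders chosen only to show the chunk-then-augment flow.

```python
# Minimal sketch (assumption, not the paper's implementation): split a long
# tokenized document into fixed-size chunks and prepend general memory slot
# tokens to every chunk.

from typing import List

NUM_MEMORY_SLOTS = 10      # hypothetical number of memory slots per chunk
CHUNK_SIZE = 512           # hypothetical chunk length in tokens
MEMORY_TOKEN_ID = 32000    # hypothetical id reserved for a memory slot token


def split_into_chunks(token_ids: List[int], chunk_size: int = CHUNK_SIZE) -> List[List[int]]:
    """Split a long document into fixed-size chunks of token ids."""
    return [token_ids[i:i + chunk_size] for i in range(0, len(token_ids), chunk_size)]


def add_memory_slots(chunk: List[int], num_slots: int = NUM_MEMORY_SLOTS) -> List[int]:
    """Prepend general memory slot tokens to a single input chunk."""
    return [MEMORY_TOKEN_ID] * num_slots + chunk


def prepare_inputs(token_ids: List[int]) -> List[List[int]]:
    """Chunk a long document and augment every chunk with memory slots."""
    return [add_memory_slots(chunk) for chunk in split_into_chunks(token_ids)]


if __name__ == "__main__":
    document = list(range(1, 1301))       # stand-in for a tokenized long document
    chunks = prepare_inputs(document)
    print(len(chunks), len(chunks[0]))    # 3 chunks; the first is 512 + 10 tokens long
```

The augmented chunks would then be encoded independently, with the memory slot positions intended to carry information across chunks so the model can treat them as if they were one chunk.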