Paper Title

CUE Vectors: Modular Training of Language Models Conditioned on Diverse Contextual Signals

Paper Authors

Scott Novotney, Sreeparna Mukherjee, Zeeshan Ahmed, Andreas Stolcke

Paper Abstract

We propose a framework to modularize the training of neural language models that use diverse forms of sentence-external context (including metadata) by eliminating the need to jointly train sentence-external and within-sentence encoders. Our approach, contextual universal embeddings (CUE), trains LMs on one set of context, such as date and author, and adapts to novel metadata types, such as article title or previous sentence. The model consists of a pretrained neural sentence LM, a BERT-based context encoder, and a masked transformer decoder that estimates LM probabilities using sentence-internal and sentence-external information. When context or metadata are unavailable, our model learns to combine contextual and sentence-internal information using noisy oracle unigram embeddings as a proxy. Real contextual information can be introduced later and used to adapt a small number of parameters that map contextual data into the decoder's embedding space. We validate the CUE framework on a NYTimes text corpus with multiple metadata types, for which the LM perplexity can be lowered from 36.6 to 27.4 by conditioning on context. Bootstrapping a contextual LM with only a subset of the context/metadata during training retains 85% of the achievable gain. Training the model initially with proxy context retains 67% of the perplexity gain after adapting to real context. Furthermore, we can swap one type of pretrained sentence LM for another without retraining the context encoders, by only adapting the decoder model. Overall, we obtain a modular framework that allows incremental, scalable training of context-enhanced LMs.
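
To make the described architecture concrete, the following is a minimal sketch of how the components could be wired together in PyTorch with Hugging Face Transformers. It assumes GPT-2 as the pretrained sentence LM and bert-base-uncased as the context encoder; the module and parameter names (CueDecoder, ctx_proj) and the metadata string format are hypothetical illustrations, not the authors' implementation.

```python
# Sketch of a CUE-style setup: frozen sentence LM + frozen context encoder,
# with only a small decoder (and its context projection) being trained.
# Assumptions: PyTorch, Hugging Face Transformers, GPT-2 and BERT checkpoints.
import torch
import torch.nn as nn
from transformers import AutoModel, AutoModelForCausalLM, AutoTokenizer


class CueDecoder(nn.Module):
    """Masked transformer decoder combining sentence-internal hidden states
    (from a frozen sentence LM) with sentence-external context embeddings
    (from a frozen context encoder) to estimate next-token probabilities.
    Class and parameter names here are hypothetical."""

    def __init__(self, lm_dim, ctx_dim, vocab_size, n_layers=2, n_heads=8):
        super().__init__()
        # Small trainable projection mapping context embeddings into the
        # decoder's embedding space; adapting parameters like these is how
        # new (or real instead of proxy) context types would be introduced.
        self.ctx_proj = nn.Linear(ctx_dim, lm_dim)
        layer = nn.TransformerDecoderLayer(d_model=lm_dim, nhead=n_heads,
                                           batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=n_layers)
        self.out = nn.Linear(lm_dim, vocab_size)

    def forward(self, lm_hidden, ctx_emb):
        # Causal mask keeps the decoder autoregressive over the sentence.
        T = lm_hidden.size(1)
        causal = torch.triu(torch.full((T, T), float("-inf"),
                                       device=lm_hidden.device), diagonal=1)
        memory = self.ctx_proj(ctx_emb)            # (B, n_ctx_tokens, lm_dim)
        h = self.decoder(lm_hidden, memory, tgt_mask=causal)
        return self.out(h)                         # (B, T, vocab_size) logits


# Frozen pretrained components; only CueDecoder parameters are trainable.
sent_lm = AutoModelForCausalLM.from_pretrained("gpt2")
ctx_enc = AutoModel.from_pretrained("bert-base-uncased")
for p in list(sent_lm.parameters()) + list(ctx_enc.parameters()):
    p.requires_grad = False

lm_tok = AutoTokenizer.from_pretrained("gpt2")
ctx_tok = AutoTokenizer.from_pretrained("bert-base-uncased")

decoder = CueDecoder(lm_dim=sent_lm.config.n_embd,
                     ctx_dim=ctx_enc.config.hidden_size,
                     vocab_size=sent_lm.config.vocab_size)

# One sentence plus sentence-external metadata (e.g., date and author).
sent = lm_tok("The markets rallied after the announcement.",
              return_tensors="pt")
ctx = ctx_tok("date: 2001-03-14 ; author: Jane Doe", return_tensors="pt")

with torch.no_grad():
    lm_hidden = sent_lm(**sent, output_hidden_states=True).hidden_states[-1]
    ctx_emb = ctx_enc(**ctx).last_hidden_state     # token-level context states

logits = decoder(lm_hidden, ctx_emb)               # context-conditioned logits
```

In this sketch only the decoder (including ctx_proj) carries trainable parameters, mirroring the abstract's claims that context can be introduced later by adapting a small number of parameters mapping contextual data into the decoder's embedding space, and that the sentence LM can be swapped without retraining the context encoder.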
