Paper Title
Language Modeling with Latent Situations
Paper Authors
Paper Abstract
Language models (LMs) often generate incoherent outputs: they refer to events and entity states that are incompatible with the state of the world described in their inputs. We introduce SituationSupervision, a family of approaches for improving coherence in LMs by training them to construct and condition on explicit representations of entities and their states. SituationSupervision has two components: an auxiliary situation modeling task that trains models to predict state representations in context, and a latent state inference procedure that imputes these states from partially annotated training data. SituationSupervision can be applied to both fine-tuning (by supervising LMs to encode state variables in their hidden representations) and prompting (by inducing LMs to interleave textual descriptions of entity states with output text). In both cases, SituationSupervision requires only a small number of state annotations to produce major coherence improvements (between 4% and 11%), showing that standard LMs can be sample-efficiently trained to model not just language but the situations it describes.
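The fine-tuning variant described in the abstract can be pictured as adding an auxiliary prediction head over the LM's hidden states, trained only where state annotations exist. Below is a minimal sketch of that idea in PyTorch, not the authors' implementation: the toy model, the `state_head` name, the `alpha` weight, and the random data are all illustrative assumptions, and the real method would use a pretrained LM with the latent state inference procedure imputing the missing annotations.

```python
# Sketch: auxiliary situation modeling during fine-tuning (assumed setup).
# The LM is trained on next-token prediction plus an auxiliary head that
# decodes entity-state labels from the same hidden representations.
import torch
import torch.nn as nn

VOCAB, STATES, HIDDEN = 100, 10, 32

class ToyLM(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, HIDDEN)
        self.rnn = nn.GRU(HIDDEN, HIDDEN, batch_first=True)
        self.lm_head = nn.Linear(HIDDEN, VOCAB)      # next-token prediction
        self.state_head = nn.Linear(HIDDEN, STATES)  # auxiliary situation modeling

    def forward(self, tokens):
        h, _ = self.rnn(self.embed(tokens))
        return self.lm_head(h), self.state_head(h)

model = ToyLM()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
ce = nn.CrossEntropyLoss()
alpha = 0.5  # weight on the auxiliary state loss (assumed hyperparameter)

tokens = torch.randint(0, VOCAB, (4, 16))   # toy token batch
states = torch.randint(0, STATES, (4, 16))  # one state label per position
annotated = torch.tensor([1, 1, 0, 0], dtype=torch.bool)  # partial annotation

lm_logits, state_logits = model(tokens[:, :-1])
lm_loss = ce(lm_logits.reshape(-1, VOCAB), tokens[:, 1:].reshape(-1))

# Auxiliary loss only on the annotated subset; in the paper, labels for the
# remaining examples would be imputed by latent state inference.
if annotated.any():
    aux = ce(state_logits[annotated].reshape(-1, STATES),
             states[annotated][:, :-1].reshape(-1))
else:
    aux = torch.zeros(())

opt.zero_grad()
loss = lm_loss + alpha * aux
loss.backward()
opt.step()
```

The property the sketch mirrors is that the auxiliary state loss touches only the annotated subset, so a small number of annotations can shape the hidden representations while the ordinary LM loss covers all of the data.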