Paper title
Learning to Generalize for Sequential Decision Making
Paper authors
Abstract
We consider problems of making sequences of decisions to accomplish tasks, interacting via the medium of language. These problems are often tackled with reinforcement learning approaches. We find that these models do not generalize well when applied to novel task domains. However, the large amount of computation necessary to adequately train and explore the search space of sequential decision making, under a reinforcement learning paradigm, precludes the inclusion of large contextualized language models, which might otherwise enable the desired generalization ability. We introduce a teacher-student imitation learning methodology and a means of converting a reinforcement learning model into a natural language understanding model. Together, these methodologies enable the introduction of contextualized language models into the sequential decision making problem space. We show that models can learn faster and generalize more, leveraging both the imitation learning and the reformulation. Our models exceed teacher performance on various held-out decision problems, by up to 7% on in-domain problems and 24% on out-of-domain problems.