Simulating Bandit Learning from User Feedback for Extractive Question Answering

Paper Title

Simulating Bandit Learning from User Feedback for Extractive Question Answering

Authors

Gao, Ge; Choi, Eunsol; Artzi, Yoav

Abstract

We study learning from user feedback for extractive question answering by simulating feedback using supervised data. We cast the problem as contextual bandit learning, and analyze the characteristics of several learning scenarios with focus on reducing data annotation. We show that systems initially trained on a small number of examples can dramatically improve given feedback from users on model-predicted answers, and that one can use existing datasets to deploy systems in new domains without any annotation, but instead improving the system on-the-fly via user feedback.
