Title

Simulating Bandit Learning from User Feedback for Extractive Question Answering

Authors

Ge Gao, Eunsol Choi, Yoav Artzi

Abstract

We study learning from user feedback for extractive question answering by simulating feedback using supervised data. We cast the problem as contextual bandit learning, and analyze the characteristics of several learning scenarios with a focus on reducing data annotation. We show that systems initially trained on a small number of examples can improve dramatically when given user feedback on model-predicted answers, and that one can use existing datasets to deploy systems in new domains without any annotation, instead improving the system on the fly via user feedback.
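The setup described in the abstract (simulate a user from supervised data, then treat each shown answer and its feedback as one contextual bandit round) can be illustrated on a toy problem. This is a minimal sketch under stated assumptions, not the authors' implementation: the six-token passages, the cue features, the linear softmax span policy, the REINFORCE update, and the reward values (1.0 for an exact match, 0.5 for partial overlap, -0.1 for no overlap) are all hypothetical choices made here for illustration. The gold span is consulted only inside `simulate_feedback`, which stands in for the user.

```python
import math
import random

L = 6  # tokens per toy passage (hypothetical setting, not from the paper)
SPANS = [(i, j) for i in range(L) for j in range(i, L)]  # all candidate answer spans


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def span_f1(pred, gold):
    """Token-level F1 between two inclusive (start, end) spans."""
    overlap = max(0, min(pred[1], gold[1]) - max(pred[0], gold[0]) + 1)
    if overlap == 0:
        return 0.0
    precision = overlap / (pred[1] - pred[0] + 1)
    recall = overlap / (gold[1] - gold[0] + 1)
    return 2 * precision * recall / (precision + recall)


def simulate_feedback(pred, gold):
    """Simulated user derived from supervised data (hypothetical reward values)."""
    if pred == gold:
        return 1.0   # "correct"
    if span_f1(pred, gold) > 0.0:
        return 0.5   # "partially correct"
    return -0.1      # "wrong"


def make_example(rng):
    """Toy passage: feature 0 marks the gold start token, feature 1 the gold end."""
    start = rng.randrange(L)
    end = rng.randrange(start, L)
    feats = [[float(t == start), float(t == end)] for t in range(L)]
    return feats, (start, end)


def span_probs(w_s, w_e, feats):
    """Linear softmax policy over spans: score(i, j) = w_s . x_i + w_e . x_j."""
    scores = [dot(w_s, feats[i]) + dot(w_e, feats[j]) for i, j in SPANS]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def train(num_steps=3000, lr=0.5, seed=0):
    rng = random.Random(seed)
    w_s, w_e = [0.0, 0.0], [0.0, 0.0]
    for _ in range(num_steps):
        feats, gold = make_example(rng)
        probs = span_probs(w_s, w_e, feats)
        # Bandit step: show ONE sampled answer and observe only its reward.
        i, j = rng.choices(SPANS, weights=probs)[0]
        r = simulate_feedback((i, j), gold)
        # REINFORCE: grad log p(i, j) = x_i - E_p[x_start] for w_s (same for w_e).
        for d in range(2):
            e_start = sum(p * feats[a][d] for (a, _), p in zip(SPANS, probs))
            e_end = sum(p * feats[b][d] for (_, b), p in zip(SPANS, probs))
            w_s[d] += lr * r * (feats[i][d] - e_start)
            w_e[d] += lr * r * (feats[j][d] - e_end)
    return w_s, w_e


def greedy_accuracy(w_s, w_e, n=500, seed=1):
    """Exact-match rate of the highest-probability span on fresh toy examples."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        feats, gold = make_example(rng)
        probs = span_probs(w_s, w_e, feats)
        best = SPANS[max(range(len(SPANS)), key=lambda k: probs[k])]
        hits += int(best == gold)
    return hits / n


if __name__ == "__main__":
    print("before feedback:", greedy_accuracy([0.0, 0.0], [0.0, 0.0]))
    w_s, w_e = train()
    print("after feedback:", greedy_accuracy(w_s, w_e))
```

Run as a script, it prints greedy exact-match accuracy before and after learning from simulated feedback. The bandit character lies in the update: each step uses only the scalar reward for the single span that was shown, never the gold span directly, which mirrors a deployed system that only sees user reactions to its own predictions.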