Paper Title
Advice Conformance Verification by Reinforcement Learning agents for Human-in-the-Loop
Paper Authors
Paper Abstract
Human-in-the-loop (HiL) reinforcement learning is gaining traction in domains with large action and state spaces and sparse rewards, by allowing the agent to take advice from the HiL. Beyond accommodating advice, a sequential decision-making agent must be able to express the extent to which it was able to utilize the human advice. Subsequently, the agent should provide a means for the HiL to inspect the parts of the advice that it had to reject in favor of the overall environment objective. We introduce the problem of Advice-Conformance Verification, which requires reinforcement learning (RL) agents to provide assurances to the human in the loop regarding how much of their advice is being conformed to. We then propose a tree-based lingua franca to support this communication, called a Preference Tree. We study two cases of good and bad advice scenarios in MuJoCo's Humanoid environment. Through our experiments, we show that our method provides an interpretable means of solving the Advice-Conformance Verification problem by conveying whether or not the agent is using the human's advice. Finally, we present a human-user study with 20 participants that validates our method.
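The abstract does not specify the internal structure of the Preference Tree, so the sketch below is only an illustration of the general idea: a tree whose nodes represent pieces of human advice, each annotated with how strongly the trained agent appears to conform to it, which the HiL can then inspect. All class names, fields, and the example advice strings are assumptions introduced here for intuition, not the paper's actual implementation.

```python
# Illustrative sketch only; the node layout and conformance scoring are assumptions.
from dataclasses import dataclass, field
from typing import List


@dataclass
class PreferenceNode:
    """Hypothetical node in a Preference-Tree-style report.

    Each node could hold one piece of human advice together with an
    estimate of how much the agent's learned behavior conforms to it.
    """
    description: str                                   # human-readable advice
    conformance: float = 0.0                           # assumed degree of conformance in [0, 1]
    children: List["PreferenceNode"] = field(default_factory=list)


def report(node: PreferenceNode, depth: int = 0) -> None:
    """Print the tree so the human in the loop can see which advice was followed."""
    print("  " * depth + f"{node.description}: conformance={node.conformance:.2f}")
    for child in node.children:
        report(child, depth + 1)


# Example: some advice is largely followed, other advice is rejected
# in favor of the overall environment objective.
root = PreferenceNode("locomotion advice", 0.7, [
    PreferenceNode("keep torso upright", 0.9),
    PreferenceNode("minimize arm movement", 0.3),      # mostly rejected
])
report(root)
```

In this reading, the tree serves as the shared lingua franca: the human supplies advice as structured preferences, and the agent returns the same structure annotated with conformance information for verification.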