Paper Title

Options of Interest: Temporal Abstraction with Interest Functions

Authors

Khimya Khetarpal, Martin Klissarov, Maxime Chevalier-Boisvert, Pierre-Luc Bacon, Doina Precup

Abstract

Temporal abstraction refers to the ability of an agent to use behaviours of controllers which act for a limited, variable amount of time. The options framework describes such behaviours as consisting of a subset of states in which they can initiate, an internal policy and a stochastic termination condition. However, much of the subsequent work on option discovery has ignored the initiation set, because of difficulty in learning it from data. We provide a generalization of initiation sets suitable for general function approximation, by defining an interest function associated with an option. We derive a gradient-based learning algorithm for interest functions, leading to a new interest-option-critic architecture. We investigate how interest functions can be leveraged to learn interpretable and reusable temporal abstractions. We demonstrate the efficacy of the proposed approach through quantitative and qualitative results, in both discrete and continuous environments.
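
To make the idea of an interest function concrete, here is a minimal sketch, not the paper's implementation. It assumes each option has an interest value between 0 and 1 in a given state, and that the policy over options is reweighted by those values and renormalized. The abstract does not state this exact form, so treat the reweighting rule and the name `interest_weighted_policy` as illustrative assumptions. Under this reading, a hard initiation set is the special case where every interest value is either 0 or 1.

```python
from typing import List, Sequence


def interest_weighted_policy(
    policy_probs: Sequence[float], interest: Sequence[float]
) -> List[float]:
    """Reweight a policy over options by per-option interest in one state.

    Assumed form (illustrative): p'(o|s) is proportional to I(s, o) * p(o|s).
    """
    if len(policy_probs) != len(interest):
        raise ValueError("policy_probs and interest must have the same length")

    # Scale each option's probability by how much interest it has in this state.
    weighted = [p * i for p, i in zip(policy_probs, interest)]
    total = sum(weighted)
    if total <= 0.0:
        raise ValueError("no option has positive interest-weighted probability")

    # Renormalize so the result is again a probability distribution.
    return [w / total for w in weighted]


# Binary interest values recover a hard initiation set:
# the second option is unavailable in this state and gets probability 0.
probs = interest_weighted_policy([0.5, 0.25, 0.25], [1.0, 0.0, 1.0])
```

With fractional interest values instead of 0 and 1, the same function softly favours some options in a state without ruling the others out, which is what makes the quantity amenable to gradient-based learning with function approximation.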
