Paper Title
Improving the Exploration of Deep Reinforcement Learning in Continuous Domains using Planning for Policy Search
Authors
Abstract
Local policy search is performed by most Deep Reinforcement Learning (D-RL) methods, which increases the risk of getting trapped in a local minimum. Furthermore, the availability of a simulation model is not fully exploited in D-RL even in simulation-based training, which potentially decreases efficiency. To better exploit simulation models in policy search, we propose to integrate a kinodynamic planner in the exploration strategy and to learn a control policy in an offline fashion from the generated environment interactions. We call the resulting model-based reinforcement learning method PPS (Planning for Policy Search). We compare PPS with state-of-the-art D-RL methods in typical RL settings including underactuated systems. The comparison shows that PPS, guided by the kinodynamic planner, collects data from a wider region of the state space. This generates training data that helps PPS discover better policies.
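To make the idea described in the abstract concrete, below is a minimal, hypothetical Python sketch of a PPS-style loop: a (heavily simplified) kinodynamic planner explores the simulated system from several start states, and a control policy is then fit offline on the planner-generated interactions. The dynamics, planner, and fitting routines (`step`, `plan_kinodynamic`, `fit_policy`) are illustrative placeholders under assumed toy dynamics, not the authors' implementation.

```python
# Hypothetical sketch of the PPS-style pipeline described in the abstract:
# (1) use a kinodynamic planner to generate exploratory trajectories through
#     the simulation model, (2) learn a control policy offline from that data.
# All names and dynamics here are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)

def step(x, u, dt=0.05):
    """Toy double-integrator dynamics standing in for the simulation model."""
    pos, vel = x
    return np.array([pos + vel * dt, vel + u * dt])

def plan_kinodynamic(x0, goal, horizon=50, candidates=64):
    """Rough stand-in for a kinodynamic planner: sample action sequences
    through the simulator and keep the one whose end state is closest to
    the goal, returning the visited (state, action) pairs."""
    best_traj, best_cost = None, np.inf
    for _ in range(candidates):
        u_seq = rng.uniform(-1.0, 1.0, size=horizon)
        x, traj = x0, []
        for u in u_seq:
            traj.append((x.copy(), u))
            x = step(x, u)
        cost = np.linalg.norm(x - goal)
        if cost < best_cost:
            best_cost, best_traj = cost, traj
    return best_traj

def fit_policy(data):
    """Offline policy learning, reduced here to linear least squares on the
    planner-generated (state, action) pairs."""
    X = np.array([s for s, _ in data])
    y = np.array([a for _, a in data])
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return lambda x: float(x @ w)

# Planner-guided exploration from several start states, followed by a
# single offline policy fit -- the overall structure of a PPS-style method.
dataset = []
for start in rng.uniform(-2.0, 2.0, size=(10, 2)):
    dataset += plan_kinodynamic(start, goal=np.zeros(2))
policy = fit_policy(dataset)
print("action at state [1, 0]:", policy(np.array([1.0, 0.0])))
```

Because the planner steers exploration toward diverse regions of the state space, the offline dataset covers a wider range of states than purely local policy-gradient rollouts would, which is the effect the abstract attributes to PPS.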