Paper Title

Planning for Sample Efficient Imitation Learning

Paper Authors

Zhao-Heng Yin, Weirui Ye, Qifeng Chen, Yang Gao

Paper Abstract

Imitation learning is a class of promising policy learning algorithms that is free from many practical issues of reinforcement learning, such as reward design and exploration hardness. However, current imitation algorithms struggle to achieve both high performance and high in-environment sample efficiency simultaneously. Behavioral Cloning (BC) does not need in-environment interactions, but it suffers from the covariate shift problem, which harms its performance. Adversarial Imitation Learning (AIL) turns imitation learning into a distribution matching problem. It can achieve better performance on some tasks, but it requires a large number of in-environment interactions. Inspired by the recent success of EfficientZero in RL, we propose EfficientImitate (EI), a planning-based imitation learning method that can achieve high in-environment sample efficiency and performance simultaneously. Our algorithmic contribution in this paper is two-fold. First, we extend AIL into MCTS-based RL. Second, we show that the two seemingly incompatible classes of imitation algorithms (BC and AIL) can be naturally unified under our framework, enjoying the benefits of both. We benchmark our method not only on the state-based DeepMind Control Suite, but also on the image-based version, which many previous works have found highly challenging. Experimental results show that EI achieves state-of-the-art performance and sample efficiency. EI shows over a 4x performance gain in the limited-sample setting on both state-based and image-based tasks, and it can solve challenging problems like Humanoid, where previous methods fail with a small number of interactions. Our code is available at https://github.com/zhaohengyin/EfficientImitate.
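To make the two algorithmic ideas in the abstract concrete, below is a minimal PyTorch sketch, not the authors' implementation: a GAIL-style discriminator whose output defines a learned reward that an MCTS-based planner (e.g., EfficientZero) could consume in place of the environment reward, plus one plausible way to fold a BC term into the policy-prior loss. All names here (`Discriminator`, `adversarial_reward`, `unified_policy_loss`, `bc_weight`) are illustrative assumptions, not taken from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Discriminator(nn.Module):
    """GAIL-style discriminator scoring how expert-like a (state, action) pair is."""
    def __init__(self, state_dim, action_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),  # raw logit
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))


def discriminator_loss(disc, expert_s, expert_a, policy_s, policy_a):
    """Standard AIL objective: expert pairs labeled 1, policy pairs labeled 0."""
    e = disc(expert_s, expert_a)
    p = disc(policy_s, policy_a)
    return (F.binary_cross_entropy_with_logits(e, torch.ones_like(e)) +
            F.binary_cross_entropy_with_logits(p, torch.zeros_like(p)))


def adversarial_reward(disc, state, action, eps=1e-8):
    """Learned reward substituted for the environment reward inside the MCTS
    planner; -log(1 - D) is one common AIL choice (assumption, not the paper's
    exact form)."""
    with torch.no_grad():
        d = torch.sigmoid(disc(state, action))
    return -torch.log(1.0 - d + eps)


def unified_policy_loss(search_loss, policy_dist, expert_s, expert_a, bc_weight=0.5):
    """One plausible BC + AIL unification: add a behavior-cloning log-likelihood
    term on expert data to the usual MCTS policy-distillation loss.
    `search_loss` stands for the EfficientZero-style distillation loss;
    `policy_dist` maps states to an action distribution (e.g., a factorized
    Gaussian, hence the sum over action dimensions)."""
    bc_loss = -policy_dist(expert_s).log_prob(expert_a).sum(-1).mean()
    return search_loss + bc_weight * bc_loss
```

In this reading, AIL supplies the reward that the planner maximizes, while the BC term regularizes the policy prior toward expert actions; the appropriate weighting is task-dependent and the paper's actual unification should be consulted for details.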
