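Learning Category-Level Generalizable Object Manipulation Policy via Generative Adversarial Self-Imitation Learning from Demonstrations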

Paper Title

Learning Category-Level Generalizable Object Manipulation Policy via Generative Adversarial Self-Imitation Learning from Demonstrations

Authors

Hao Shen, Weikang Wan, He Wang

Abstract

Generalizable object manipulation skills are critical for intelligent, multi-functional robots to work in complex real-world scenes. Despite recent progress in reinforcement learning, it is still very challenging to learn a generalizable manipulation policy that can handle a category of geometrically diverse articulated objects. In this work, we tackle this category-level object manipulation policy learning problem via imitation learning in a task-agnostic manner, where we assume no handcrafted dense rewards but only a terminal reward. Given this novel and challenging generalizable policy learning problem, we identify several key issues that can cause previous imitation learning algorithms to fail and hinder generalization to unseen instances. We then propose several general but critical techniques, including generative adversarial self-imitation learning from demonstrations, progressive growing of the discriminator, and instance balancing for the expert buffer, that accurately pinpoint and tackle these issues and can benefit category-level manipulation policy learning regardless of the task. Our experiments on the ManiSkill benchmarks demonstrate a remarkable improvement on all tasks, and our ablation studies further validate the contribution of each proposed technique.
