Paper Title

Acceleration of Actor-Critic Deep Reinforcement Learning for Visual Grasping in Clutter by State Representation Learning Based on Disentanglement of a Raw Input Image

Paper Authors

Taewon Kim, Yeseong Park, Youngbin Park, Il Hong Suh

Paper Abstract

For a robotic grasping task in which diverse unseen target objects exist in a cluttered environment, some deep learning-based methods have achieved state-of-the-art results using visual input directly. In contrast, actor-critic deep reinforcement learning (RL) methods typically perform very poorly when grasping diverse objects, especially when learning from raw images and sparse rewards. To make these RL techniques feasible for vision-based grasping tasks, we employ state representation learning (SRL), where we encode essential information first for subsequent use in RL. However, typical representation learning procedures are unsuitable for extracting pertinent information for learning the grasping skill, because the visual inputs for representation learning, where a robot attempts to grasp a target object in clutter, are extremely complex. We found that preprocessing based on the disentanglement of a raw input image is the key to effectively capturing a compact representation. This enables deep RL to learn robotic grasping skills from highly varied and diverse visual inputs. We demonstrate the effectiveness of this approach with varying levels of disentanglement in a realistic simulated environment.
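
The abstract does not give implementation details, but the sketch below illustrates the SRL-then-RL pipeline it describes: a disentangling encoder is pretrained on raw observations, and its frozen latent code then replaces the raw image as the compact state for an actor-critic agent. The β-VAE-style objective, PyTorch usage, 64x64 image size, and 32-dimensional latent are all illustrative assumptions, not the authors' actual architecture or disentanglement method.

```python
# Minimal sketch of SRL pretraining for RL (assumptions: beta-VAE-style
# disentangling encoder, PyTorch, 64x64 RGB observations, 32-dim latent).
# The paper's exact architecture is not specified in the abstract.
import torch
import torch.nn as nn
import torch.nn.functional as F


class DisentanglingEncoder(nn.Module):
    """Encodes a raw image into a compact latent state for downstream RL."""

    def __init__(self, latent_dim=32):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),    # 64 -> 32
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),   # 32 -> 16
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(),  # 16 -> 8
            nn.Flatten(),
        )
        self.fc_mu = nn.Linear(128 * 8 * 8, latent_dim)
        self.fc_logvar = nn.Linear(128 * 8 * 8, latent_dim)

    def forward(self, x):
        h = self.conv(x)
        return self.fc_mu(h), self.fc_logvar(h)


class Decoder(nn.Module):
    """Reconstructs the image from the latent; used only during SRL pretraining."""

    def __init__(self, latent_dim=32):
        super().__init__()
        self.fc = nn.Linear(latent_dim, 128 * 8 * 8)
        self.deconv = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, z):
        h = self.fc(z).view(-1, 128, 8, 8)
        return self.deconv(h)


def beta_vae_loss(x, x_rec, mu, logvar, beta=4.0):
    """Reconstruction + beta-weighted KL term; beta > 1 encourages disentanglement."""
    rec = F.binary_cross_entropy(x_rec, x, reduction="sum") / x.size(0)
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp()) / x.size(0)
    return rec + beta * kl


if __name__ == "__main__":
    enc, dec = DisentanglingEncoder(), Decoder()
    opt = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=1e-4)

    obs = torch.rand(8, 3, 64, 64)  # stand-in for raw images of grasping scenes
    mu, logvar = enc(obs)
    z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization trick
    loss = beta_vae_loss(obs, dec(z), mu, logvar)
    opt.zero_grad()
    loss.backward()
    opt.step()

    # After SRL pretraining, the frozen encoder mean serves as the compact state
    # fed to an actor-critic RL agent in place of the raw image.
    with torch.no_grad():
        state, _ = enc(obs)  # shape: (batch, 32)
```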
