AdverSAR: Adversarial Search and Rescue via Multi-Agent Reinforcement Learning

Paper Title

AdverSAR: Adversarial Search and Rescue via Multi-Agent Reinforcement Learning

Authors

Rahman, Aowabin; Bhattacharya, Arnab; Ramachandran, Thiagarajan; Mukherjee, Sayak; Sharma, Himanshu; Fujimoto, Ted; Chatterjee, Samrat

Abstract

Search and Rescue (SAR) missions in remote environments often employ autonomous multi-robot systems that learn, plan, and execute a combination of local single-robot control actions, group primitives, and global mission-oriented coordination and collaboration. Often, SAR coordination strategies are manually designed by human experts who can remotely control the multi-robot system and enable semi-autonomous operations. However, in remote environments where connectivity is limited and human intervention is often not possible, decentralized collaboration strategies are needed for fully-autonomous operations. Nevertheless, decentralized coordination may be ineffective in adversarial environments due to sensor noise, actuation faults, or manipulation of inter-agent communication data. In this paper, we propose an algorithmic approach based on adversarial multi-agent reinforcement learning (MARL) that allows robots to efficiently coordinate their strategies in the presence of adversarial inter-agent communications. In our setup, the objective of the multi-robot team is to discover targets strategically in an obstacle-strewn geographical area by minimizing the average time needed to find the targets. It is assumed that the robots have no prior knowledge of the target locations, and they can interact with only a subset of neighboring robots at any time. Based on the centralized training with decentralized execution (CTDE) paradigm in MARL, we utilize a hierarchical meta-learning framework to learn dynamic team-coordination modalities and discover emergent team behavior under complex cooperative-competitive scenarios. The effectiveness of our approach is demonstrated on a collection of prototype grid-world environments with different specifications of benign and adversarial agents, target locations, and agent rewards.
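
The communication setting described in the abstract (each robot hears only from a subset of neighboring robots, and some messages may be manipulated) can be illustrated with a toy sketch. This is not the authors' algorithm or environment: the `Robot` class, the `adversarial` flag, the Manhattan-distance communication range, and the rule that an adversary simply inverts its report are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Robot:
    pos: tuple                 # (row, col) cell on a grid world
    adversarial: bool = False  # hypothetical flag: adversaries falsify reports


def manhattan(a, b):
    """Grid distance between two cells."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def neighbors(robots, i, comm_range):
    """Indices of robots within comm_range of robot i, excluding i itself."""
    return [j for j, r in enumerate(robots)
            if j != i and manhattan(robots[i].pos, r.pos) <= comm_range]


def message(robot, target):
    """Report whether the robot stands on the target cell.

    Assumed adversary model: an adversarial robot inverts its true report.
    """
    seen = robot.pos == target
    return (not seen) if robot.adversarial else seen


def received_reports(robots, i, target, comm_range):
    """Reports robot i receives from its current neighbors only."""
    return {j: message(robots[j], target)
            for j in neighbors(robots, i, comm_range)}


robots = [
    Robot((0, 0)),
    Robot((0, 1)),                    # benign, standing on the target
    Robot((1, 0), adversarial=True),  # adversary, not on the target
    Robot((4, 4)),                    # out of communication range
]
target = (0, 1)
reports = received_reports(robots, 0, target, comm_range=1)
print(reports)
```

In this sketch robot 0 receives a positive report from both neighbors, although only robot 1 is actually on the target. Robot 0 cannot tell the honest report from the falsified one on the message content alone, which is the kind of difficulty that adversarially robust coordination has to handle.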
