Provably Safe Reinforcement Learning via Action Projection using Reachability Analysis and Polynomial Zonotopes

Paper title

Provably Safe Reinforcement Learning via Action Projection using Reachability Analysis and Polynomial Zonotopes

Authors

Niklas Kochdumper, Hanna Krasowski, Xiao Wang, Stanley Bak, Matthias Althoff

Abstract

While reinforcement learning produces very promising results for many applications, its main disadvantage is the lack of safety guarantees, which prevents its use in safety-critical systems. In this work, we address this issue with a safety shield for nonlinear continuous systems that solve reach-avoid tasks. Our safety shield prevents a reinforcement learning agent from applying potentially unsafe actions by projecting the proposed action to the closest safe action. This approach is called action projection and is implemented via mixed-integer optimization. The safety constraints for action projection are obtained by applying parameterized reachability analysis using polynomial zonotopes, which accurately captures the nonlinear effects of the actions on the system. In contrast to other state-of-the-art approaches for action projection, our safety shield can efficiently handle input constraints and dynamic obstacles, eases the incorporation of the spatial robot dimensions into the safety constraints, guarantees robust safety despite process noise and measurement errors, and is well suited for high-dimensional systems, as we demonstrate on several challenging benchmark systems.
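The idea of action projection can be illustrated with a minimal sketch. It is not the paper's method: instead of polynomial zonotopes and mixed-integer optimization, it assumes a hypothetical one-dimensional system x_next = x + dt*u + w with a bounded disturbance |w| <= w_max, for which the set of safe actions is an interval and the projection reduces to clipping. The function name `project_action` and all parameter values are assumptions chosen for illustration.

```python
def project_action(x, u_proposed, dt=0.1, x_bounds=(-1.0, 1.0),
                   u_bounds=(-2.0, 2.0), w_max=0.05):
    """Return the safe action closest to u_proposed.

    Toy model (an assumption, not from the paper):
        x_next = x + dt * u + w,  with |w| <= w_max.
    An action is safe if x_next stays inside x_bounds for every
    admissible disturbance and u respects the input constraints.
    """
    x_min, x_max = x_bounds
    u_min, u_max = u_bounds

    # Tighten the state bounds by w_max so the guarantee holds for the
    # worst-case disturbance, then intersect with the input constraints.
    lo = max(u_min, (x_min + w_max - x) / dt)
    hi = min(u_max, (x_max - w_max - x) / dt)

    if lo > hi:
        raise ValueError("no safe action exists from this state")

    # Projecting onto an interval is clipping: the result is the point
    # of [lo, hi] with minimal distance to the proposed action.
    return min(max(u_proposed, lo), hi)


if __name__ == "__main__":
    # Far from the boundary, the agent's action passes through unchanged.
    print(project_action(0.0, 1.0))
    # Close to the upper bound, an aggressive action is reduced.
    print(project_action(0.9, 2.0))
```

In this toy setting the shield leaves already-safe actions untouched and changes unsafe ones as little as possible, which is the behavior the abstract describes. For nonlinear dynamics and non-convex safe sets the safe action set is no longer an interval, which is presumably why the paper resorts to reachability analysis and an optimization-based projection.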
