Paper Title

Safety Considerations in Deep Control Policies with Safety Barrier Certificates Under Uncertainty

Paper Authors

Tom Hirshberg, Sai Vemprala, Ashish Kapoor

Paper Abstract

Recent advances in Deep Machine Learning have shown promise in solving complex perception and control loops via methods such as reinforcement and imitation learning. However, guaranteeing safety for such learned deep policies has been a challenge due to issues such as partial observability and difficulties in characterizing the behavior of the neural networks. While a lot of emphasis in safe learning has been placed on training, it is non-trivial to guarantee safety at deployment or test time. This paper shows how, under mild assumptions, Safety Barrier Certificates can be used to guarantee safety with deep control policies despite uncertainty arising from perception and other latent variables. Specifically, for scenarios where the dynamics are smooth and the uncertainty has finite support, the proposed framework wraps around an existing deep control policy and generates safe actions by dynamically evaluating and modifying the actions proposed by the embedded network. Our framework utilizes control barrier functions to define the space of control actions that are safe under uncertainty, and when the original actions are found to violate the safety constraint, uses quadratic programming to minimally modify them so that they lie in the safe set. Representations of the environment are built through Euclidean signed distance fields, which are then used to infer the safety of actions and to guarantee forward invariance. We implement this method in simulation in a drone-racing environment and show that it results in safer actions compared to a baseline that relies only on imitation learning to generate control actions.
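
The abstract describes a quadratic-programming safety filter built on control barrier functions. Below is a minimal, generic sketch of that kind of CBF-QP filter, not the paper's implementation: it assumes control-affine dynamics x_dot = f(x) + g(x)u, a known barrier function h with h(x) >= 0 on the safe set, and a hypothetical 2-D single-integrator example; it uses cvxpy to solve the QP and does not model the paper's perception uncertainty or Euclidean signed distance field.

```python
# Generic CBF-QP safety filter sketch (assumptions noted above; not the paper's code).
import numpy as np
import cvxpy as cp

def safety_filter(u_nominal, h_x, grad_h, f_x, g_x, alpha=1.0):
    """Minimally modify a nominal action so that the CBF condition
    grad_h . (f(x) + g(x) u) + alpha * h(x) >= 0 holds."""
    u = cp.Variable(u_nominal.shape[0])
    objective = cp.Minimize(cp.sum_squares(u - u_nominal))            # stay close to the policy's action
    constraints = [grad_h @ (f_x + g_x @ u) + alpha * h_x >= 0]       # CBF forward-invariance condition
    cp.Problem(objective, constraints).solve()
    return u.value if u.value is not None else u_nominal              # fall back if the QP is infeasible

# Hypothetical example: 2-D single integrator x_dot = u, safe set = unit disk.
x = np.array([0.9, 0.0])
h = 1.0 - x @ x                  # barrier value, h(x) >= 0 inside the disk
grad = -2.0 * x                  # gradient of h at x
u_nom = np.array([1.0, 0.0])     # nominal action heading toward the boundary
u_safe = safety_filter(u_nom, h, grad, np.zeros(2), np.eye(2))
print(u_safe)                    # roughly [0.106, 0.0]: scaled back to satisfy the constraint
```

In this toy example the nominal action, which pushes the state toward the boundary of the safe set, is reduced just enough to satisfy the barrier condition; this is the "minimal modification" behavior the abstract attributes to the quadratic program wrapped around the deep policy.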
