Approximating a deep reinforcement learning docking agent using linear model trees

Paper title

Approximating a deep reinforcement learning docking agent using linear model trees

Authors

Gjærum, Vilde B., Rørvik, Ella-Lovise H., Lekkas, Anastasios M.

Abstract

Deep reinforcement learning has led to numerous notable results in robotics. However, deep neural networks (DNNs) are unintuitive, which makes it difficult to understand their predictions and strongly limits their potential for real-world applications due to economic, safety, and assurance reasons. To remedy this problem, a number of explainable AI methods have been presented, such as SHAP and LIME, but these can either be too costly to be used in real-time robotic applications or provide only local explanations. In this paper, the main contribution is the use of a linear model tree (LMT) to approximate a DNN policy, originally trained via proximal policy optimization (PPO), for an autonomous surface vehicle with five control inputs performing a docking operation. The two main benefits of the proposed approach are: a) LMTs are transparent, which makes it possible to directly associate the outputs (control actions, in our case) with specific values of the input features, and b) LMTs are computationally efficient and can provide information in real time. In our simulations, the opaque DNN policy controls the vehicle and the LMT runs in parallel to provide explanations in the form of feature attributions. Our results indicate that LMTs can be a useful component within digital assurance frameworks for autonomous ships.
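The idea of approximating an opaque policy with a linear model tree and reading feature attributions off the active leaf can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: `opaque_policy` is a hypothetical stand-in for the trained DNN (not the paper's network), the state has two made-up features (`distance`, `heading_error`) and one control output rather than the vessel's actual inputs and five controls, and the tree has a single split chosen from a fixed list of candidate thresholds. The paper's own tree-building procedure is not reproduced here.

```python
import random

def opaque_policy(state):
    """Hypothetical stand-in for a trained DNN policy (an assumption, not the paper's model)."""
    distance, heading_error = state
    if distance > 1.0:
        return 0.8 * distance - 0.5 * heading_error
    return 0.2 * distance - 1.5 * heading_error + 0.6

def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit_linear(xs, ys):
    """Least-squares fit of y = w . x + bias via the normal equations; bias is the last entry."""
    rows = [list(x) + [1.0] for x in xs]
    k = len(rows[0])
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    aty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    return solve(ata, aty)

def predict(coef, x):
    return sum(c * v for c, v in zip(coef, x)) + coef[-1]

def squared_error(coef, xs, ys):
    return sum((predict(coef, x) - y) ** 2 for x, y in zip(xs, ys))

def fit_one_split_lmt(xs, ys, feature, thresholds, min_leaf=10):
    """Pick the threshold whose two leaf-wise linear models give the lowest total squared error."""
    best = None
    for t in thresholds:
        left = [(x, y) for x, y in zip(xs, ys) if x[feature] <= t]
        right = [(x, y) for x, y in zip(xs, ys) if x[feature] > t]
        if len(left) < min_leaf or len(right) < min_leaf:
            continue
        lx, ly = zip(*left)
        rx, ry = zip(*right)
        lc, rc = fit_linear(lx, ly), fit_linear(rx, ry)
        err = squared_error(lc, lx, ly) + squared_error(rc, rx, ry)
        if best is None or err < best[0]:
            best = (err, t, lc, rc)
    return (feature, best[1], best[2], best[3])

def explain(tree, x):
    """Return the tree's action and each feature's contribution (coefficient * value) in the active leaf."""
    feature, t, left, right = tree
    coef = left if x[feature] <= t else right
    contributions = [c * v for c, v in zip(coef, x)]
    return sum(contributions) + coef[-1], contributions

# Sample states, query the opaque policy for its actions, then fit the tree to imitate it.
random.seed(0)
states = [(random.uniform(0.0, 2.0), random.uniform(-1.0, 1.0)) for _ in range(200)]
actions = [opaque_policy(s) for s in states]
tree = fit_one_split_lmt(states, actions, feature=0, thresholds=[0.5, 1.0, 1.5])

action, contributions = explain(tree, (0.5, 0.2))
```

In this toy setting the explanation costs one comparison and one dot product per query, which is the kind of cheap evaluation that would let such a tree run alongside the controlling policy. Because the stand-in policy is itself piecewise linear, the tree can match it almost exactly; a real DNN policy would only be approximated, and the fidelity would need to be measured.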
