Paper Title

STL-Based Synthesis of Feedback Controllers Using Reinforcement Learning

Authors

Nikhil Kumar Singh and Indranil Saha

Abstract

Deep Reinforcement Learning (DRL) has the potential to be used for synthesizing feedback controllers (agents) for various complex systems with unknown dynamics. These systems are expected to satisfy diverse safety and liveness properties best captured using temporal logic. In RL, the reward function plays a crucial role in specifying the desired behaviour of these agents. However, the problem of designing the reward function for an RL agent to satisfy complex temporal logic specifications has received limited attention in the literature. To address this, we provide a systematic way of generating rewards in real-time by using the quantitative semantics of Signal Temporal Logic (STL), a widely used temporal logic to specify the behaviour of cyber-physical systems. We propose a new quantitative semantics for STL having several desirable properties, making it suitable for reward generation. We evaluate our STL-based reinforcement learning mechanism on several complex continuous control benchmarks and compare our STL semantics with those available in the literature in terms of their efficacy in synthesizing the controller agent. Experimental results establish our new semantics to be the most suitable for synthesizing feedback controllers for complex continuous dynamical systems through reinforcement learning.
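
To make the reward-generation idea concrete, here is a minimal sketch of using STL quantitative semantics as an episode reward. It implements the classical min/max robustness semantics (not the new semantics the paper proposes) for an illustrative safety property G(|x| < 1); the function names (rob_pred, rob_always, episode_reward) and the example signals are hypothetical, chosen only for this sketch.

```python
# Illustrative sketch (not the paper's implementation): scoring an episode
# trace with the classical min/max quantitative semantics of STL and using
# the resulting robustness value as the RL reward.
import numpy as np

def rob_pred(signal, bound):
    # Robustness of the predicate |x| < bound at each step: bound - |x_t|.
    return bound - np.abs(np.asarray(signal, dtype=float))

def rob_always(margins):
    # Classical semantics of G(phi): worst-case margin over the horizon.
    return float(np.min(margins))

def rob_eventually(margins):
    # Classical semantics of F(phi): best-case margin over the horizon.
    return float(np.max(margins))

def episode_reward(trace, bound=1.0):
    # Reward = robustness of G(|x| < bound) over the episode trace.
    # A positive reward certifies that the safety property was satisfied.
    return rob_always(rob_pred(trace, bound))

if __name__ == "__main__":
    good = [0.1, -0.3, 0.5]      # stays inside (-1, 1)
    bad  = [0.1, 1.5, 0.2]       # leaves the safe set once
    print(episode_reward(good))  # 0.5  (positive -> satisfied)
    print(episode_reward(bad))   # -0.5 (negative -> violated)
```

A known drawback of this classical semantics, and part of what motivates alternative semantics such as the one proposed in the paper, is that min and max propagate only the single worst (or best) time step, so most of the trace contributes nothing to the reward signal the agent learns from.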
