Paper Title
Deep Learning for Wireless Networked Systems: A Joint Estimation-Control-Scheduling Approach
Authors
Abstract
Wireless networked control systems (WNCSs), which connect sensors, controllers, and actuators via wireless communications, are a key enabling technology for highly scalable and low-cost deployment of control systems in the Industry 4.0 era. Despite the tight interaction between control and communications in WNCSs, most existing works adopt separate design approaches. This is mainly because the co-design of control and communication policies requires large, hybrid state and action spaces, which makes the optimization problem mathematically intractable and difficult to solve effectively with classic algorithms. In this paper, we systematically investigate deep learning (DL)-based estimator-controller-scheduler co-design for a model-unknown nonlinear WNCS over wireless fading channels. In particular, we propose a co-design framework that is aware of the sensor's age-of-information (AoI) states and the dynamic channel states. We propose a novel deep reinforcement learning (DRL)-based algorithm for controller and scheduler optimization that utilizes both model-free and model-based data. To enhance learning efficiency, we propose an AoI-based importance sampling algorithm that takes data accuracy into account. We also develop novel schemes for enhancing the stability of joint training. Extensive experiments demonstrate that the proposed joint training algorithm can effectively solve the estimation-control-scheduling co-design problem in various scenarios and provides significant performance gains compared with separate design and several benchmark policies.