Paper Title

Explaining Online Reinforcement Learning Decisions of Self-Adaptive Systems

Authors

Felix Feit, Andreas Metzger, Klaus Pohl

Abstract

Design time uncertainty poses an important challenge when developing a self-adaptive system. As an example, defining how the system should adapt when facing a new environment state requires understanding the precise effect of an adaptation, which may not be known at design time. Online reinforcement learning, i.e., employing reinforcement learning (RL) at runtime, is an emerging approach to realizing self-adaptive systems in the presence of design time uncertainty. By using Online RL, the self-adaptive system can learn from actual operational data and leverage feedback only available at runtime. Recently, Deep RL has been gaining interest. Deep RL represents learned knowledge as a neural network, which allows it to generalize over unseen inputs as well as handle continuous environment states and adaptation actions. A fundamental problem of Deep RL is that learned knowledge is not explicitly represented. For a human, it is practically impossible to relate the parametrization of the neural network to concrete RL decisions, and thus Deep RL essentially appears as a black box. Yet, understanding the decisions made by Deep RL is key to (1) increasing trust and (2) facilitating debugging. Such debugging is especially relevant for self-adaptive systems, because the reward function, which quantifies the feedback to the RL algorithm, must be explicitly defined by developers, thus introducing the potential for human error. To explain Deep RL for self-adaptive systems, we enhance and combine two existing explainable RL techniques from the machine learning literature. The combined technique, XRL-DINE, overcomes the respective limitations of the individual techniques. We present a proof-of-concept implementation of XRL-DINE, as well as qualitative and quantitative results of applying XRL-DINE to a self-adaptive system exemplar.
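To make the online-RL setting described in the abstract concrete, below is a minimal sketch (not the paper's implementation) of the runtime loop it refers to: a small neural network holds the learned knowledge, an adaptation action is chosen at each step, and a developer-defined reward function quantifies the runtime feedback. The hypothetical web-service state variables, adaptation actions, and reward weights are illustrative assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed adaptation actions of a hypothetical self-adaptive web service.
ADAPTATION_ACTIONS = ["add_replica", "remove_replica", "no_change"]

def observe_environment():
    """Stand-in for runtime monitoring: a continuous environment state
    (normalized response time, CPU utilization, request rate)."""
    return rng.uniform(0.0, 1.0, size=3)

def reward(state):
    """Developer-defined reward function quantifying runtime feedback.
    A mis-chosen weight or sign here is the kind of human error the
    abstract warns about."""
    response_time, cpu_util, _ = state
    return -2.0 * response_time - 0.5 * cpu_util  # assumed weights

# Tiny two-layer network standing in for the Deep RL value function:
# the learned knowledge lives only in these weight matrices, which is
# why individual decisions are hard for a human to interpret.
W1 = rng.normal(scale=0.1, size=(3, 16))
W2 = rng.normal(scale=0.1, size=(16, len(ADAPTATION_ACTIONS)))

def q_values(state):
    hidden = np.maximum(W1.T @ state, 0.0)  # ReLU hidden layer
    return W2.T @ hidden                    # one value per adaptation action

# Online RL loop: the system learns from actual operational data at runtime.
state = observe_environment()
for step in range(5):
    q = q_values(state)
    # Epsilon-greedy choice between exploiting and exploring.
    action = int(np.argmax(q)) if rng.random() > 0.1 else rng.integers(len(q))
    next_state = observe_environment()  # observed effect of the adaptation
    r = reward(next_state)              # feedback only available at runtime
    # A real agent would now update W1/W2 from (state, action, r, next_state),
    # e.g. via a DQN-style temporal-difference step (omitted for brevity).
    print(f"step={step} action={ADAPTATION_ACTIONS[action]} reward={r:.2f}")
    state = next_state
```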
