Title
FIRE: A Failure-Adaptive Reinforcement Learning Framework for Edge Computing Migrations
Authors
Abstract
In edge computing, users' service profiles are migrated due to user mobility. Reinforcement learning (RL) frameworks have been proposed to manage these migrations, often trained on simulated data. However, existing RL frameworks overlook occasional server failures, which, although rare, impact latency-sensitive applications like autonomous driving and real-time obstacle detection. Moreover, these failures (rare events) are not adequately represented in historical training data, posing a challenge for data-driven RL algorithms. As it is impractical to adjust failure frequency in real-world applications for training, we introduce FIRE, a framework that adapts to rare events by training an RL policy in an edge computing digital twin environment. We propose ImRE, an importance sampling-based Q-learning algorithm, which samples rare events proportionally to their impact on the value function. FIRE considers delay, migration, failure, and backup placement costs across individual and shared service profiles. We prove ImRE's boundedness and convergence to optimality. Next, we introduce novel deep Q-learning (ImDQL) and actor-critic (ImACRE) versions of our algorithm to enhance scalability. We extend our framework to accommodate users with varying risk tolerances. Through trace-driven experiments, we show that FIRE reduces costs compared to vanilla RL and the greedy baseline in the event of failures.
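To illustrate the core idea behind importance sampling in Q-learning (this is a minimal sketch on a hypothetical two-state migration problem, not the paper's actual ImRE algorithm or cost model): rare server failures are over-sampled in the simulator, and each update is reweighted by the likelihood ratio between the true and inflated failure probabilities so that the learned values remain unbiased. All constants and the toy cost function below are assumptions for illustration only.

```python
import random

P_FAIL_TRUE = 0.01   # assumed real-world failure probability
P_FAIL_SIM = 0.30    # inflated failure probability in the simulated twin

STATES = ["local", "migrated"]
ACTIONS = ["stay", "migrate"]

def step(state, action, failed):
    """Toy cost model: migration, failure, and delay costs (all assumed)."""
    cost = 0.0
    if action == "migrate":
        cost += 1.0                                 # one-time migration cost
        state = "migrated"
    if failed:
        cost += 20.0 if state == "local" else 5.0   # backup softens failures
    cost += 0.5 if state == "local" else 0.2        # per-step delay cost
    return state, -cost                             # reward = negative cost

def train(episodes=20000, horizon=5, alpha=0.05, gamma=0.9, eps=0.1, seed=0):
    rng = random.Random(seed)
    Q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
    for _ in range(episodes):
        state = rng.choice(STATES)
        for _ in range(horizon):
            if rng.random() < eps:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: Q[(state, a)])
            failed = rng.random() < P_FAIL_SIM      # over-sample rare failures
            # Importance weight corrects for the inflated failure rate.
            w = (P_FAIL_TRUE / P_FAIL_SIM) if failed \
                else ((1 - P_FAIL_TRUE) / (1 - P_FAIL_SIM))
            nxt, reward = step(state, action, failed)
            target = reward + gamma * max(Q[(nxt, a)] for a in ACTIONS)
            Q[(state, action)] += alpha * w * (target - Q[(state, action)])
            state = nxt
    return Q
```

Without the weight `w`, the agent would overestimate failure costs and learn an overly conservative policy; with it, the inflated simulator still yields value estimates consistent with the true (rare) failure rate.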