Paper Title
Reinforcement Learning via Fenchel-Rockafellar Duality
Paper Authors
Paper Abstract
We review basic concepts of convex duality, focusing on the very general and supremely useful Fenchel-Rockafellar duality. We summarize how this duality may be applied to a variety of reinforcement learning (RL) settings, including policy evaluation or optimization, online or offline learning, and discounted or undiscounted rewards. The derivations yield a number of intriguing results, including the ability to perform policy evaluation and on-policy policy gradient with behavior-agnostic offline data and methods to learn a policy via max-likelihood optimization. Although many of these results have appeared previously in various forms, we provide a unified treatment and perspective on these results, which we hope will enable researchers to better use and apply the tools of convex duality to make further progress in RL.
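For reference, the duality named in the abstract is commonly stated as below. This is a sketch of the standard textbook form, not a formula quoted from the abstract: the symbols f, g, A and the conjugates f^*, g^* are notation assumed here, and the equality is assumed to hold under the usual regularity conditions (f and g convex, proper, and lower semicontinuous, A linear, plus a constraint qualification).

```latex
% Fenchel conjugate of a function f (assumed notation)
f^{*}(z) \;=\; \sup_{x} \; \langle x, z \rangle - f(x)

% Fenchel-Rockafellar duality: primal problem on the left, dual on the right
\min_{x} \; f(x) + g(Ax)
\;=\;
\max_{y} \; -f^{*}\!\left(-A^{\top} y\right) - g^{*}(y)
```

The dual follows by writing g(Ax) = \sup_{y} \langle y, Ax \rangle - g^{*}(y), swapping the order of the minimization and maximization, and recognizing the inner minimization over x as -f^{*}(-A^{\top} y).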