Reinforcement Learning via Fenchel-Rockafellar Duality

Paper title

Reinforcement Learning via Fenchel-Rockafellar Duality

Authors

Ofir Nachum, Bo Dai

Abstract

We review basic concepts of convex duality, focusing on the very general and supremely useful Fenchel-Rockafellar duality. We summarize how this duality may be applied to a variety of reinforcement learning (RL) settings, including policy evaluation or optimization, online or offline learning, and discounted or undiscounted rewards. The derivations yield a number of intriguing results, including the ability to perform policy evaluation and on-policy policy gradient with behavior-agnostic offline data and methods to learn a policy via max-likelihood optimization. Although many of these results have appeared previously in various forms, we provide a unified treatment and perspective on these results, which we hope will enable researchers to better use and apply the tools of convex duality to make further progress in RL.
