Paper Title
Empirical Policy Evaluation with Supergraphs
Paper Authors
Paper Abstract
We devise and analyze algorithms for the empirical policy evaluation problem in reinforcement learning. Our algorithms explore backward from high-cost states to find high-value ones, in contrast to forward approaches that explore from all states. While several papers have demonstrated the utility of backward exploration empirically, we conduct rigorous analyses showing that our algorithms can reduce average-case sample complexity from $O(S \log S)$ to as low as $O(\log S)$.
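The abstract states the idea only at a high level. As a rough illustration (not the paper's actual algorithm), the sketch below pairs a backward breadth-first pass over a predecessor map with Monte-Carlo policy evaluation restricted to the states that pass reaches: instead of sampling returns from all $S$ states, it evaluates only states that can reach a high-cost state. All names here (`backward_reachable`, `evaluate_policy`, the predecessor-map representation) and the toy parameters (`gamma`, `episodes`, `horizon`) are assumptions made for the example.

```python
from collections import deque

# Hypothetical sketch of backward exploration for empirical policy
# evaluation; illustrative only, not the paper's algorithm.

def backward_reachable(predecessors, high_cost_states):
    """Backward BFS: collect every state that can reach a high-cost state.

    predecessors: dict mapping each state to an iterable of its predecessors
    high_cost_states: states observed to incur large cost
    """
    frontier = deque(high_cost_states)
    found = set(frontier)
    while frontier:
        s = frontier.popleft()
        for p in predecessors.get(s, ()):
            if p not in found:
                found.add(p)
                frontier.append(p)
    return found

def evaluate_policy(sample_next, cost, states, policy,
                    gamma=0.9, episodes=100, horizon=40):
    """Monte-Carlo evaluation of `policy`, restricted to a subset of states.

    sample_next(s, a) draws a successor state; cost(s) is the per-state cost.
    Returns a dict of discounted-cost estimates for each start state.
    """
    values = {}
    for s0 in states:
        total = 0.0
        for _ in range(episodes):
            s, ret, disc = s0, 0.0, 1.0
            for _ in range(horizon):
                s = sample_next(s, policy(s))
                ret += disc * cost(s)
                disc *= gamma
            total += ret
        values[s0] = total / episodes
    return values

# Toy chain MDP: state i moves deterministically to i+1; state N is costly.
N = 6
pred = {i + 1: [i] for i in range(N)}
relevant = backward_reachable(pred, {N})  # here: all of 0..N
vals = evaluate_policy(lambda s, a: min(s + 1, N),
                       lambda s: 1.0 if s == N else 0.0,
                       relevant, lambda s: 0)
```

In this toy chain every state can reach the costly one, so nothing is pruned; in an MDP where only a small neighborhood feeds into the high-cost states, the backward pass would shrink the evaluation set, which is the intuition behind the claimed drop from $O(S \log S)$ toward $O(\log S)$ samples.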