Paper Title
Empirical Policy Evaluation with Supergraphs
Paper Authors
Paper Abstract
We devise and analyze algorithms for the empirical policy evaluation problem in reinforcement learning. Our algorithms explore backward from high-cost states to find high-value ones, in contrast to forward approaches that explore from all states. While several papers have demonstrated the utility of backward exploration empirically, we conduct rigorous analyses showing that our algorithms can reduce average-case sample complexity from $O(S \log S)$ to as low as $O(\log S)$.
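The abstract states the idea only at a high level. As a rough illustration (not the paper's actual algorithm), the sketch below pairs a backward breadth-first pass over a predecessor map with Monte-Carlo policy evaluation restricted to the states that pass reaches: instead of sampling returns from all $S$ states, it evaluates only states that can reach a high-cost state. All names here (`backward_reachable`, `evaluate_policy`, the predecessor-map representation) and the toy parameters (`gamma`, `episodes`, `horizon`) are assumptions made for the example.

```python
from collections import deque

# Hypothetical sketch of backward exploration for empirical policy
# evaluation; illustrative only, not the paper's algorithm.

def backward_reachable(predecessors, high_cost_states):
    """Backward BFS: collect every state that can reach a high-cost state.

    predecessors: dict mapping each state to an iterable of its predecessors
    high_cost_states: states observed to incur large cost
    """
    frontier = deque(high_cost_states)
    found = set(frontier)
    while frontier:
        s = frontier.popleft()
        for p in predecessors.get(s, ()):
            if p not in found:
                found.add(p)
                frontier.append(p)
    return found

def evaluate_policy(sample_next, cost, states, policy,
                    gamma=0.9, episodes=100, horizon=40):
    """Monte-Carlo evaluation of `policy`, restricted to a subset of states.

    sample_next(s, a) draws a successor state; cost(s) is the per-state cost.
    Returns a dict of discounted-cost estimates for each start state.
    """
    values = {}
    for s0 in states:
        total = 0.0
        for _ in range(episodes):
            s, ret, disc = s0, 0.0, 1.0
            for _ in range(horizon):
                s = sample_next(s, policy(s))
                ret += disc * cost(s)
                disc *= gamma
            total += ret
        values[s0] = total / episodes
    return values

# Toy chain MDP: state i moves deterministically to i+1; state N is costly.
N = 6
pred = {i + 1: [i] for i in range(N)}
relevant = backward_reachable(pred, {N})  # here: all of 0..N
vals = evaluate_policy(lambda s, a: min(s + 1, N),
                       lambda s: 1.0 if s == N else 0.0,
                       relevant, lambda s: 0)
```

In this toy chain every state can reach the costly one, so nothing is pruned; in an MDP where only a small neighborhood feeds into the high-cost states, the backward pass would shrink the evaluation set, which is the intuition behind the claimed drop from $O(S \log S)$ toward $O(\log S)$ samples.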