Paper Title
State Space Closure: Revisiting Endless Online Level Generation via Reinforcement Learning
Paper Authors
Paper Abstract
In this paper, we revisit endless online level generation with the recently proposed framework of experience-driven procedural content generation via reinforcement learning (EDRL). Inspired by the observation that EDRL tends to generate recurrent patterns, we formulate the notion of state space closure, which implies that any stochastic state that may appear in an infinite-horizon online generation process can also be found within a finite horizon. Through theoretical analysis, we find that although state space closure raises a concern about diversity, it generalises EDRL trained with a finite horizon to the infinite-horizon scenario without deterioration of content quality. Moreover, we verify the quality and diversity of content generated by EDRL via empirical studies on the widely used Super Mario Bros. benchmark. Experimental results reveal that the diversity of levels generated by EDRL is limited due to state space closure, whereas their quality does not deteriorate over a horizon longer than the one specified during training. Summarising our outcomes and analysis, future work on endless online level generation via reinforcement learning should address the issue of diversity while assuring the occurrence of state space closure and preserving content quality.
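As a rough formalisation of the closure property described above (our own notation, not taken verbatim from the paper): let $S_t$ denote the set of states that the generation process can occupy at step $t$. State space closure holds if there exists a finite horizon $T$ such that

$$\bigcup_{t=0}^{\infty} S_t = \bigcup_{t=0}^{T} S_t,$$

i.e., every state that can occur during infinite-horizon generation already occurs within the first $T$ steps, so a generator trained on horizon $T$ never encounters an unseen state afterwards.

Below is a minimal, hypothetical Python sketch of how such closure might be checked empirically; all names (`generate_segment`, `train_horizon`, `test_horizon`, the toy generator) are illustrative assumptions, not the paper's code or API:

    import random

    def check_state_space_closure(generate_segment, initial_state,
                                  train_horizon, test_horizon):
        # Roll the generator out beyond the training horizon and test whether
        # every state observed after step `train_horizon` already appeared
        # within the first `train_horizon` steps.
        seen_within_horizon = set()
        state = initial_state
        for t in range(test_horizon):
            _, state = generate_segment(state)
            if t < train_horizon:
                seen_within_horizon.add(state)
            elif state not in seen_within_horizon:
                return False  # a genuinely new state appeared: closure violated
        return True  # all post-horizon states recurred: closure holds empirically

    def toy_generator(state):
        # Toy stand-in for an EDRL policy: wanders over a small cyclic state
        # set, mimicking the recurrent patterns the paper observes.
        next_state = (state + random.choice([1, 2])) % 5
        return f"segment_{next_state}", next_state

    print(check_state_space_closure(toy_generator, 0,
                                    train_horizon=50, test_horizon=500))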