What can we Learn Even From the Weakest? Learning Sketches for Programmatic Strategies

Title

What can we Learn Even From the Weakest? Learning Sketches for Programmatic Strategies

Authors

Leandro C. Medeiros, David S. Aleixo, Levi H. S. Lelis

Abstract

In this paper we show that behavioral cloning can be used to learn effective sketches of programmatic strategies. We show that even the sketches learned by cloning the behavior of weak players can help the synthesis of programmatic strategies. This is because even weak players can provide helpful information, e.g., that a player must choose an action in their turn of the game. If behavioral cloning is not employed, the synthesizer needs to learn even the most basic information by playing the game, which can be computationally expensive. We demonstrate empirically the advantages of our sketch-learning approach with simulated annealing and UCT synthesizers. We evaluate our synthesizers in the games of Can't Stop and MicroRTS. The sketch-based synthesizers are able to learn stronger programmatic strategies than their original counterparts. Our synthesizers generate strategies of Can't Stop that defeat a traditional programmatic strategy for the game. They also synthesize strategies that defeat the best performing method from the latest MicroRTS competition.
