Paper Title
DanZero: Mastering GuanDan Game with Reinforcement Learning
Paper Authors
Paper Abstract
Card game AI has long been a hot topic in artificial intelligence research. In recent years, complex card games such as Mahjong, DouDizhu and Texas Hold'em have been solved, and the corresponding AI programs have reached the level of human experts. In this paper, we are devoted to developing an AI program for a more complex card game, GuanDan, whose rules are similar to those of DouDizhu but much more complicated. Specifically, the large state and action spaces, the long length of an episode, and the uncertain number of players in GuanDan pose great challenges for the development of an AI program. To address these issues, we propose DanZero, the first AI program for GuanDan, using reinforcement learning techniques. Specifically, we utilize a distributed framework to train our AI system. In the actor processes, we carefully design the state features, and agents generate samples through self-play. In the learner process, the model is updated with the Deep Monte-Carlo method. After training for 30 days using 160 CPUs and 1 GPU, we obtain our DanZero bot. We compare it with 8 baseline AI programs based on heuristic rules, and the results reveal the outstanding performance of DanZero. We also test DanZero against human players and demonstrate its human-level performance.
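The learner step the abstract describes, the Deep Monte-Carlo method, regresses a value estimate for each state-action pair toward the Monte-Carlo return observed at the end of a self-play episode. The sketch below illustrates that regression with a linear model standing in for DanZero's deep network; the feature sizes, learning rate, and synthetic returns are illustrative assumptions, not details from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical feature sizes; DanZero's real state/action encodings are far larger.
STATE_DIM, ACTION_DIM = 8, 4
w = np.zeros(STATE_DIM + ACTION_DIM)  # linear stand-in for the Q-network weights


def dmc_update(features, returns, lr=0.05):
    """One Monte-Carlo regression step: fit Q(s, a) to the episode return.

    `features` holds concatenated state-action features (one row per sample),
    `returns` the Monte-Carlo return each sample eventually received.
    """
    global w
    pred = features @ w
    err = pred - returns
    # Gradient of the mean-squared error with respect to the linear weights.
    w -= lr * features.T @ err / len(returns)
    return float(np.mean(err ** 2))


# Toy batch standing in for self-play samples from the actor processes.
features = rng.normal(size=(32, STATE_DIM + ACTION_DIM))
true_w = rng.normal(size=STATE_DIM + ACTION_DIM)
returns = features @ true_w  # synthetic episode returns

losses = [dmc_update(features, returns) for _ in range(300)]
```

In the full system, actor processes would keep appending fresh self-play batches to a shared buffer while the learner repeats this update; the training loss should decrease as the value estimates converge toward the observed returns.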