论文标题
游戏及以后的可剥削性最小化
Exploitability Minimization in Games and Beyond
论文作者
论文摘要
伪游戏是对正常游戏的自然且众所周知的概括,在这种游戏中,每个玩家所采取的动作不仅会影响其他玩家的回报,而且会影响其他玩家的策略。伪游戏的解决方案概念卓越是纳什均衡(GNE),即每个玩家的策略是可行的策略概况,没有玩家可以通过单方面偏离其他参与者策略确定的策略设定的另一个策略来提高其回报。长期以来,由于在环境保护到物流再到电信的各种领域的应用,伪游戏中的GNE计算一直是一个有趣的问题。尽管计算gne通常是ppad-hard,但尝试在限制类别的伪游戏类别中计算它们仍然很感兴趣。一种方法是搜索最大程度地减少可剥削性的策略配置文件,即所有参与者的遗憾之和。由于一般来说,由于可剥削性是非不同的,因此开发有效的一阶方法,使其最小化似乎不可能乍一看。但是,我们观察到,可剥削性最小化问题可以作为最小的最大优化问题重新增值,从而获得多项式时间一阶方法来计算GNE的完善,即在Convex-Conconcave累积累积累积的遗憾pseudo-Pseudo-Pseudo-games与联合侧面约束。更普遍地,我们还表明,我们的方法在Lipschitz-smooth-smooth pseudo-games中找到了具有共同凸约限制的多项式时间的可剥削性的固定点。最后,我们在实验中证明,我们的方法不仅超过了已知的算法,而且即使在伪游戏中,他们不能保证它们会融合到gne,但它们也可以这样做,但可以通过适当的初始化。
Pseudo-games are a natural and well-known generalization of normal-form games, in which the actions taken by each player affect not only the other players' payoffs, as in games, but also the other players' strategy sets. The solution concept par excellence for pseudo-games is the generalized Nash equilibrium (GNE), i.e., a strategy profile at which each player's strategy is feasible and no player can improve their payoffs by unilaterally deviating to another strategy in the strategy set determined by the other players' strategies. The computation of GNE in pseudo-games has long been a problem of interest, due to applications in a wide variety of fields, from environmental protection to logistics to telecommunications. Although computing GNE is PPAD-hard in general, it is still of interest to try to compute them in restricted classes of pseudo-games. One approach is to search for a strategy profile that minimizes exploitability, i.e., the sum of the regrets across all players. As exploitability is nondifferentiable in general, developing efficient first-order methods that minimize it might not seem possible at first glance. We observe, however, that the exploitability-minimization problem can be recast as a min-max optimization problem, and thereby obtain polynomial-time first-order methods to compute a refinement of GNE, namely the variational equilibria (VE), in convex-concave cumulative regret pseudo-games with jointly convex constraints. More generally, we also show that our methods find the stationary points of the exploitability in polynomial time in Lipschitz-smooth pseudo-games with jointly convex constraints. Finally, we demonstrate in experiments that our methods not only outperform known algorithms, but that even in pseudo-games where they are not guaranteed to converge to a GNE, they may do so nonetheless, with proper initialization.