在基于增强学习的线性编程的简单方法的最佳枢轴路径上

论文标题

在基于增强学习的线性编程的简单方法的最佳枢轴路径上

On the optimal pivot path of simplex method for linear programming based on reinforcement learning

论文作者

Li, Anqi, Guo, Tiande, Han, Congying, Li, Bonan, Li, Haoran

论文摘要

基于现有的枢轴规则，在最坏情况下，线性编程的单纯形方法不是多项式。因此，单纯形方法的最佳枢轴至关重要。这项研究提出了最佳规则，以根据蒙特卡洛树搜索（MCT）找到单纯形编程问题的单纯形方法的所有最短路径。具体而言，我们首先提出单纯式旋转，以将单纯形方法传输到树搜索模式中，同时避免重复的基础变量。其次，我们提出了四个具有两个动作和两个奖励的强化学习（RL）模型，以使Monte Carlo Tree搜索适用于单纯形方法。第三，我们设定了一个新的动作选择标准，以改善初始探索中的不准确评估。事实证明，当可行区域中的顶点数为$ C_N^m $时，我们的方法可以生成所有最短的枢轴路径，即变量数量的多项式。此外，我们通过实验验证提出的时间表可以避免不必要的搜索并提供最佳的枢轴路径。此外，此方法可以为解决线性编程问题的各种监督学习方法提供最佳的枢轴标签。

Based on the existing pivot rules, the simplex method for linear programming is not polynomial in the worst case. Therefore the optimal pivot of the simplex method is crucial. This study proposes the optimal rule to find all shortest pivot paths of the simplex method for linear programming problems based on Monte Carlo tree search (MCTS). Specifically, we first propose the SimplexPseudoTree to transfer the simplex method into tree search mode while avoiding repeated basis variables. Secondly, we propose four reinforcement learning (RL) models with two actions and two rewards to make the Monte Carlo tree search suitable for the simplex method. Thirdly, we set a new action selection criterion to ameliorate the inaccurate evaluation in the initial exploration. It is proved that when the number of vertices in the feasible region is $C_n^m$, our method can generate all the shortest pivot paths, which is the polynomial of the number of variables. In addition, we experimentally validate that the proposed schedule can avoid unnecessary search and provide the optimal pivot path. Furthermore, this method can provide the best pivot labels for all kinds of supervised learning methods to solve linear programming problems.

下载PDF全文

下载文献需遵守相关版权规定

论文标题