Title
Hierarchical autoregressive neural networks for statistical systems
Authors
Abstract
It was recently proposed that neural networks could be used to approximate many-dimensional probability distributions that appear, e.g., in lattice field theories or statistical mechanics. They can subsequently be used as variational approximators to assess extensive properties of statistical systems, such as the free energy, and also as neural samplers in Monte Carlo simulations. The practical application of this approach is unfortunately limited by the unfavorable scaling, with the system size, of both the numerical cost of training and the memory requirements. This is due to the fact that the original proposition involved a neural network whose width scaled with the total number of degrees of freedom, e.g. $L^2$ in the case of a two-dimensional $L\times L$ lattice. In this work we propose a hierarchical association of physical degrees of freedom, for instance spins, with neurons, which replaces this scaling with one proportional to the linear extent $L$ of the system. We demonstrate our approach on the two-dimensional Ising model by simulating lattices of various sizes, up to $128 \times 128$ spins, with time benchmarks reaching lattices of size $512 \times 512$. We observe that our proposal improves the quality of neural network training, i.e. the approximated probability distribution is closer to the target than could previously be achieved. As a consequence, the variational free energy reaches a value closer to its theoretical expectation and, if applied in a Markov chain Monte Carlo algorithm, yields a smaller autocorrelation time. Finally, the replacement of a single neural network by a hierarchy of smaller networks considerably reduces the memory requirements.
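The variational setup summarized above can be made concrete. An autoregressive network factorizes the distribution over spin configurations as $q_\theta(s) = \prod_{i=1}^{L^2} q_\theta(s_i \mid s_1, \dots, s_{i-1})$ and is trained to minimize the variational free energy $F_q = \frac{1}{\beta}\,\mathbb{E}_{s \sim q_\theta}\left[\ln q_\theta(s) + \beta H(s)\right] \geq F$, which upper-bounds the true free energy $F$. Below is a minimal, illustrative PyTorch sketch of the baseline (non-hierarchical) scheme, i.e. a single masked network of width $L^2$, the very scaling the proposed hierarchy is designed to avoid. The architecture, lattice size, inverse temperature, and training hyperparameters here are assumptions for illustration, not the authors' implementation.

```python
# Minimal, illustrative sketch (not the authors' code) of a variational
# autoregressive network for the 2D Ising model. The single masked layer,
# L, beta, and all hyperparameters are assumptions for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F

L = 8                      # linear lattice extent (the paper reaches 128)
N = L * L                  # number of spins -> network width: this is the
                           # unfavorable L^2 scaling the hierarchy removes
beta = 0.44                # inverse temperature, near criticality

class MaskedLinear(nn.Linear):
    """Linear layer whose strictly lower-triangular mask enforces the
    autoregressive property: output i sees only inputs j < i."""
    def __init__(self, n):
        super().__init__(n, n)
        self.register_buffer("mask", torch.tril(torch.ones(n, n), -1))

    def forward(self, x):
        return F.linear(x, self.weight * self.mask, self.bias)

net = MaskedLinear(N)      # sigmoid(net(s))_i = q(s_i = +1 | s_1..s_{i-1})

def log_prob(s):
    """log q(s) for spins s in {-1,+1}^N, all conditionals in one pass."""
    p_up = torch.sigmoid(net(s))
    return torch.where(s > 0, p_up, 1.0 - p_up).log().sum(dim=1)

@torch.no_grad()
def sample(batch):
    """Ancestral sampling: draw each spin from its conditional in turn."""
    s = torch.zeros(batch, N)
    for i in range(N):
        p_up = torch.sigmoid(net(s)[:, i])
        s[:, i] = 2.0 * (torch.rand(batch) < p_up).float() - 1.0
    return s

def energy(s):
    """Nearest-neighbour Ising energy with periodic boundary conditions."""
    g = s.view(-1, L, L)
    return -(g * g.roll(1, dims=1) + g * g.roll(1, dims=2)).sum(dim=(1, 2))

opt = torch.optim.Adam(net.parameters(), lr=1e-3)
for step in range(200):                     # short illustrative training run
    s = sample(1024)
    lq = log_prob(s)
    with torch.no_grad():
        f = lq + beta * energy(s)           # per-sample variational signal
        baseline = f.mean()                 # baseline for variance reduction
    loss = (lq * (f - baseline)).mean()     # REINFORCE estimator of grad F_q
    opt.zero_grad()
    loss.backward()
    opt.step()
    if step % 50 == 0:
        print(f"step {step:4d}  F_q per spin = {f.mean().item() / (beta * N):.4f}")
```

In the hierarchical scheme the abstract describes, this single width-$L^2$ network would be replaced by a hierarchy of smaller conditional networks: boundary spins are sampled first, after which the enclosed sub-lattices can be handled by networks whose width scales with the linear extent $L$, reducing both training cost and memory footprint.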