Paper Title
Score-Based Generative Models for Molecule Generation
Authors
Abstract
Recent advances in generative models have made exploring design spaces easier for de novo molecule generation. However, popular generative models such as GANs and normalizing flows face challenges: training instabilities due to adversarial training and architectural constraints, respectively. Score-based generative models sidestep these challenges by modelling the gradient of the log probability density with a score function approximation, rather than modelling the density function directly, and by sampling from the learned distribution using annealed Langevin dynamics. We believe that score-based generative models could open up new opportunities in molecule generation because of their architectural flexibility, for example by replacing the score function with an SE(3)-equivariant model. In this work, we lay the foundations by testing the efficacy of score-based models for molecule generation. We train a Transformer-based score function on Self-Referencing Embedded Strings (SELFIES) representations of 1.5 million samples from the ZINC dataset and use the MOSES benchmarking framework to evaluate the generated samples on a suite of metrics.
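As a rough illustration of the sampling procedure named in the abstract, the following is a minimal sketch of annealed Langevin dynamics on a toy one-dimensional problem. It is not the paper's implementation: the analytic Gaussian score `gaussian_score` stands in for the trained Transformer score function, and the function names, noise schedule, step size `eps`, and step counts are all assumptions chosen for illustration.

```python
import numpy as np


def gaussian_score(x, sigma, mu=2.0, s=1.0):
    """Exact score (gradient of the log density) of N(mu, s^2) data
    perturbed with N(0, sigma^2) noise. Hypothetical stand-in for a
    learned, noise-conditional score network."""
    return -(x - mu) / (s**2 + sigma**2)


def annealed_langevin(score_fn, sigmas, n_samples=5000, steps=100,
                      eps=0.01, seed=0):
    """Run Langevin updates at each noise level, from largest to smallest."""
    rng = np.random.default_rng(seed)
    # Start from a broad distribution matched to the largest noise level.
    x = rng.normal(0.0, sigmas[0], size=n_samples)
    for sigma in sigmas:
        # Step size scaled with the noise level (an assumed schedule).
        alpha = eps * (sigma / sigmas[-1]) ** 2
        for _ in range(steps):
            z = rng.standard_normal(n_samples)
            # Langevin update: drift along the score plus injected noise.
            x = x + 0.5 * alpha * score_fn(x, sigma) + np.sqrt(alpha) * z
    return x


# Geometric noise schedule from coarse (3.0) to fine (0.1); values assumed.
sigmas = np.geomspace(3.0, 0.1, 10)
samples = annealed_langevin(gaussian_score, sigmas)
print(samples.mean(), samples.std())
```

In this toy setting the samples should concentrate around the target mean of 2 with a spread close to 1. In the setting the abstract describes, the score function would instead be a learned model operating on SELFIES representations, and the update shown here would need to be adapted accordingly.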