Paper Title
Unified 2D and 3D Pre-Training of Molecular Representations
Authors
Abstract
Molecular representation learning has attracted much attention recently. A molecule can be viewed as a 2D graph with nodes/atoms connected by edges/bonds, and can also be represented by a 3D conformation with 3-dimensional coordinates of all atoms. We note that most previous work handles 2D and 3D information separately, while jointly leveraging these two sources may foster a more informative representation. In this work, we explore this appealing idea and propose a new representation learning method based on unified 2D and 3D pre-training. Atom coordinates and interatomic distances are encoded and then fused with atomic representations through graph neural networks. The model is pre-trained on three tasks: reconstruction of masked atoms and coordinates, 3D conformation generation conditioned on the 2D graph, and 2D graph generation conditioned on the 3D conformation. We evaluate our method on 11 downstream molecular property prediction tasks: 7 with 2D information only and 4 with both 2D and 3D information. Our method achieves state-of-the-art results on 10 tasks, and the average improvement on 2D-only tasks is 8.3%. Our method also achieves significant improvement on two 3D conformation generation tasks.
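To make the two views of a molecule concrete, here is a minimal sketch of the inputs the abstract mentions: a 2D graph (atoms and bonds), a 3D conformation (one coordinate per atom), the interatomic distance matrix derived from it, and a masking step of the kind used in the reconstruction task. This is not the authors' implementation. The names `pairwise_distances`, `mask_atoms`, and `MASK`, the 25% masking ratio, and the toy water-like geometry are all assumptions chosen for illustration; the abstract does not specify how masking or encoding is done.

```python
import math
import random

MASK = "[MASK]"  # hypothetical placeholder token, not taken from the paper


def pairwise_distances(coords):
    """Return the symmetric matrix of Euclidean distances between atoms."""
    return [[math.dist(a, b) for b in coords] for a in coords]


def mask_atoms(atoms, coords, ratio=0.25, seed=0):
    """Hide a random subset of atoms together with their coordinates.

    The masking ratio is an assumed value. A model pre-trained on the
    reconstruction task would be asked to predict the hidden atom types
    and coordinates at the returned indices.
    """
    rng = random.Random(seed)
    n_mask = max(1, round(len(atoms) * ratio))
    masked_idx = sorted(rng.sample(range(len(atoms)), n_mask))
    hidden = set(masked_idx)
    masked_atoms = [MASK if i in hidden else a for i, a in enumerate(atoms)]
    masked_coords = [None if i in hidden else c for i, c in enumerate(coords)]
    return masked_atoms, masked_coords, masked_idx


# Toy water-like molecule: the 2D view is (atoms, bonds), the 3D view is coords.
atoms = ["O", "H", "H"]
bonds = [(0, 1), (0, 2)]  # pairs of atom indices connected by a bond
coords = [(0.0, 0.0, 0.0), (0.96, 0.0, 0.0), (-0.24, 0.93, 0.0)]

dist = pairwise_distances(coords)
masked_atoms, masked_coords, masked_idx = mask_atoms(atoms, coords)
```

In a setup like the one the abstract describes, the distance matrix and coordinates would be encoded and fused with atom features inside a graph neural network; the sketch stops at preparing those inputs and the reconstruction targets.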