Paper title
Machine learning with persistent homology and chemical word embeddings improves prediction accuracy and interpretability in metal-organic frameworks
Paper authors
Paper abstract
Machine learning has emerged as a powerful approach in materials discovery. Its major challenge is selecting features that create interpretable representations of materials and remain useful across multiple prediction tasks. We introduce an end-to-end machine learning model that automatically generates descriptors capturing a complex representation of a material's structure and chemistry. This approach builds on computational topology techniques (namely, persistent homology) and word embeddings from natural language processing. It automatically encapsulates geometric and chemical information directly from the material system. We demonstrate our approach on multiple nanoporous metal-organic framework datasets by predicting methane and carbon dioxide adsorption across different conditions. Compared to models constructed from commonly used, manually curated features, our results show considerable improvement in both accuracy and transferability across targets, consistently achieving an average 25-30% decrease in root-mean-squared deviation and an average 40-50% increase in R² scores. A key advantage of our approach is interpretability: our model identifies the pores that correlate best with adsorption at different pressures, which contributes to understanding atomic-level structure-property relationships for materials design.
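
The abstract reports its gains as changes in root-mean-squared deviation (RMSD) and R². The sketch below shows how these two metrics and a relative change between a baseline model and a new model could be computed. It assumes the reported percentages are relative changes against the baseline, which the abstract does not state explicitly. The function names and all numbers are hypothetical illustrations, not data or code from the paper.

```python
import math


def rmsd(y_true, y_pred):
    """Root-mean-squared deviation between targets and predictions."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def relative_change(baseline, new):
    """Signed relative change of `new` with respect to `baseline`."""
    return (new - baseline) / baseline


# Made-up adsorption targets and predictions (illustrative only).
y_true = [1.0, 2.0, 3.0, 4.0]
baseline_pred = [1.5, 1.5, 3.5, 3.5]  # hypothetical hand-crafted-feature model
new_pred = [1.2, 1.8, 3.2, 3.8]       # hypothetical learned-descriptor model

rmsd_change = relative_change(rmsd(y_true, baseline_pred), rmsd(y_true, new_pred))
r2_change = relative_change(r2_score(y_true, baseline_pred), r2_score(y_true, new_pred))

print(f"RMSD change: {rmsd_change:+.1%}")
print(f"R2 change:   {r2_change:+.1%}")
```

A negative RMSD change and a positive R² change both indicate improvement; averaging such changes over several prediction targets would give summary figures of the kind quoted in the abstract.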