Paper Title
Siamese Graph Neural Networks for Data Integration
Paper Authors
Paper Abstract
Data integration has been studied extensively for decades and approached from different angles. However, the domain remains largely rule-driven and lacks universal automation. Recent developments in machine learning, and in deep learning in particular, have opened the way to more general and more efficient solutions to data integration problems. In this work, we propose a general approach to modeling and integrating entities from structured data, such as relational databases, as well as from unstructured sources, such as free text from news articles. Our approach is designed to explicitly model and leverage relations between entities, thereby using all available information and preserving as much context as possible. This is achieved by combining siamese and graph neural networks, which propagate information between connected entities and support high scalability. We evaluate our method on the task of integrating data about business entities, and we demonstrate that it outperforms standard rule-based systems, as well as other deep learning approaches that do not use graph-based representations.
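The combination of siamese and graph neural networks mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' architecture: the mean-aggregation message passing, the tanh nonlinearity, the convention that node 0 is the entity being matched, the cosine score, and all names (`normalize_adjacency`, `encode`, `similarity`) are assumptions made for illustration. The only idea taken from the abstract is that one shared encoder propagates information between connected entities and is applied to both records being compared.

```python
import numpy as np

def normalize_adjacency(adj):
    """Add self-loops and row-normalize, so each node averages over itself and its neighbors."""
    a = adj + np.eye(adj.shape[0])
    return a / a.sum(axis=1, keepdims=True)

def encode(features, adj, weights):
    """Shared encoder: one message-passing step per weight matrix.

    Returns the embedding of node 0, assumed here to be the entity of interest;
    the remaining nodes are its related entities (context).
    """
    a = normalize_adjacency(adj)
    h = features
    for w in weights:
        h = np.tanh(a @ h @ w)  # aggregate neighbor features, then transform
    return h[0]

def similarity(graph_a, graph_b, weights):
    """Siamese comparison: the SAME weights encode both graphs; the score is cosine similarity."""
    za = encode(*graph_a, weights)
    zb = encode(*graph_b, weights)
    return float(za @ zb / (np.linalg.norm(za) * np.linalg.norm(zb) + 1e-12))

# Hypothetical toy data: two small entity graphs with 4-dimensional node features.
rng = np.random.default_rng(0)
weights = [rng.normal(size=(4, 8)), rng.normal(size=(8, 8))]  # untrained, random

adj_a = np.array([[0, 1, 1],
                  [1, 0, 0],
                  [1, 0, 0]], dtype=float)
adj_b = np.array([[0, 1],
                  [1, 0]], dtype=float)
graph_a = (rng.normal(size=(3, 4)), adj_a)
graph_b = (rng.normal(size=(2, 4)), adj_b)

score = similarity(graph_a, graph_b, weights)
```

In a trained system the weights would be learned so that records describing the same real-world entity score high and others score low; here they are random, so `score` only demonstrates the data flow. Because both branches share weights, the two graphs may differ in size, and comparing a graph with itself yields a score of 1 up to rounding.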