Semi-Supervised Heterogeneous Graph Learning with Multi-level Data Augmentation

Paper Title

Semi-Supervised Heterogeneous Graph Learning with Multi-level Data Augmentation

Authors

Chen, Ying; Qiang, Siwei; Ha, Mingming; Liu, Xiaolei; Li, Shaoshuai; Yuan, Lingfeng; Guo, Xiaobo; Zhu, Zhenfeng

Abstract

Semi-supervised graph learning with data augmentation (DA) has recently become the most commonly used and best-performing approach for improving model robustness in sparse scenarios with few labeled samples. Compared with homogeneous graphs, DA in heterogeneous graphs poses greater challenges. First, the heterogeneity of information requires DA strategies that handle heterogeneous relations effectively, taking into account how much information different types of neighbors and edges contribute to the target nodes. Second, over-squashing of information is caused by the negative curvature formed by the non-uniform distribution and strong clustering in complex graphs. To address these challenges, this paper presents a novel method named Semi-Supervised Heterogeneous Graph Learning with Multi-level Data Augmentation (HG-MDA). For the heterogeneity problem, node and topology augmentation strategies are proposed that are tailored to the characteristics of heterogeneous graphs, and meta-relation-based attention is used as one of the indexes for selecting augmented nodes and edges. For the over-squashing problem, triangle-based edge adding and removing are designed to alleviate negative curvature and bring a topological gain. Finally, the loss function consists of a cross-entropy loss for labeled data and a consistency regularization term for unlabeled data; sharpening is used to effectively fuse the predictions from the various DA strategies. Experiments on public datasets (ACM, DBLP, and OGB) and an industry dataset (MB) show that HG-MDA outperforms current SOTA models. Additionally, HG-MDA is applied to user identification in internet finance scenarios, helping the business increase key users by 30% and increase loans and balances by 3.6%, 11.1%, and 9.8%.
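
The abstract names the components of the loss (cross-entropy on labeled nodes, consistency regularization on unlabeled nodes, and sharpening to fuse predictions across augmentations) but does not give their exact form. The following is only a minimal sketch under stated assumptions: it assumes MixMatch-style temperature sharpening of the averaged predictions and a squared-error consistency term. The function names, the temperature `T`, and the weight `lam` are hypothetical and are not taken from the paper.

```python
import math

def sharpen(p, T=0.5):
    """Temperature sharpening (assumed form): raise to 1/T, then renormalize."""
    powered = [x ** (1.0 / T) for x in p]
    total = sum(powered)
    return [x / total for x in powered]

def average(preds):
    """Element-wise mean of several class distributions, one per augmentation."""
    k = len(preds)
    return [sum(col) / k for col in zip(*preds)]

def cross_entropy(p, label):
    """Cross-entropy of a predicted distribution p against an integer label."""
    return -math.log(p[label])

def consistency(preds, target):
    """Mean squared distance between each augmented prediction and the target."""
    return sum(
        sum((a - b) ** 2 for a, b in zip(p, target)) for p in preds
    ) / len(preds)

def total_loss(labeled, unlabeled, lam=1.0, T=0.5):
    """Supervised loss plus weighted consistency loss.

    labeled:   non-empty list of (predicted distribution, integer label)
    unlabeled: non-empty list with one entry per node, each a list of
               predicted distributions (one per augmentation)
    lam, T:    hypothetical hyperparameters (weight and temperature)
    """
    sup = sum(cross_entropy(p, y) for p, y in labeled) / len(labeled)
    unsup = sum(
        consistency(preds, sharpen(average(preds), T)) for preds in unlabeled
    ) / len(unlabeled)
    return sup + lam * unsup
```

With `T` below 1, sharpening pushes the fused prediction toward its dominant class, so the consistency term pulls every augmented view toward a lower-entropy target. Whether HG-MDA uses exactly this target or distance is not stated in the abstract.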
