Paper Title

PDNS-Net: A Large Heterogeneous Graph Benchmark Dataset of Network Resolutions for Graph Learning

Authors

Udesh Kumarasinghe, Fatih Deniz, Mohamed Nabeel

Abstract

To advance the state of the art in graph learning algorithms, it is necessary to construct large real-world datasets. While there are many benchmark datasets for homogeneous graphs, only a few are available for heterogeneous graphs. Furthermore, the latter are small, rendering them insufficient for understanding how graph learning algorithms perform in terms of classification metrics and computational resource utilization. We introduce PDNS-Net, the largest public heterogeneous graph dataset, containing 447K nodes and 897K edges, for the malicious domain classification task. Compared to the popular heterogeneous datasets IMDB and DBLP, PDNS-Net is 38 and 17 times larger, respectively. We provide a detailed analysis of PDNS-Net, including the data collection methodology, heterogeneous graph construction, descriptive statistics, and preliminary graph classification performance. The dataset is publicly available at https://github.com/qcri/PDNS-Net. Our preliminary evaluation of popular homogeneous and heterogeneous graph neural networks on PDNS-Net reveals that further research is required to improve the performance of these models on large heterogeneous graphs.
