论文标题
使用参数化随机步道扩散内核的有向图聚类
Clustering for directed graphs using parametrized random walk diffusion kernels
论文作者
论文摘要
基于随机步行操作员的聚类已被证明对无向图有效,但其对有向图(Digraphs)的概括更具挑战性。尽管随机步行操作员针对Digraphs有很好的定义,但是在大多数情况下,此类图并未强烈连接,因此相关的随机步行不可约束,这是在无方向环境中自然存在的聚类的关键属性。为了解决这个问题,通常的解决方法是天真地对称邻接矩阵,或者通过传送的随机步行操作员替换自然的随机步行操作员,但这可能导致通过边缘方向携带的有价值信息的丢失。在本文中,我们引入了一个新的聚类框架,即参数化的随机步行扩散内核聚类(P-RWDKC),适用于处理有向和无方向的图。我们的框架基于扩散几何形状和广义光谱聚类框架。因此,我们提出了一种算法,该算法通过考虑与参数化的内核操作员相关的随机行走动力学以及估计其关键扩散时间来自动揭示群集结构。从现实世界数据集和现实世界图构建的$ K $ -NN图上的实验表明,我们的聚类方法在所有测试的情况下都表现良好,并且在其中大多数情况下都表现出现有方法。
Clustering based on the random walk operator has been proven effective for undirected graphs, but its generalization to directed graphs (digraphs) is much more challenging. Although the random walk operator is well-defined for digraphs, in most cases such graphs are not strongly connected, and hence the associated random walks are not irreducible, which is a crucial property for clustering that exists naturally in the undirected setting. To remedy this, the usual workaround is to either naively symmetrize the adjacency matrix or to replace the natural random walk operator by the teleporting random walk operator, but this can lead to the loss of valuable information carried by edge directionality. In this paper, we introduce a new clustering framework, the Parametrized Random Walk Diffusion Kernel Clustering (P-RWDKC), which is suitable for handling both directed and undirected graphs. Our framework is based on the diffusion geometry and the generalized spectral clustering framework. Accordingly, we propose an algorithm that automatically reveals the cluster structure at a given scale, by considering the random walk dynamics associated with a parametrized kernel operator, and by estimating its critical diffusion time. Experiments on $K$-NN graphs constructed from real-world datasets and real-world graphs show that our clustering approach performs well in all tested cases, and outperforms existing approaches in most of them.