Title
CLARE: A Semi-supervised Community Detection Algorithm
Authors
Abstract
Community detection refers to the task of discovering closely related subgraphs in order to understand networks. However, traditional community detection algorithms fail to pinpoint a particular kind of community. This limits their applicability in real-world networks, e.g., distinguishing fraud groups from normal ones in transaction networks. Recently, semi-supervised community detection has emerged as a solution. It aims to find other, similar communities in the network, using a few labeled communities as training data. Existing works can be regarded as seed-based: they locate seed nodes and then grow communities around these seeds. However, these methods are quite sensitive to the quality of the selected seeds, since communities generated around a mis-detected seed may be irrelevant. Moreover, individual methods have their own issues, e.g., inflexibility or high computational overhead. To address these issues, we propose CLARE, which consists of two key components: the Community Locator and the Community Rewriter. Our idea is to first locate potential communities and then refine them. The community locator quickly locates potential communities by seeking subgraphs in the network that are similar to the training ones. To further adjust these located communities, we devise the community rewriter. Enhanced by deep reinforcement learning, it makes intelligent decisions, such as adding or dropping nodes, to refine community structures flexibly. Extensive experiments on multiple real-world datasets verify both the effectiveness and efficiency of our work compared with prior state-of-the-art approaches.
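To make the locate-then-refine idea concrete, the following is a minimal toy sketch, not CLARE itself. It assumes two simplifications that are not from the abstract: (1) the "locator" scores k-hop ego-nets with two hand-crafted features (internal edge density and share of boundary edges) and ranks them by Euclidean distance to the average feature vector of the labeled communities, instead of a learned subgraph similarity; (2) the "rewriter" uses greedy hill-climbing over add/drop-node moves instead of deep reinforcement learning. All function names (`k_hop`, `features`, `locate`, `rewrite`, `build`, `clique`) are hypothetical.

```python
from collections import deque
from math import dist  # Euclidean distance, Python 3.8+


def k_hop(graph, center, k):
    """Return the set of nodes within k hops of `center` (breadth-first search)."""
    seen = {center}
    queue = deque([(center, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == k:
            continue
        for nb in graph[node]:
            if nb not in seen:
                seen.add(nb)
                queue.append((nb, depth + 1))
    return seen


def features(graph, nodes):
    """Toy features in [0, 1]: (internal edge density, share of boundary edges)."""
    n = len(nodes)
    internal = sum(1 for u in nodes for v in graph[u] if v in nodes) // 2
    cut = sum(1 for u in nodes for v in graph[u] if v not in nodes)
    density = internal / (n * (n - 1) / 2) if n > 1 else 0.0
    boundary = cut / (2 * internal + cut) if internal or cut else 0.0
    return (density, boundary)


def locate(graph, training, k=1, top=1):
    """Locator stand-in: rank k-hop ego-nets by distance to the training profile."""
    feats = [features(graph, c) for c in training]
    target = tuple(sum(col) / len(feats) for col in zip(*feats))
    known = {frozenset(c) for c in training}  # do not re-report labeled communities
    candidates = {frozenset(k_hop(graph, v, k)) for v in graph} - known
    ranked = sorted(candidates, key=lambda c: (dist(features(graph, c), target), sorted(c)))
    return [set(c) for c in ranked[:top]], target


def rewrite(graph, community, target, max_steps=10):
    """Rewriter stand-in: greedy add/drop moves (NOT deep RL) that shrink the distance."""
    current = set(community)
    best = dist(features(graph, current), target)
    for _ in range(max_steps):
        moves = []
        if len(current) > 2:  # keep at least two nodes
            moves += [current - {v} for v in sorted(current)]
        frontier = {nb for u in current for nb in graph[u]} - current
        moves += [current | {v} for v in sorted(frontier)]
        if not moves:
            break
        score, move = min(((dist(features(graph, m), target), m) for m in moves),
                          key=lambda s: s[0])
        if score >= best:  # no move strictly improves the match: stop
            break
        current, best = move, score
    return current


def build(edges):
    """Build an undirected adjacency-set graph from an edge list."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def clique(nodes):
    """All edges of a complete graph on `nodes`."""
    return [(a, b) for i, a in enumerate(nodes) for b in nodes[i + 1:]]


# Two 4-cliques joined by the bridge 3-4; {0, 1, 2, 3} is the labeled community.
G = build(clique([0, 1, 2, 3]) + clique([4, 5, 6, 7]) + [(3, 4)])
located, target = locate(G, [{0, 1, 2, 3}])
print(sorted(located[0]))  # [4, 5, 6, 7]
# A noisy candidate that wrongly includes node 3 is refined by dropping it.
print(sorted(rewrite(G, {3, 4, 5, 6, 7}, target)))  # [4, 5, 6, 7]
```

The split mirrors the two-stage design described above: a cheap global search proposes candidates, and a local editor fixes their boundaries. In this toy graph the locator already finds the second clique, while the rewriter repairs a candidate that a mis-detected seed would produce.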