Paper Title
Hypergraph Motifs: Concepts, Algorithms, and Discoveries
Paper Authors
Abstract
Hypergraphs naturally represent group interactions, which are omnipresent in many domains: collaborations of researchers, co-purchases of items, and joint interactions of proteins, to name a few. In this work, we propose tools for answering the following questions in a systematic manner: (Q1) what are the structural design principles of real-world hypergraphs? (Q2) how can we compare local structures of hypergraphs of different sizes? (Q3) how can we identify which domains hypergraphs are from? We first define hypergraph motifs (h-motifs), which describe the connectivity patterns of three connected hyperedges. Then, we define the significance of each h-motif in a hypergraph as its occurrences relative to those in properly randomized hypergraphs. Lastly, we define the characteristic profile (CP) as the vector of the normalized significance of every h-motif. Regarding Q1, we find that h-motifs' occurrences in 11 real-world hypergraphs from 5 domains are clearly distinguished from those of randomized hypergraphs. In addition, we demonstrate that CPs capture local structural patterns unique to each domain, and thus comparing CPs of hypergraphs addresses Q2 and Q3. Our algorithmic contribution is to propose MoCHy, a family of parallel algorithms for counting h-motifs' occurrences in a hypergraph. We theoretically analyze their speed and accuracy, and we show empirically that the advanced approximate version MoCHy-A+ is up to 25X more accurate and 32X faster than the basic approximate and exact versions, respectively.
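To make the h-motif idea concrete, the connectivity pattern of three connected hyperedges can be summarized by which regions of their three-set Venn diagram are non-empty. The sketch below is a minimal illustration of this intuition, not the paper's implementation: the function `h_motif_pattern` and the relative-difference form of `significance` are assumptions made for this example, and canonicalization over permutations of the three hyperedges is omitted for brevity.

```python
def h_motif_pattern(e1, e2, e3):
    """Emptiness pattern of the 7 Venn-diagram regions of three
    hyperedges (given as sets of node ids). Instances of three
    connected hyperedges with the same pattern (up to permutation
    of the hyperedges, omitted here) belong to the same h-motif."""
    e1, e2, e3 = set(e1), set(e2), set(e3)
    regions = [
        e1 - e2 - e3,        # only in e1
        e2 - e3 - e1,        # only in e2
        e3 - e1 - e2,        # only in e3
        (e1 & e2) - e3,      # in e1 and e2 only
        (e2 & e3) - e1,      # in e2 and e3 only
        (e3 & e1) - e2,      # in e3 and e1 only
        e1 & e2 & e3,        # in all three
    ]
    return tuple(len(r) > 0 for r in regions)


def significance(count, count_rand, eps=1e-9):
    """One common normalized-significance form (an assumption here):
    compare an h-motif's count in the real hypergraph against its
    count in randomized hypergraphs."""
    return (count - count_rand) / (count + count_rand + eps)


# Three overlapping hyperedges forming a "chain" of pairwise overlaps:
pattern = h_motif_pattern({1, 2}, {2, 3}, {3, 4})
```

For `{1,2}, {2,3}, {3,4}` the pattern is `(True, False, True, True, True, False, False)`: node 1 lies only in the first hyperedge, node 4 only in the third, nodes 2 and 3 each lie in exactly two hyperedges, and no node lies in all three.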