论文标题

通过计数米草图进行计数近似的相变,并具有保守的更新

Phase transition in count approximation by Count-Min sketch with conservative updates

论文作者

Fusy, Éric, Kucherov, Gregory

论文摘要

Count-Min Sketch是一种基于哈希的数据结构,可代表动态变化的计数器的关联阵列。在这里,假设输入密钥的统一分布,我们在更强的更新规则下分析了计数米的计数版本。我们表明,保守更新策略的准确性会经历相变,这取决于输入中不同键的数量,这是计数阵数阵列大小的一小部分。我们证明,在阈值以下,相对误差渐近$ o(1)$(与常规计数策略相反),而高于阈值,相对误差为$θ(1)$。阈值对应于随机$ k $均匀超图的剥离阈值。我们证明,即使对于少量键,基础超图的剥离性是确保$ O(1)$错误的关键属性。最后,我们提供了一个实验证据,表明相变不扩展到非均匀分布,尤其是流行ZIPF的分布。

Count-Min sketch is a hash-based data structure to represent a dynamically changing associative array of counters. Here we analyse the counting version of Count-Min under a stronger update rule known as \textit{conservative update}, assuming the uniform distribution of input keys. We show that the accuracy of conservative update strategy undergoes a phase transition, depending on the number of distinct keys in the input as a fraction of the size of the Count-Min array. We prove that below the threshold, the relative error is asymptotically $o(1)$ (as opposed to the regular Count-Min strategy), whereas above the threshold, the relative error is $Θ(1)$. The threshold corresponds to the peelability threshold of random $k$-uniform hypergraphs. We demonstrate that even for small number of keys, peelability of the underlying hypergraph is a crucial property to ensure the $o(1)$ error. Finally, we provide an experimental evidence that the phase transition does not extend to non-uniform distributions, in particular to the popular Zipf's distribution.

扫码加入交流群

加入微信交流群

微信交流群二维码

扫码加入学术交流群,获取更多资源