论文标题
前缀过滤器:实际上和理论上比盛开更好
Prefix Filter: Practically and Theoretically Better Than Bloom
论文作者
论文摘要
许多近似成员资格查询数据结构或过滤器的应用仅需要一个增量过滤器,该过滤器支持插入但不支持删除。但是,增量过滤器的设计空间缺少一个“最佳点”过滤器,该过滤器结合了空间效率,快速查询和快速插入。增量过滤器(例如Bloom和Bloom Bloom过滤器)不是空间效率的。动态过滤器(即支持删除),例如杜鹃或矢量商滤波器,是空间有效的,但不会显示出始终如一的快速插入和查询。 在本文中,我们提出了前缀滤波器,这是一个提出上述挑战的增量过滤器:(1)其空间(位)类似于最新的动态过滤器; (2)查询吞吐量很高,与杜鹃滤光片的吞吐量相当; (3)插入吞吐量很高,整体构建时间比向量商滤波器和杜鹃过滤器的插入时间更快,$ 1.39 \ times $ - $ 1.46 \ times $ $和$ 3.2 \ times $ - $ 3.5 \ times $。我们对前缀过滤器进行了严格的分析,该分析也适用于实用的集合尺寸(即$ n = 2^{25} $)。该分析涉及失败的可能性,误报率以及操作需要访问多个缓存线的可能性。
Many applications of approximate membership query data structures, or filters, require only an incremental filter that supports insertions but not deletions. However, the design space of incremental filters is missing a "sweet spot" filter that combines space efficiency, fast queries, and fast insertions. Incremental filters, such as the Bloom and blocked Bloom filter, are not space efficient. Dynamic filters (i.e., supporting deletions), such as the cuckoo or vector quotient filter, are space efficient but do not exhibit consistently fast insertions and queries. In this paper, we propose the prefix filter, an incremental filter that addresses the above challenge: (1) its space (in bits) is similar to state-of-the-art dynamic filters; (2) query throughput is high and is comparable to that of the cuckoo filter; and (3) insert throughput is high with overall build times faster than those of the vector quotient filter and cuckoo filter by $1.39\times$-$1.46\times$ and $3.2\times$-$3.5\times$, respectively. We present a rigorous analysis of the prefix filter that holds also for practical set sizes (i.e., $n=2^{25}$). The analysis deals with the probability of failure, false positive rate, and probability that an operation requires accessing more than a single cache line.