Paper Title

Monotonic Cardinality Estimation of Similarity Selection: A Deep Learning Approach

Authors

Yaoshu Wang, Chuan Xiao, Jianbin Qin, Xin Cao, Yifang Sun, Wei Wang, Makoto Onizuka

Abstract

Due to the outstanding capability of capturing underlying data distributions, deep learning techniques have been recently utilized for a series of traditional database problems. In this paper, we investigate the possibilities of utilizing deep learning for cardinality estimation of similarity selection. Answering this problem accurately and efficiently is essential to many data management applications, especially for query optimization. Moreover, in some applications the estimated cardinality is supposed to be consistent and interpretable. Hence a monotonic estimation w.r.t. the query threshold is preferred. We propose a novel and generic method that can be applied to any data type and distance function. Our method consists of a feature extraction model and a regression model. The feature extraction model transforms original data and threshold to a Hamming space, in which a deep learning-based regression model is utilized to exploit the incremental property of cardinality w.r.t. the threshold for both accuracy and monotonicity. We develop a training strategy tailored to our model as well as techniques for fast estimation. We also discuss how to handle updates. We demonstrate the accuracy and the efficiency of our method through experiments, and show how it improves the performance of a query optimizer.
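
The link between the "incremental property" and monotonicity mentioned in the abstract can be illustrated with a toy sketch. This is not the authors' model: it assumes records are already encoded as bit vectors in a Hamming space, and it uses exact per-distance counts in place of the values a learned regressor would predict. The function names and the clamping of negative increments to zero are assumptions made for illustration only. The point is that if the estimate for threshold tau is a sum of non-negative increments for distances 0..tau, it cannot decrease as tau grows.

```python
import random


def hamming(a: int, b: int) -> int:
    """Hamming distance between two bit vectors packed into ints."""
    return bin(a ^ b).count("1")


def increments(query: int, data: list, max_tau: int) -> list:
    """inc[t] = number of records at Hamming distance exactly t from the query.

    In a learned estimator these values would be predicted, not counted;
    exact counts are used here only to keep the sketch self-contained.
    """
    inc = [0] * (max_tau + 1)
    for x in data:
        d = hamming(query, x)
        if d <= max_tau:
            inc[d] += 1
    return inc


def cardinality(inc: list, tau: int) -> float:
    """Sum the increments for distances 0..tau.

    Negative values (possible if the increments came from a noisy predictor)
    are clamped to zero, so the result is monotone in tau by construction.
    """
    return float(sum(max(0.0, v) for v in inc[: tau + 1]))


random.seed(0)
BITS = 16  # hypothetical code length
data = [random.getrandbits(BITS) for _ in range(1000)]
query = random.getrandbits(BITS)

inc = increments(query, data, BITS)
estimates = [cardinality(inc, t) for t in range(BITS + 1)]
```

Here `estimates[t]` is the number of records within Hamming distance `t` of the query. The list is non-decreasing, and its last entry equals the dataset size because every 16-bit record lies within distance 16.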
