Navigable Proximity Graph-Driven Native Hybrid Queries with Structured and Unstructured Constraints

Paper title

Navigable Proximity Graph-Driven Native Hybrid Queries with Structured and Unstructured Constraints

Paper authors

Mengzhao Wang, Lingwei Lv, Xiaoliang Xu, Yuxiang Wang, Qiang Yue, Jiongkang Ni

Paper abstract

As research interest surges, vector similarity search is being applied in multiple fields, including data mining, computer vision, and information retrieval. Given a set of objects (e.g., a set of images) and a query object, we can easily transform each object into a feature vector and apply vector similarity search to retrieve the most similar objects. However, the original vector similarity search does not support hybrid queries well, where users input not only an unstructured query constraint (i.e., the feature vector of the query object) but also a structured query constraint (i.e., the desired attributes of interest). Hybrid query processing aims to identify the objects whose feature vectors are similar to that of the query object and that satisfy the given attribute constraints. Recent efforts have attempted to answer a hybrid query by performing attribute filtering and vector similarity search separately and then merging the results, which limits efficiency and accuracy because these components are not purpose-built for hybrid queries. In this paper, we propose a native hybrid query (NHQ) framework based on proximity graphs (PGs), which provides specialized composite index and joint pruning modules for hybrid queries. Existing PGs can be easily deployed on this framework to process hybrid queries efficiently. Moreover, we present two novel navigable PGs (NPGs) with optimized edge selection and routing strategies, which obtain better overall performance than existing PGs. We then deploy the proposed NPGs in NHQ to form two hybrid query methods, which significantly outperform state-of-the-art competitors on all experimental datasets (10× faster under the same Recall), including eight public and one in-house real-world datasets. Our code and datasets have been released at https://github.com/AshenOn3/NHQ.
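
To make the accuracy limitation of decoupled processing concrete, the following is a minimal brute-force sketch in Python. It is not the NHQ method and is not taken from the authors' repository; the toy dataset, the `color` attribute, and all function names are hypothetical and chosen only for illustration. It compares an exact hybrid answer (filter by attribute, then rank by distance) with a "vector search first, filter afterwards" baseline that inspects only a fixed number of nearest candidates.

```python
import math

# Toy dataset (illustrative assumption): each object has a feature vector
# and one structured attribute.
objects = [
    {"id": 0, "vec": (0.0, 0.0), "color": "red"},
    {"id": 1, "vec": (0.1, 0.0), "color": "red"},
    {"id": 2, "vec": (0.2, 0.0), "color": "blue"},
    {"id": 3, "vec": (0.9, 0.9), "color": "blue"},
    {"id": 4, "vec": (1.0, 1.0), "color": "red"},
]


def l2(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def exact_hybrid(query_vec, color, k):
    """Ground truth: keep objects matching the attribute, rank by distance."""
    matches = [o for o in objects if o["color"] == color]
    matches.sort(key=lambda o: l2(o["vec"], query_vec))
    return [o["id"] for o in matches[:k]]


def search_then_filter(query_vec, color, k, candidates):
    """Decoupled baseline: take the nearest `candidates` objects by vector
    distance only, then drop those that fail the attribute constraint."""
    nearest = sorted(objects, key=lambda o: l2(o["vec"], query_vec))[:candidates]
    return [o["id"] for o in nearest if o["color"] == color][:k]


def recall(found, truth):
    """Fraction of the true answers that were returned."""
    return len(set(found) & set(truth)) / len(truth)


query = (0.0, 0.0)
truth = exact_hybrid(query, "blue", k=2)
approx = search_then_filter(query, "blue", k=2, candidates=3)
print(truth, approx, recall(approx, truth))
```

In this toy setting the decoupled baseline returns only one of the two matching objects unless `candidates` is enlarged to cover the whole dataset, which illustrates the trade-off between efficiency and accuracy that the abstract attributes to non-native approaches.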


Paper title