Paper Title
Subspace Representations for Soft Set Operations and Sentence Similarities
Paper Authors
Paper Abstract
In natural language processing (NLP), continuous vector representations are crucial for capturing the semantic meanings of individual words. Yet when it comes to representing sets of words, conventional vector-based approaches often struggle with expressiveness and lack essential set operations such as union, intersection, and complement. Inspired by quantum logic, we realize the representation of word sets and the corresponding set operations within pre-trained word embedding spaces. By grounding our approach in linear subspaces, we enable efficient computation of various set operations and facilitate the soft computation of membership functions within continuous spaces. Moreover, we allow the F-score to be computed directly on word vectors, thereby establishing a direct link to the assessment of sentence similarity. In experiments with widely used pre-trained embeddings and benchmarks, we show that our subspace-based set operations consistently outperform vector-based ones in both sentence similarity and set retrieval tasks.
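To make the subspace idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes a word set is represented by the linear span of its word vectors, that the quantum-logic operations are the usual ones on subspaces (union as the span of both subspaces, complement as the orthogonal complement, intersection via De Morgan's law), and that soft membership is the fraction of a vector's norm that lies inside the subspace. All function names are hypothetical, and toy 3-dimensional vectors stand in for pre-trained embeddings.

```python
import numpy as np


def subspace_basis(vectors, tol=1e-10):
    """Return an orthonormal basis (as columns) for the span of the row vectors."""
    a = np.atleast_2d(np.asarray(vectors, dtype=float))
    # Left singular vectors of a.T with non-zero singular values span the rows of a.
    u, s, _ = np.linalg.svd(a.T, full_matrices=False)
    return u[:, s > tol]


def soft_membership(vector, basis):
    """Assumed membership degree in [0, 1]: norm of the projection over norm of the vector."""
    v = np.asarray(vector, dtype=float)
    norm = np.linalg.norm(v)
    if norm == 0.0:
        return 0.0
    return float(np.linalg.norm(basis.T @ v) / norm)


def union(basis_a, basis_b):
    """Union as the span of both subspaces (quantum-logic join)."""
    return subspace_basis(np.hstack([basis_a, basis_b]).T)


def complement(basis):
    """Complement as the orthogonal complement of the subspace."""
    dim = basis.shape[0]
    projector = np.eye(dim) - basis @ basis.T
    # The projector is symmetric, so its rows span the orthogonal complement.
    return subspace_basis(projector)


def intersection(basis_a, basis_b):
    """Intersection via De Morgan: the complement of the union of the complements."""
    return complement(union(complement(basis_a), complement(basis_b)))


# Toy example: A = span{e1, e2}, B = span{e2, e3}.
set_a = subspace_basis([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
set_b = subspace_basis([[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
shared = intersection(set_a, set_b)
```

In this toy setup, `soft_membership([0.0, 1.0, 0.0], shared)` is close to 1 because the second axis lies in both subspaces, while a vector along the first axis has membership close to 0 in `shared`. A vector that lies partly inside a subspace receives an intermediate value, which is what makes the membership "soft".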