Paper title
Comparing in context: Improving cosine similarity measures with a metric tensor
Paper authors
Paper abstract
Cosine similarity is a widely used measure of the relatedness of pre-trained word embeddings trained with a language modeling objective. Datasets such as WordSim-353 and SimLex-999 rate how similar words are according to human annotators, and are therefore often used to evaluate the performance of language models. As a result, improving performance on the word similarity task has typically required improved word representations. In this paper, we instead propose using an extended cosine similarity measure to improve performance on that task, with gains in interpretability. We explore the hypothesis that this approach is particularly useful when the word pairs being compared share the same context, in which case distinct contextualized similarity measures can be learned. We first use the dataset of Richie et al. (2020) to learn contextualized metrics and compare the results with baseline values obtained using the standard cosine similarity measure, observing consistent improvement. We also train a contextualized similarity measure for both SimLex-999 and WordSim-353, comparing the results with the corresponding baselines, and use these datasets as independent test sets for the all-context similarity measure learned on the contextualized dataset, obtaining positive results in a number of tests.
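The extended measure described in the abstract replaces the implicit identity metric of standard cosine similarity with a learned metric tensor. A minimal sketch of this bilinear-form generalization is given below; the function name and example vectors are illustrative, and the paper's exact parameterization of the metric may differ:

```python
import numpy as np

def generalized_cosine(u, v, M):
    """Cosine similarity under a metric tensor M.

    M is assumed symmetric positive semi-definite; with M = I this
    reduces to the standard cosine similarity measure.
    """
    num = u @ M @ v
    den = np.sqrt((u @ M @ u) * (v @ M @ v))
    return num / den

# Toy example: with the identity metric, the result matches
# ordinary cosine similarity between the two vectors.
u = np.array([1.0, 2.0, 3.0])
v = np.array([2.0, 0.0, 1.0])
M = np.eye(3)
sim = generalized_cosine(u, v, M)
```

In practice, a separate `M` could be learned per context (as in the contextualized metrics above) or a single `M` across all contexts (the all-context measure), with the entries of `M` offering some interpretability about which embedding dimensions matter for a given context.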