Paper Title
Lexical Sememe Prediction using Dictionary Definitions by Capturing Local Semantic Correspondence
Paper Authors
Abstract
Sememes, defined in linguistics as the minimum semantic units of human languages, have been proven useful in many NLP tasks. Since manually constructing and updating sememe knowledge bases (KBs) is costly, the task of automatic sememe prediction has been proposed to assist sememe annotation. In this paper, we explore applying dictionary definitions to predict sememes for unannotated words. We find that the sememes of each word are usually semantically matched to different words in its dictionary definition, and we name this matching relationship local semantic correspondence. Accordingly, we propose a Sememe Correspondence Pooling (SCorP) model, which is able to capture this kind of matching to predict sememes. We evaluate our model and baseline methods on HowNet, a well-known sememe KB, and find that our model achieves state-of-the-art performance. Moreover, further quantitative analysis shows that our model can properly learn the local semantic correspondence between sememes and words in dictionary definitions, which explains the effectiveness of our model. The source code for this paper is available at https://github.com/thunlp/scorp.
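The idea of local semantic correspondence can be illustrated with a toy sketch. This is not the authors' SCorP implementation (see the repository linked above for that); it only assumes that each sememe is scored against every word of the definition and that the best match is kept by max pooling. The function name `predict_sememes`, the hand-picked 3-dimensional vectors, the dot-product match score, and the 0.5 threshold are all hypothetical choices made for illustration.

```python
import math


def dot(u, v):
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def predict_sememes(definition, sememe_vecs, threshold=0.5):
    """Score each sememe by its best-matching definition word.

    definition:  list of (word, vector) pairs for the dictionary definition
    sememe_vecs: dict mapping sememe name -> vector
    Returns (sorted predicted sememes, {sememe: (probability, matched word)}).
    """
    results = {}
    for sememe, s_vec in sememe_vecs.items():
        # Match score between this sememe and every definition word.
        scored = [(dot(s_vec, w_vec), word) for word, w_vec in definition]
        # Max pooling over positions: keep only the strongest local match.
        best_score, best_word = max(scored, key=lambda pair: pair[0])
        results[sememe] = (sigmoid(best_score), best_word)
    predicted = sorted(s for s, (p, _) in results.items() if p > threshold)
    return predicted, results


# Toy definition of "teacher": "person who teaches" (vectors are made up).
definition = [
    ("person", [2.0, 0.0, 0.0]),
    ("who", [0.0, 0.0, 0.5]),
    ("teaches", [0.0, 2.0, 0.0]),
]
sememe_vecs = {
    "human": [1.0, 0.0, 0.0],
    "teach": [0.0, 1.0, 0.0],
    "animal": [-1.0, -1.0, -1.0],
}

predicted, results = predict_sememes(definition, sememe_vecs)
print(predicted)
for sememe, (prob, word) in results.items():
    print(f"{sememe}: p={prob:.2f}, matched word={word}")
```

In this sketch each sememe is tied to one definition word, which mirrors the observation that different sememes of a word tend to match different words in its definition. A trained model would learn the vectors and the match function rather than use fixed values.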