Paper title
Distributional semantic modeling: a revised technique to train term/word vector space models applying the ontology-related approach
Paper authors
Paper abstract
We design a new technique for distributional semantic modeling that uses a neural network-based approach to learn distributed term representations (term embeddings), yielding term vector space models. The technique is inspired by the recent ontology-related approach to the identification of terms (term extraction) and of relations between them (relation extraction), which uses different types of contextual knowledge (syntactic, terminological, semantic, etc.) and is called semantic pre-processing technology (SPT). Our method relies on automatic term extraction from natural language texts and the subsequent formation of problem-oriented or application-oriented (and deeply annotated) text corpora in which the fundamental entity is the term (both non-compositional and compositional terms). This gives us an opportunity to move from distributed word representations (word embeddings) to distributed term representations (term embeddings). This transition will make it possible to generate more accurate semantic maps of different subject domains and of the relations between input terms, which is useful for exploring clusters and oppositions or for testing hypotheses about them. The semantic map can be represented as a graph using Vec2graph, a Python library for visualizing word embeddings (term embeddings in our case) as dynamic and interactive graphs. The Vec2graph library coupled with term embeddings will not only improve accuracy in solving standard NLP tasks, but also update the conventional concept of automated ontology development. The main practical result of our work is a development kit (a set of toolkits provided as web service APIs and a web application) that provides all the routines necessary for the basic linguistic pre-processing and the semantic pre-processing of natural language texts in Ukrainian for the future training of term vector space models.
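The move from word embeddings to term embeddings described above depends on a corpus in which each term, including multi-word terms, is a single token. The following is a minimal sketch of that corpus-formation step only, not the authors' SPT pipeline. The function name `merge_terms`, the underscore joiner, the greedy longest-match strategy, and the sample term list are all assumptions made for illustration; in the paper, the terms would come from automatic term extraction.

```python
from typing import Iterable, List, Set, Tuple


def merge_terms(
    tokens: List[str],
    terms: Iterable[Tuple[str, ...]],
    joiner: str = "_",
) -> List[str]:
    """Replace each known multi-word term with one joined token.

    Uses greedy longest-match, so a longer term wins over a shorter
    term that starts at the same position (an illustrative choice).
    """
    term_set: Set[Tuple[str, ...]] = {tuple(t) for t in terms}
    max_len = max((len(t) for t in term_set), default=1)
    out: List[str] = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate span first, down to two tokens.
        for n in range(min(max_len, len(tokens) - i), 1, -1):
            span = tuple(tokens[i:i + n])
            if span in term_set:
                out.append(joiner.join(span))
                i += n
                break
        else:
            # No multi-word term starts here; keep the single token.
            out.append(tokens[i])
            i += 1
    return out


# Hypothetical term list standing in for the output of term extraction.
terms = [
    ("vector", "space", "model"),
    ("vector", "space"),
    ("term", "extraction"),
]
sentence = "automatic term extraction feeds the vector space model".split()
merged = merge_terms(sentence, terms)
print(merged)
```

Token lists produced this way could then be passed to any embedding trainer that accepts pre-tokenized sentences (for example, gensim's `Word2Vec`), so that each term receives its own vector.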