Paper Title
Inductive Document Network Embedding with Topic-Word Attention
Paper Authors
Paper Abstract
Document network embedding aims at learning representations for a structured text corpus, i.e., when documents are linked to each other. Recent algorithms extend network embedding approaches by incorporating the text content associated with the nodes in their formulations. In most cases, it is hard to interpret the learned representations. Moreover, little importance is given to the generalization to new documents that are not observed within the network. In this paper, we propose an interpretable and inductive document network embedding method. We introduce a novel mechanism, the Topic-Word Attention (TWA), that generates document representations based on the interplay between word and topic representations. We train these word and topic vectors through our general model, Inductive Document Network Embedding (IDNE), by leveraging the connections in the document network. Quantitative evaluations show that our approach achieves state-of-the-art performance on various networks, and we qualitatively show that our model produces meaningful and interpretable representations of the words, topics and documents.
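To make the attention idea concrete, here is a minimal NumPy sketch of a topic-word attention pooling step, based only on the abstract's description (document representations built from word-topic interactions). The function name, dimensions, and mean-pooling aggregation are illustrative assumptions, not the exact IDNE formulation.

```python
# Hypothetical sketch of topic-word attention pooling; not the paper's exact model.
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def topic_word_attention(word_vecs, topic_vecs):
    """Build a document vector from its word vectors and shared topic vectors.

    word_vecs:  (n_words, d) embeddings of the words appearing in the document.
    topic_vecs: (n_topics, d) trainable topic vectors shared across the corpus.
    """
    # Interaction scores between every topic and every word in the document.
    scores = topic_vecs @ word_vecs.T          # (n_topics, n_words)
    # Attention over the document's words, computed separately for each topic.
    attn = softmax(scores, axis=1)             # (n_topics, n_words)
    # Topic-specific summaries of the document content.
    topic_views = attn @ word_vecs             # (n_topics, d)
    # Aggregate the per-topic views into one document representation
    # (mean pooling here; the actual combination may differ).
    return topic_views.mean(axis=0)            # (d,)

# Toy usage: 5 words, 3 topics, 8-dimensional embeddings.
rng = np.random.default_rng(0)
doc_vec = topic_word_attention(rng.normal(size=(5, 8)), rng.normal(size=(3, 8)))
print(doc_vec.shape)  # (8,)
```

In an inductive setting such as the one the abstract describes, a new document outside the training network could be embedded this way from its words alone, since only the pre-trained word and topic vectors are required.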