Paper Title

Author2Vec: A Framework for Generating User Embedding

Authors

Wu, Xiaodong; Lin, Weizhe; Wang, Zhilin; Rastorgueva, Elena

Abstract

Online forums and social media platforms provide noisy but valuable data every day. In this paper, we propose a novel end-to-end neural-network-based user embedding system, Author2Vec. The model combines sentence representations generated by BERT (Bidirectional Encoder Representations from Transformers) with a novel unsupervised pre-training objective, authorship classification, to produce better user embeddings that encode useful user-intrinsic properties. This user embedding system was pre-trained on post data from 10k Reddit users and was analyzed and evaluated on two user classification benchmarks, depression detection and personality classification, on which the model outperformed traditional count-based and prediction-based methods. We substantiate that Author2Vec successfully encodes useful user attributes and that the generated user embeddings perform well in downstream classification tasks without further fine-tuning.
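
The abstract describes the pre-training signal only at a high level. As a rough illustration of what an authorship-classification objective can look like, the following pure-Python sketch trains a linear projection `W` over fixed per-post sentence vectors by predicting each post's author with a softmax head, then takes a user embedding as the mean of that user's projected posts. Everything here is an assumption rather than the authors' Author2Vec architecture: the 4-dimensional toy vectors stand in for BERT sentence representations, and the names `AuthorshipEmbedder`, `fit`, and `user_embedding`, the per-post (rather than per-user) classification, the mean pooling, and the plain SGD updates are all hypothetical choices made for the example.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]


def matvec(m: List[Vector], v: Vector) -> Vector:
    """Multiply a matrix stored as a list of rows by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def softmax(z: Vector) -> Vector:
    """Numerically stable softmax over a list of logits."""
    top = max(z)
    exps = [math.exp(v - top) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


class AuthorshipEmbedder:
    """Hypothetical sketch, not the paper's architecture.

    W projects a fixed sentence vector (a stand-in for a BERT output) to d_emb
    dimensions; V is a softmax head that predicts which user wrote the post.
    """

    def __init__(self, d_in: int, d_emb: int, n_users: int, seed: int = 0):
        rng = random.Random(seed)
        self.W = [[rng.gauss(0, 0.1) for _ in range(d_in)] for _ in range(d_emb)]
        self.V = [[rng.gauss(0, 0.1) for _ in range(d_emb)] for _ in range(n_users)]

    def project(self, x: Vector) -> Vector:
        return matvec(self.W, x)

    def loss(self, posts: List[Tuple[Vector, int]]) -> float:
        """Mean cross-entropy of the authorship classifier over (vector, author) pairs."""
        total = 0.0
        for x, author in posts:
            p = softmax(matvec(self.V, self.project(x)))
            total -= math.log(p[author])
        return total / len(posts)

    def sgd_step(self, x: Vector, author: int, lr: float) -> None:
        h = self.project(x)
        p = softmax(matvec(self.V, h))
        # Gradient of cross-entropy w.r.t. the logits: p - one_hot(author).
        dz = [pi - (1.0 if i == author else 0.0) for i, pi in enumerate(p)]
        # Back-propagate to h (dL/dh = V^T dz) before V is modified.
        dh = [sum(dz[i] * self.V[i][j] for i in range(len(dz))) for j in range(len(h))]
        for i, g in enumerate(dz):
            for j in range(len(h)):
                self.V[i][j] -= lr * g * h[j]
        for j, g in enumerate(dh):
            for k in range(len(x)):
                self.W[j][k] -= lr * g * x[k]

    def fit(self, posts: List[Tuple[Vector, int]], epochs: int = 50,
            lr: float = 0.1, seed: int = 0) -> None:
        rng = random.Random(seed)
        order = list(posts)
        for _ in range(epochs):
            rng.shuffle(order)
            for x, author in order:
                self.sgd_step(x, author, lr)

    def user_embedding(self, user_posts: List[Vector]) -> Vector:
        """Mean-pool the projected vectors of one user's posts."""
        projected = [self.project(x) for x in user_posts]
        return [sum(col) / len(projected) for col in zip(*projected)]


# Toy data (hypothetical): 3 users, 10 posts each, 4-d vectors clustered per user.
rng = random.Random(1)
centers = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]]
toy_posts = [([c + rng.gauss(0, 0.1) for c in centers[u]], u)
             for u in range(3) for _ in range(10)]

model = AuthorshipEmbedder(d_in=4, d_emb=2, n_users=3)
loss_before = model.loss(toy_posts)
model.fit(toy_posts)
loss_after = model.loss(toy_posts)
user0_embedding = model.user_embedding([x for x, u in toy_posts if u == 0])
print(f"loss {loss_before:.3f} -> {loss_after:.3f}, user 0 embedding {user0_embedding}")
```

After pre-training, the projection would be kept frozen and the pooled vectors handed to a separate downstream classifier, which loosely mirrors the abstract's point that the embeddings are used without further fine-tuning; the sketch stops at producing the embedding.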
