Predicting the Popularity of Micro-videos with Multimodal Variational Encoder-Decoder Framework

Paper Title

Predicting the Popularity of Micro-videos with Multimodal Variational Encoder-Decoder Framework

Authors

Zhu, Yaochen, Xie, Jiayi, Chen, Zhenzhong

Abstract

As an emerging type of user-generated content, micro-videos have drastically enriched people's entertainment experiences and social interactions. However, the popularity pattern of an individual micro-video remains elusive to researchers. One major challenge is that the potential popularity of a micro-video tends to fluctuate under the impact of various external factors, which makes it highly uncertain. In addition, since micro-videos are mainly uploaded by individuals who lack professional techniques, multiple types of noise may be present and obscure useful information. In this paper, we propose a multimodal variational encoder-decoder (MMVED) framework for micro-video popularity prediction tasks. MMVED learns a stochastic Gaussian embedding of a micro-video that is informative about its popularity level while preserving the inherent uncertainties. Moreover, through the optimization of a deep variational information bottleneck lower bound (IBLBO), the learned hidden representation is shown to be maximally expressive about the popularity target while maximally compressive of the noise in micro-video features. Furthermore, the Bayesian product-of-experts principle is applied to the multimodal encoder, so that the decision to keep or discard information is made jointly across all available modalities. Extensive experiments conducted on a public dataset and on a dataset we collected from Xigua demonstrate the effectiveness of the proposed MMVED framework.
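
The abstract does not spell out how the product-of-experts fusion is computed. As a minimal sketch, assuming each modality encoder outputs a diagonal Gaussian, the standard closed form for a product of Gaussians adds the precisions and takes the precision-weighted average of the means. The function name, the optional standard-normal prior expert, and the toy numbers below are illustrative assumptions and are not taken from the paper.

```python
from typing import List, Sequence, Tuple


def product_of_gaussian_experts(
    means: Sequence[Sequence[float]],
    variances: Sequence[Sequence[float]],
    include_standard_normal_prior: bool = True,
) -> Tuple[List[float], List[float]]:
    """Fuse diagonal Gaussians N(mean_i, var_i), one per modality.

    Returns the mean and variance of the (renormalized) product density.
    The optional N(0, 1) prior expert is an assumption of this sketch.
    """
    if not means or len(means) != len(variances):
        raise ValueError("need one variance vector per mean vector")
    dim = len(means[0])

    # A standard-normal prior contributes precision 1 and weighted mean 0.
    precision = [1.0 if include_standard_normal_prior else 0.0] * dim
    weighted_mean = [0.0] * dim

    for mu, var in zip(means, variances):
        if len(mu) != dim or len(var) != dim:
            raise ValueError("all experts must share the same dimension")
        for d in range(dim):
            if var[d] <= 0.0:
                raise ValueError("variances must be positive")
            precision[d] += 1.0 / var[d]        # precisions add up
            weighted_mean[d] += mu[d] / var[d]  # means weighted by precision

    fused_var = [1.0 / p for p in precision]
    fused_mean = [m * v for m, v in zip(weighted_mean, fused_var)]
    return fused_mean, fused_var


if __name__ == "__main__":
    # Hypothetical experts: a confident one (var 0.25) and a noisy one (var 4).
    mean, var = product_of_gaussian_experts(
        [[2.0], [-2.0]], [[0.25], [4.0]], include_standard_normal_prior=False
    )
    print(mean, var)
```

In this toy case the fused mean stays close to the confident expert, which illustrates how a low-variance modality dominates a noisy one in the joint decision about what information to keep.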
