Paper Title

Tencent AVS: A Holistic Ads Video Dataset for Multi-modal Scene Segmentation

Authors

Jie Jiang, Zhimin Li, Jiangfeng Xiong, Rongwei Quan, Qinglin Lu, Wei Liu

Abstract

Public benchmarks have greatly advanced temporal video segmentation and classification in recent years. However, such research still focuses mainly on human actions and fails to describe videos holistically. In addition, previous research tends to concentrate on visual information while ignoring the multi-modal nature of videos. To fill this gap, we construct the Tencent 'Ads Video Segmentation' (TAVS) dataset in the ads domain to take multi-modal video analysis to a new level. TAVS describes videos from three independent perspectives, 'presentation form', 'place', and 'style', and contains rich multi-modal information such as video, audio, and text. TAVS is organized hierarchically by semantics for comprehensive temporal video segmentation, with three levels of categories for multi-label classification, e.g., 'place' - 'working place' - 'office'. TAVS is therefore distinguished from previous temporal segmentation datasets by its multi-modal information, holistic view of categories, and hierarchical granularity. It includes 12,000 videos, 82 classes, 33,900 segments, 121,100 shots, and 168,500 labels. Alongside TAVS, we also present a strong multi-modal video segmentation baseline coupled with multi-label class prediction. Extensive experiments evaluate our proposed method as well as existing representative methods to reveal the key challenges of TAVS.
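
To make the three-level, multi-label organization concrete, the following is a minimal sketch of how one annotated segment might be represented in memory. It is an illustration only, not the dataset's actual file format: the `Segment` class, its field names, the `labels_at_level` helper, and the placeholder 'style' labels are all assumptions. Only the 'place' - 'working place' - 'office' path comes from the abstract.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple

# Hypothetical label path: (perspective, mid-level category, leaf class).
LabelPath = Tuple[str, str, str]


@dataclass
class Segment:
    """One temporal segment of an ad video (illustrative schema, not the TAVS format)."""
    start_sec: float
    end_sec: float
    # Multi-label: a segment may carry several label paths at once.
    labels: List[LabelPath] = field(default_factory=list)


def labels_at_level(segment: Segment, level: int) -> Set[Tuple[str, ...]]:
    """Return the distinct label prefixes of a segment, truncated to level 1, 2, or 3."""
    if level not in (1, 2, 3):
        raise ValueError("level must be 1, 2, or 3")
    return {path[:level] for path in segment.labels}


segment = Segment(
    start_sec=0.0,
    end_sec=4.5,
    labels=[
        ("place", "working place", "office"),                # example from the abstract
        ("style", "placeholder group", "placeholder leaf"),  # made-up placeholder
    ],
)

coarse = labels_at_level(segment, 1)  # perspectives only
fine = labels_at_level(segment, 3)    # full three-level paths
```

Storing each label as a full path keeps coarse-level evaluation simple: truncating the path gives the labels at any higher level of the hierarchy.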
