Tencent AVS: A Holistic Ads Video Dataset for Multi-modal Scene Segmentation

Paper Title

Tencent AVS: A Holistic Ads Video Dataset for Multi-modal Scene Segmentation

Authors

Jiang, Jie, Li, Zhimin, Xiong, Jiangfeng, Quan, Rongwei, Lu, Qinglin, Liu, Wei

Abstract

Public benchmarks have greatly advanced temporal video segmentation and classification in recent years. However, such research still focuses mainly on human actions and fails to describe videos holistically. In addition, previous research tends to concentrate on visual information while ignoring the multi-modal nature of videos. To fill this gap, we construct the Tencent "Ads Video Segmentation" (TAVS) dataset in the ads domain to take multi-modal video analysis to a new level. TAVS describes videos from three independent perspectives, "presentation form", "place", and "style", and contains rich multi-modal information such as video, audio, and text. TAVS is organized hierarchically by semantics for comprehensive temporal video segmentation, with three levels of categories for multi-label classification, e.g., "place" - "working place" - "office". TAVS is therefore distinguished from previous temporal segmentation datasets by its multi-modal information, holistic view of categories, and hierarchical granularity. It includes 12,000 videos, 82 classes, 33,900 segments, 121,100 shots, and 168,500 labels. Alongside TAVS, we also present a strong multi-modal video segmentation baseline coupled with multi-label class prediction. Extensive experiments evaluate our proposed method as well as existing representative methods to reveal the key challenges of TAVS.
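
To make the three-level, multi-label annotation scheme concrete, the following is a minimal Python sketch of how one annotated segment might be represented. It is an illustration under stated assumptions, not the dataset's actual file format or API: the `Segment` class, its field names, and the helper functions are hypothetical, and only the label path "place" - "working place" - "office" is taken from the abstract.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple

# A label is a path through the hierarchy, coarse to fine, with at most
# three levels: (perspective, mid-level category, leaf category).
LabelPath = Tuple[str, ...]


@dataclass
class Segment:
    """Hypothetical container for one temporally segmented scene."""
    start_s: float                      # segment start time in seconds
    end_s: float                        # segment end time in seconds
    labels: List[LabelPath] = field(default_factory=list)  # multi-label


def expand(path: LabelPath) -> List[LabelPath]:
    """Return the path together with all of its ancestors, coarse to fine."""
    return [path[:i] for i in range(1, len(path) + 1)]


def labels_at_level(segment: Segment, level: int) -> Set[LabelPath]:
    """Collect the distinct labels of a segment truncated to a given level.

    Paths shorter than `level` are skipped, since they carry no label
    at that granularity.
    """
    return {p[:level] for p in segment.labels if len(p) >= level}


if __name__ == "__main__":
    seg = Segment(
        start_s=0.0,
        end_s=4.5,
        labels=[
            ("place", "working place", "office"),  # path from the abstract
            ("style", "example-style"),            # hypothetical label
        ],
    )
    print(expand(seg.labels[0]))
    print(labels_at_level(seg, 1))
```

Storing each label as a full path keeps the hierarchy explicit, so a classifier can be evaluated at any of the three granularities by truncating the paths rather than maintaining separate label sets per level.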
