Paper Title
Exploiting Temporal Coherence for Multi-modal Video Categorization
Paper Authors
Paper Abstract
Multimodal ML models can process data in multiple modalities (e.g., video, images, audio, text) and are useful for video content analysis in a variety of problems (e.g., object detection, scene understanding). In this paper, we focus on the problem of video categorization using a multimodal approach. We have developed a novel temporal coherence-based regularization approach, which applies to different types of models (e.g., RNN, NetVLAD, Transformer). We demonstrate through experiments that our proposed multimodal video categorization models with temporal coherence outperform strong state-of-the-art baseline models.
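To illustrate the general idea of a temporal coherence regularizer, the following is a minimal sketch, assuming frame-level embeddings and a simple squared-difference penalty between consecutive frames; the function name, the L2 form of the penalty, and the weight `lambda_tc` are illustrative assumptions, not the paper's exact formulation.

```python
import torch


def temporal_coherence_penalty(frame_embeddings: torch.Tensor) -> torch.Tensor:
    """Illustrative temporal coherence regularizer (an assumption, not the
    paper's exact loss): encourage embeddings of adjacent frames to stay close.

    frame_embeddings: tensor of shape (batch, time, dim) with per-frame features.
    """
    # Differences between each frame embedding and the next frame's embedding.
    diffs = frame_embeddings[:, 1:, :] - frame_embeddings[:, :-1, :]
    # Mean squared L2 norm of the consecutive-frame differences.
    return diffs.pow(2).sum(dim=-1).mean()


# Hypothetical usage: add the penalty to the task loss with a weight lambda_tc.
# total_loss = classification_loss + lambda_tc * temporal_coherence_penalty(embeddings)
```

Such a term can be added on top of different backbone models (RNN, NetVLAD, Transformer) because it only depends on the sequence of intermediate embeddings, not on the specific architecture producing them.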