Paper Title
Exploiting Temporal Coherence for Multi-modal Video Categorization
Paper Authors
Paper Abstract
Multimodal ML models can process data in multiple modalities (e.g., video, images, audio, text) and are useful for video content analysis in a variety of problems (e.g., object detection, scene understanding). In this paper, we focus on the problem of video categorization using a multimodal approach. We have developed a novel temporal coherence-based regularization approach, which applies to different types of models (e.g., RNN, NetVLAD, Transformer). We demonstrate through experiments that our proposed multimodal video categorization models with temporal coherence outperform strong state-of-the-art baseline models.
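To illustrate the general idea of a temporal coherence regularizer, the following is a minimal sketch, assuming frame-level embeddings and a simple squared-difference penalty between consecutive frames; the function name, the L2 form of the penalty, and the weight `lambda_tc` are illustrative assumptions, not the paper's exact formulation.

```python
import torch


def temporal_coherence_penalty(frame_embeddings: torch.Tensor) -> torch.Tensor:
    """Illustrative temporal coherence regularizer (an assumption, not the
    paper's exact loss): encourage embeddings of adjacent frames to stay close.

    frame_embeddings: tensor of shape (batch, time, dim) with per-frame features.
    """
    # Differences between each frame embedding and the next frame's embedding.
    diffs = frame_embeddings[:, 1:, :] - frame_embeddings[:, :-1, :]
    # Mean squared L2 norm of the consecutive-frame differences.
    return diffs.pow(2).sum(dim=-1).mean()


# Hypothetical usage: add the penalty to the task loss with a weight lambda_tc.
# total_loss = classification_loss + lambda_tc * temporal_coherence_penalty(embeddings)
```

Such a term can be added on top of different backbone models (RNN, NetVLAD, Transformer) because it only depends on the sequence of intermediate embeddings, not on the specific architecture producing them.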