Paper Title

ReLER@ZJU-Alibaba Submission to the Ego4D Natural Language Queries Challenge 2022

Authors

Naiyuan Liu, Xiaohan Wang, Xiaobo Li, Yi Yang, Yueting Zhuang

Abstract

In this report, we present the ReLER@ZJU-Alibaba submission to the Ego4D Natural Language Queries (NLQ) Challenge at CVPR 2022. Given a video clip and a text query, the goal of the challenge is to localize the temporal moment of the clip in which the answer to the query can be found. To tackle this task, we propose a multi-scale cross-modal transformer and a video frame-level contrastive loss to fully uncover the correlation between language queries and video clips. In addition, we propose two data augmentation strategies to increase the diversity of training samples. Experimental results demonstrate the effectiveness of our method. The final submission ranked first on the leaderboard.
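
The abstract names a video frame-level contrastive loss but does not give its form. As a rough illustration only, the sketch below shows one common way such a loss can be written: frames inside the annotated moment are treated as positives for the query, all other frames as negatives, and a multi-positive InfoNCE objective is applied over cosine similarities. The function name `frame_contrastive_loss`, the `temperature` value, and the choice of InfoNCE are assumptions made here for illustration, not details taken from the report.

```python
import numpy as np


def frame_contrastive_loss(frame_feats, query_feat, inside_mask, temperature=0.1):
    """Hypothetical frame-level contrastive loss (illustrative, not the authors' code).

    frame_feats: (T, D) array of per-frame video features.
    query_feat:  (D,) array holding one sentence-level query embedding.
    inside_mask: (T,) boolean array, True for frames inside the annotated moment.
    """
    frame_feats = np.asarray(frame_feats, dtype=np.float64)
    query_feat = np.asarray(query_feat, dtype=np.float64)
    inside_mask = np.asarray(inside_mask, dtype=bool)
    if not inside_mask.any():
        raise ValueError("at least one frame must lie inside the annotated moment")

    # L2-normalise so the dot product below is a cosine similarity.
    frames = frame_feats / (np.linalg.norm(frame_feats, axis=1, keepdims=True) + 1e-8)
    query = query_feat / (np.linalg.norm(query_feat) + 1e-8)

    # One similarity score per frame, sharpened by the temperature.
    logits = frames @ query / temperature

    # Numerically stable log-sum-exp over all frames and over positive frames.
    shift = logits.max()
    log_all = shift + np.log(np.exp(logits - shift).sum())
    log_pos = shift + np.log(np.exp(logits[inside_mask] - shift).sum())

    # Multi-positive InfoNCE: -log(sum_pos exp(s) / sum_all exp(s)).
    return float(log_all - log_pos)


if __name__ == "__main__":
    # Toy clip: the first two frames match the query direction, the last two do not.
    frames = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
    query = np.array([1.0, 0.0])
    aligned = frame_contrastive_loss(frames, query, [True, True, False, False])
    misaligned = frame_contrastive_loss(frames, query, [False, False, True, True])
    print(aligned < misaligned)
```

Under these assumptions the loss is small when the frames inside the annotated moment are the ones most similar to the query, and large when similarity mass falls on frames outside it, which is the behaviour a frame-level alignment objective is meant to encourage.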
