Paper Title
Truncate-Split-Contrast: A Framework for Learning from Mislabeled Videos
Paper Authors
Paper Abstract
Learning with noisy labels (LNL) is a classic problem that has been extensively studied for image tasks, but much less so for video. A straightforward migration from images to videos that ignores the properties of videos, such as computational cost and redundant information, is not a sound choice. In this paper, we propose two new strategies for video analysis with noisy labels: 1) a lightweight channel selection method, dubbed Channel Truncation, for feature-based label noise detection; this method selects the most discriminative channels to split clean and noisy instances in each category; 2) a novel contrastive strategy, dubbed Noise Contrastive Learning, which constructs the relationship between clean and noisy instances to regularize model training. Experiments on three well-known benchmark datasets for video classification show that our proposed tru{\bf N}cat{\bf E}-split-contr{\bf A}s{\bf T} (NEAT) significantly outperforms existing baselines. Despite reducing the feature dimension to 10\% of the original, our method achieves an improvement of over 0.4 in noise detection F1-score and 5\% in classification accuracy on the Mini-Kinetics dataset under severe noise (symmetric-80\%). Thanks to Noise Contrastive Learning, the average classification accuracy improvement on Mini-Kinetics and Sth-Sth-V1 exceeds 1.6\%.
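To make the truncate-then-split idea concrete, below is a minimal sketch of per-class channel truncation for clean/noisy splitting. The abstract does not specify how "most discriminative" channels are chosen or how the split threshold is set, so this sketch makes assumptions: channels are ranked by mean activation within each class, instances are scored by cosine similarity of their truncated features to the class centroid, and the split uses the median score. The function name `truncate_and_split` and all parameters are illustrative, not the authors' implementation.

```python
import numpy as np

def truncate_and_split(features, labels, keep_ratio=0.1):
    """Sketch of channel truncation for label-noise detection.

    For each class: keep only the top channels (here ranked by mean
    activation, a stand-in for 'most discriminative'), score every
    instance by cosine similarity to the class centroid over those
    channels, and mark the higher-scoring half as clean.
    """
    clean_mask = np.zeros(len(labels), dtype=bool)
    k = max(1, int(features.shape[1] * keep_ratio))  # e.g. keep 10% of channels
    for c in np.unique(labels):
        idx = np.where(labels == c)[0]
        feats = features[idx]
        # Channel truncation: top-k channels by mean activation for this class.
        chans = np.argsort(feats.mean(axis=0))[-k:]
        trunc = feats[:, chans]
        centroid = trunc.mean(axis=0)
        # Cosine similarity of each truncated feature to the class centroid.
        sims = trunc @ centroid / (
            np.linalg.norm(trunc, axis=1) * np.linalg.norm(centroid) + 1e-8)
        # Instances at or above the median similarity are treated as clean.
        clean_mask[idx[sims >= np.median(sims)]] = True
    return clean_mask
```

Because scoring happens in the truncated (10\%-dimensional) space, both the distance computations and any downstream per-class statistics become proportionally cheaper, which is the motivation for truncation on video features.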