Paper Title
Hierarchical Domain-Adapted Feature Learning for Video Saliency Prediction
Paper Authors
Paper Abstract
In this work, we propose a 3D fully convolutional architecture for video saliency prediction that employs hierarchical supervision on intermediate maps (referred to as conspicuity maps) generated using features extracted at different abstraction levels. We provide the base hierarchical learning mechanism with two techniques for domain adaptation and domain-specific learning. For the former, we encourage the model to learn hierarchical general features in an unsupervised manner, using gradient reversal at multiple scales, to enhance generalization capabilities on datasets for which no annotations are provided during training. As for domain specialization, we employ domain-specific operations (namely, priors, smoothing and batch normalization) that specialize the learned features on individual datasets in order to maximize performance. The results of our experiments show that the proposed model yields state-of-the-art accuracy on supervised saliency prediction. When the base hierarchical model is empowered with domain-specific modules, performance improves, outperforming state-of-the-art models on three out of five metrics on the DHF1K benchmark and reaching the second-best results on the other two. When, instead, we test it in an unsupervised domain adaptation setting, by enabling hierarchical gradient reversal layers, we obtain performance comparable to supervised state-of-the-art methods.
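The gradient reversal mentioned in the abstract can be illustrated with a minimal sketch. It assumes plain Python lists in place of tensors, and the class name `GradientReversal`, the scaling factor `lam`, and the three-level setup are hypothetical choices made for illustration; this is not the paper's implementation. The layer acts as the identity in the forward pass and flips the sign of the gradient in the backward pass, so that the feature extractor is pushed to confuse a domain classifier attached after it.

```python
class GradientReversal:
    """Identity on the forward pass; scales gradients by -lam on the backward pass."""

    def __init__(self, lam=1.0):
        # lam is an assumed hyperparameter controlling the reversal strength.
        self.lam = lam

    def forward(self, features):
        # Features pass through unchanged to the domain classifier.
        return list(features)

    def backward(self, grad_output):
        # The gradient flowing back to the feature extractor is negated and scaled.
        return [-self.lam * g for g in grad_output]


# One reversal layer per abstraction level (three levels assumed here),
# mirroring the idea of applying reversal at multiple scales.
levels = [GradientReversal(lam=0.5) for _ in range(3)]

features = [0.5, -1.0, 2.0]
domain_grad = [1.0, -2.0, 4.0]  # illustrative gradient from a domain classifier

forwarded = [layer.forward(features) for layer in levels]
reversed_grads = [layer.backward(domain_grad) for layer in levels]
```

In an autograd framework the same behavior would normally be written as a custom differentiable function rather than with explicit `forward` and `backward` calls; the explicit form is used here only to keep the sketch dependency-free.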