Paper Title
A Multi Camera Unsupervised Domain Adaptation Pipeline for Object Detection in Cultural Sites through Adversarial Learning and Self-Training
Paper Authors
Paper Abstract
Object detection algorithms enable many interesting applications which can be implemented on different devices, such as smartphones and wearable devices. In the context of a cultural site, implementing these algorithms on a wearable device, such as a pair of smart glasses, allows the use of augmented reality (AR) to show extra information about the artworks and enrich the visitors' experience during their tour. However, object detection algorithms need to be trained on many well-annotated examples to achieve reasonable results. This is a major limitation, since the annotation process requires human supervision, which makes it expensive in terms of time and cost. A possible solution to reduce these costs consists in exploiting tools to automatically generate synthetic labeled images from a 3D model of the site. However, models trained on synthetic data do not generalize to real images acquired in the target scenario in which they are supposed to be used. Furthermore, object detectors should be able to work with different wearable or mobile devices, which makes generalization even harder. In this paper, we present a new dataset collected in a cultural site to study the problem of domain adaptation for object detection in the presence of multiple unlabeled target domains corresponding to different cameras and a labeled source domain obtained by considering synthetic images for training purposes. We present a new domain adaptation method which outperforms current state-of-the-art approaches by combining the benefits of aligning the domains at the feature and pixel levels with a self-training process. We release the dataset at the following link https://iplab.dmi.unict.it/OBJ-MDA/ and the code of the proposed architecture at https://github.com/fpv-iplab/STMDA-RetinaNet.
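The abstract names two of the method's ingredients: adversarial feature-level alignment across domains and self-training on the unlabeled target cameras. Below is a minimal PyTorch sketch of those two ideas only, assuming a gradient reversal layer for the adversarial part and confidence thresholding for pseudo-labels; all names (`GradReverse`, `DomainDiscriminator`, `pseudo_labels`, the 0.7 threshold) are illustrative placeholders, not the released STMDA-RetinaNet implementation.

```python
# Minimal sketch of adversarial feature alignment + self-training (assumed PyTorch).
import torch
import torch.nn as nn


class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; reverses and scales gradients in the backward pass."""

    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None


class DomainDiscriminator(nn.Module):
    """Classifies which domain a feature map comes from
    (synthetic source or one of the target cameras)."""

    def __init__(self, in_channels: int, num_domains: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 256, kernel_size=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(256, num_domains),
        )

    def forward(self, features: torch.Tensor, lambd: float = 1.0) -> torch.Tensor:
        # The GRL makes the backbone learn domain-invariant features:
        # the discriminator minimizes the domain loss while the backbone,
        # receiving reversed gradients, maximizes it.
        return self.net(GradReverse.apply(features, lambd))


def pseudo_labels(detections: list, score_thresh: float = 0.7) -> list:
    """Self-training step: keep only confident detections on unlabeled target
    images and reuse them as ground truth in the next training round."""
    return [d for d in detections if d["score"] >= score_thresh]


# Toy usage: one synthetic source domain and two target cameras (3 domains).
disc = DomainDiscriminator(in_channels=256, num_domains=3)
feats = torch.randn(4, 256, 32, 32, requires_grad=True)  # stand-in for backbone features
domain_logits = disc(feats, lambd=0.1)
loss = nn.CrossEntropyLoss()(domain_logits, torch.tensor([0, 1, 2, 1]))
loss.backward()  # gradients flowing back through the GRL are sign-flipped
```

The pixel-level alignment the abstract also mentions would be a third, separate component (e.g., translating synthetic images toward the appearance of each target camera before training) and is not covered by this sketch.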