Paper Title
ViewBirdiformer: Learning to recover ground-plane crowd trajectories and ego-motion from a single ego-centric view
Paper Authors
Paper Abstract
We introduce a novel learning-based method for view birdification, the task of recovering the ground-plane trajectories of pedestrians in a crowd, together with that of their observer within the same crowd, solely from the observed ego-centric video. View birdification is essential for mobile robot navigation and localization in dense crowds, where the static background is hard to see and track reliably. It is challenging mainly for two reasons: i) the absolute trajectories of pedestrians are entangled with the movement of the observer, which needs to be decoupled from their observed relative movements in the ego-centric video, and ii) the crowd motion model describing the pedestrians' movement interactions is specific to the scene yet unknown a priori. To this end, we introduce a Transformer-based network, referred to as ViewBirdiformer, which implicitly models crowd motion through self-attention and decomposes relative 2D movement observations into the ground-plane trajectories of the crowd and of the camera through cross-attention between views. Most importantly, ViewBirdiformer achieves view birdification in a single forward pass, which opens the door to accurate, real-time, always-on situational awareness. Extensive experimental results demonstrate that ViewBirdiformer achieves accuracy similar to or better than the state of the art with a three-orders-of-magnitude reduction in execution time.
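To make the abstract's architectural description concrete, the following PyTorch block is a minimal illustrative sketch, not the authors' implementation. It assumes a hypothetical token layout (one state token per pedestrian plus one observer token), hypothetical dimensions, and hypothetical output heads (`ped_head`, `ego_head`); it only shows the pattern named in the abstract: self-attention over ground-plane state tokens to model crowd interactions, and cross-attention to the embedded ego-view 2D motion observations to decompose them into crowd trajectories and camera ego-motion.

```python
import torch
import torch.nn as nn

class ViewBirdiformerBlock(nn.Module):
    """Illustrative sketch (assumptions, not the paper's code):
    self-attention over ground-plane state tokens models crowd motion;
    cross-attention fuses relative 2D movements observed in the ego view."""

    def __init__(self, d_model=128, n_heads=4):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, 4 * d_model), nn.ReLU(), nn.Linear(4 * d_model, d_model)
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.norm3 = nn.LayerNorm(d_model)
        # Hypothetical heads: per-pedestrian ground-plane displacement (dx, dy)
        # and observer ego-motion (dx, dy, dtheta) read from a dedicated token.
        self.ped_head = nn.Linear(d_model, 2)
        self.ego_head = nn.Linear(d_model, 3)

    def forward(self, state_tokens, obs_tokens):
        # state_tokens: (B, N+1, d) -- 1 observer token followed by N pedestrians
        # obs_tokens:   (B, N, d)   -- embedded relative 2D movements in the ego view
        h, _ = self.self_attn(state_tokens, state_tokens, state_tokens)
        x = self.norm1(state_tokens + h)          # crowd-interaction modeling
        h, _ = self.cross_attn(x, obs_tokens, obs_tokens)
        x = self.norm2(x + h)                     # fuse ego-view observations
        x = self.norm3(x + self.ffn(x))
        return self.ped_head(x[:, 1:]), self.ego_head(x[:, 0])

# Usage with dummy tensors: 16 pedestrians, batch of 2.
block = ViewBirdiformerBlock()
ped_dxy, ego_motion = block(torch.randn(2, 17, 128), torch.randn(2, 16, 128))
print(ped_dxy.shape, ego_motion.shape)  # torch.Size([2, 16, 2]) torch.Size([2, 3])
```

Because all pedestrians and the observer are updated by attention in one pass, a single forward call suffices per frame, which is consistent with the single-forward-pass claim; an iterative geometric optimizer would instead solve for the ego-motion and trajectories per frame at far higher cost.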