Paper Title
Socially and Contextually Aware Human Motion and Pose Forecasting

Paper Authors

Vida Adeli, Ehsan Adeli, Ian Reid, Juan Carlos Niebles, Hamid Rezatofighi

Paper Abstract


Smooth and seamless robot navigation while interacting with humans depends on predicting human movements. Forecasting such human dynamics often involves modeling human trajectories (global motion) or detailed body joint movements (local motion). Prior work typically tackled local and global human movements separately. In this paper, we propose a novel framework to tackle both tasks of human motion (or trajectory) and body skeleton pose forecasting in a unified end-to-end pipeline. To deal with this real-world problem, we consider incorporating both scene and social contexts, as critical clues for this prediction task, into our proposed framework. To this end, we first couple these two tasks by i) encoding their history using a shared Gated Recurrent Unit (GRU) encoder and ii) applying a metric as loss, which measures the source of errors in each task jointly as a single distance. Then, we incorporate the scene context by encoding a spatio-temporal representation of the video data. We also include social clues by generating a joint feature representation from the motion and pose of all individuals in the scene using a social pooling layer. Finally, we use a GRU-based decoder to forecast both motion and skeleton pose. We demonstrate that our proposed framework achieves superior performance compared to several baselines on two social datasets.
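The pipeline outlined in the abstract (a shared GRU encoder over each person's motion-and-pose history, a social pooling layer over all individuals, and a GRU-based decoder) can be sketched in minimal NumPy. This is an illustrative sketch only, not the authors' implementation: all dimensions, the element-wise max pooling, and the names (GRUCell, social_pool, encode) are assumptions, and the scene-context branch and the joint loss are omitted.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class GRUCell:
    """Minimal GRU cell; one instance is shared across all individuals."""
    def __init__(self, in_dim, hid_dim, rng):
        s = 1.0 / np.sqrt(hid_dim)
        self.W = rng.uniform(-s, s, (3, hid_dim, in_dim))   # input weights for z, r, n gates
        self.U = rng.uniform(-s, s, (3, hid_dim, hid_dim))  # recurrent weights
        self.b = np.zeros((3, hid_dim))

    def step(self, x, h):
        z = sigmoid(self.W[0] @ x + self.U[0] @ h + self.b[0])   # update gate
        r = sigmoid(self.W[1] @ x + self.U[1] @ h + self.b[1])   # reset gate
        n = np.tanh(self.W[2] @ x + self.U[2] @ (r * h) + self.b[2])
        return (1 - z) * n + z * h

def encode(cell, seq):
    # run the shared encoder over one person's observed history
    h = np.zeros(cell.b.shape[1])
    for x in seq:
        h = cell.step(x, h)
    return h

def social_pool(states):
    # one social feature: element-wise max over all individuals' hidden states
    return np.max(states, axis=0)

# toy setup (assumed sizes): 3 people, 8 observed frames,
# each frame = 2-D trajectory + flattened pose keypoints
rng = np.random.default_rng(0)
n_people, t_obs, feat, hid, t_pred = 3, 8, 28, 16, 4
enc = GRUCell(feat, hid, rng)
histories = rng.normal(size=(n_people, t_obs, feat))

# shared encoder over each person's motion + pose history
h_enc = np.stack([encode(enc, histories[i]) for i in range(n_people)])
social = social_pool(h_enc)

# decoder input per person: own encoding concatenated with the social feature
dec = GRUCell(hid + hid, hid, rng)
out_proj = rng.normal(size=(feat, hid)) * 0.1
forecasts = np.array([
    [out_proj @ (h := dec.step(np.concatenate([h_enc[i], social]),
                               h if t else h_enc[i]))
     for t in range(t_pred)]
    for i in range(n_people)
])
print(forecasts.shape)  # (3, 4, 28): future trajectory + pose per person per frame
```

The key structural point the sketch shows is parameter sharing: one encoder processes every individual, and the pooled feature injects the other agents' states into each person's decoder input.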