Paper title
Fusionformer: Exploiting the Joint Motion Synergy with a Fusion Network Based on Transformer for 3D Human Pose Estimation
Paper authors
Paper abstract
For the current 3D human pose estimation task, one group of methods mainly learns the rules of 2D-to-3D projection from spatial and temporal correlations. However, earlier methods model the global features of all body joints in the time domain but ignore the motion trajectories of individual joints. Recent work [29] argues that the motion of different joints differs and therefore processes the temporal relationship of each joint separately. However, we find that different joints show the same movement trend under certain specific actions. Therefore, our proposed Fusionformer introduces a self-trajectory module and a mutual-trajectory module on top of the spatio-temporal module. The global spatio-temporal features and the local joint trajectory features are then fused through a linear network in a parallel manner. Finally, to eliminate the influence of bad 2D poses on 3D predictions, we also introduce a pose refinement network to balance the consistency of the 3D predictions. We evaluate the proposed method on two benchmark datasets (Human3.6M, MPI-INF-3DHP). Compared with the baseline method PoseFormer, our method improves MPJPE by 2.4% and P-MPJPE by 4.3% on the Human3.6M dataset.
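The parallel fusion described above (global spatio-temporal features combined with per-joint trajectory features through a linear network, followed by a 3D regression head) can be sketched as follows. This is a minimal NumPy illustration under assumed shapes and variable names (`global_feat`, `local_feat`, 17 joints, 32-channel features); it is not the paper's actual FusionFormer implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

J, C = 17, 32  # number of joints and feature channels (assumed values)

# Two feature streams produced in parallel (random stand-ins here):
# global spatio-temporal features and local per-joint trajectory features.
global_feat = rng.standard_normal((J, C))
local_feat = rng.standard_normal((J, C))

# Fuse the two streams with a linear layer: concatenate per joint and
# project back to C channels (a generic linear-fusion sketch).
W_fuse = rng.standard_normal((2 * C, C)) / np.sqrt(2 * C)
b_fuse = np.zeros(C)
fused = np.concatenate([global_feat, local_feat], axis=-1) @ W_fuse + b_fuse

# A final linear head regresses one 3D coordinate per joint.
W_head = rng.standard_normal((C, 3)) / np.sqrt(C)
pose_3d = fused @ W_head
print(pose_3d.shape)  # one (x, y, z) prediction per joint
```

In practice each stream would come from its own transformer branch, and the refinement network would further adjust `pose_3d`; the sketch only shows the fusion step.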