Paper Title
Multi-Person Pose Estimation with Enhanced Feature Aggregation and Selection
Paper Authors
Abstract
We propose a novel Enhanced Feature Aggregation and Selection network (EFASNet) for multi-person 2D human pose estimation. Thanks to its enhanced feature representation, our method handles crowded, cluttered, and occluded scenes well. More specifically, we propose a Feature Aggregation and Selection Module (FASM) that constructs hierarchical multi-scale feature aggregation and makes the aggregated features discriminative, yielding more accurate fine-grained representations and thus more precise joint locations. We then apply a simple Feature Fusion (FF) strategy that effectively fuses high-resolution spatial features with low-resolution semantic features to obtain more reliable context information for well-estimated joints. Finally, we build a Dense Upsampling Convolution (DUC) module to generate more precise predictions; it recovers fine joint details that are usually lost in common upsampling processes. As a result, the predicted keypoint heatmaps are more accurate. Comprehensive experiments demonstrate that the proposed approach outperforms state-of-the-art methods on three benchmark datasets: the recent large-scale CrowdPose dataset, the COCO keypoint detection dataset, and the MPII Human Pose dataset. Our code will be released upon acceptance.
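As background for the DUC module mentioned in the abstract: dense upsampling convolution, as introduced in prior segmentation work, is commonly realized as a convolution that expands the channel dimension by a factor of r² at low resolution, followed by a pixel-shuffle rearrangement into an r-times-larger spatial map, so that every output pixel is predicted directly rather than interpolated. The sketch below shows only that rearrangement step in NumPy; the function name `duc_rearrange` and the exact channel layout are our illustrative assumptions, not details taken from this paper.

```python
import numpy as np

def duc_rearrange(x, r):
    """Pixel-shuffle rearrangement used by dense upsampling convolution (sketch).

    x : array of shape (r*r*C, H, W), the low-resolution feature map
        produced by a preceding convolution layer (assumed layout).
    r : integer upsampling factor.
    Returns an array of shape (C, H*r, W*r).
    """
    c2, h, w = x.shape
    c = c2 // (r * r)
    # Split the channel axis into (C, r, r): each group of r*r channels
    # holds the r x r sub-pixel predictions for one output channel.
    x = x.reshape(c, r, r, h, w)
    # Interleave the sub-pixel axes with the spatial axes: (C, H, r, W, r).
    x = x.transpose(0, 3, 1, 4, 2)
    # Merge into the full-resolution map.
    return x.reshape(c, h * r, w * r)
```

With r = 2 and a single output channel, the four input channels at each low-resolution location become the 2x2 block of output pixels at the corresponding high-resolution location, which is how DUC can recover per-pixel detail that plain bilinear upsampling cannot.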