Learning Quality-aware Representation for Multi-person Pose Regression

Paper Title

Learning Quality-aware Representation for Multi-person Pose Regression

Authors

Yabo Xiao, Dongdong Yu, Xiaojuan Wang, Lei Jin, Guoli Wang, Qian Zhang

Abstract

Off-the-shelf single-stage multi-person pose regression methods generally use the instance score (i.e., the confidence of the instance localization) to indicate pose quality when selecting pose candidates. We identify two gaps in the existing paradigm: 1) the instance score is not well correlated with the pose regression quality; 2) the instance feature representation, which is used to predict the instance score, does not explicitly encode the structural pose information needed to predict a reasonable score that represents pose regression quality. To address these issues, we propose to learn a pose regression quality-aware representation. Concretely, for the first gap, instead of using the previous instance confidence label (e.g., discrete {1,0} or Gaussian representation) to denote the position and confidence of a person instance, we first introduce the Consistent Instance Representation (CIR), which unifies the pose regression quality score of the instance and the confidence of the background into a pixel-wise score map to calibrate the inconsistency between instance score and pose regression quality. To fill the second gap, we further present the Query Encoding Module (QEM). It includes the Keypoint Query Encoding (KQE), which encodes the positional and semantic information of each keypoint, and the Pose Query Encoding (PQE), which explicitly encodes the predicted structural pose information to better fit the Consistent Instance Representation (CIR). With the proposed components, we significantly alleviate the above gaps. Our method outperforms previous single-stage regression-based methods and even bottom-up methods, and achieves the state-of-the-art result of 71.7 AP on the MS COCO test-dev set.
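The idea of a pixel-wise score map that carries pose regression quality at instance positions and background confidence elsewhere can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not say how quality is measured, so the sketch assumes COCO-style Object Keypoint Similarity (OKS) as the quality score, and the function names `oks` and `quality_score_map` are hypothetical.

```python
import numpy as np


def oks(pred, gt, visible, area, sigmas):
    """COCO-style Object Keypoint Similarity for one pose (assumed quality measure)."""
    d2 = np.sum((pred - gt) ** 2, axis=-1)        # squared distance per keypoint
    k2 = (2.0 * sigmas) ** 2                      # per-keypoint tolerance
    sim = np.exp(-d2 / (2.0 * area * k2 + 1e-9))  # similarity in (0, 1] per keypoint
    mask = visible > 0
    return float(sim[mask].mean()) if mask.any() else 0.0


def quality_score_map(shape, centers, pred_poses, gt_poses, visible, areas, sigmas):
    """Build a pixel-wise target: 0 on background, pose quality at instance positions.

    Hypothetical illustration: each instance is represented by a single
    (row, col) position rather than a discrete {1,0} or Gaussian label.
    """
    target = np.zeros(shape, dtype=np.float32)
    for (y, x), pred, gt, vis, area in zip(centers, pred_poses, gt_poses, visible, areas):
        # If two instances share a position, keep the higher quality score.
        target[y, x] = max(target[y, x], oks(pred, gt, vis, area, sigmas))
    return target
```

Under these assumptions, a position whose regressed pose matches the ground truth exactly receives a target of 1, while a poorly regressed pose at the same position receives a lower target, so the score used for candidate selection tracks regression quality rather than localization alone.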
