Weakly Supervised 3D Human Pose and Shape Reconstruction with Normalizing Flows

Andrei Zanfir
Hongyi Xu
European Conference on Computer Vision (ECCV) (2020), pp. 465-481

Abstract

Monocular 3D human pose and shape estimation is challenging due to the many degrees of freedom of the human body and the difficulty of acquiring training data for large-scale supervised learning in complex visual scenes. In this paper we present practical semi-supervised and self-supervised models that support training and good generalization in real-world images and video. Our formulation is based on kinematic latent normalizing flow representations and dynamics, as well as differentiable, semantic body part alignment loss functions that support self-supervised learning. In extensive experiments using 3D motion capture datasets like CMU, Human3.6M, 3DPW, or AMASS, as well as image repositories like COCO, we show that the proposed methods outperform the state of the art, supporting the practical construction of an accurate family of models based on large-scale training with diverse and incompletely labeled image and video data.
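To make the idea of a normalizing flow representation concrete, here is a minimal sketch of a generic RealNVP-style flow over a toy "pose" vector. It is an illustration under stated assumptions, not the architecture used in the paper. The names `AffineCoupling` and `Flow`, the 6-dimensional toy pose, the alternating masks, and the random, untrained weights are all hypothetical. The sketch shows the two properties a flow prior relies on: an exact inverse, which maps latent Gaussian samples to poses, and an exact log-likelihood obtained through the change-of-variables formula.

```python
import numpy as np


class AffineCoupling:
    """One RealNVP-style affine coupling layer (hypothetical toy parameters)."""

    def __init__(self, dim, mask, hidden=16, seed=0):
        rng = np.random.default_rng(seed)
        self.mask = mask.astype(float)  # 1 = pass-through dim, 0 = transformed dim
        self.W1 = rng.normal(scale=0.1, size=(dim, hidden))
        self.b1 = np.zeros(hidden)
        self.Ws = rng.normal(scale=0.1, size=(hidden, dim))
        self.Wt = rng.normal(scale=0.1, size=(hidden, dim))

    def _scale_shift(self, x_masked):
        # The scale and shift depend only on the pass-through dims, so the layer stays invertible.
        h = np.tanh(x_masked @ self.W1 + self.b1)
        s = np.tanh(h @ self.Ws) * (1 - self.mask)  # log-scale, zero on pass-through dims
        t = (h @ self.Wt) * (1 - self.mask)         # shift, zero on pass-through dims
        return s, t

    def forward(self, x):
        """Map data x -> latent z and return log|det dz/dx|."""
        xm = x * self.mask
        s, t = self._scale_shift(xm)
        z = xm + (1 - self.mask) * (x * np.exp(s) + t)
        return z, s.sum(axis=-1)  # triangular Jacobian: log-det is the sum of log-scales

    def inverse(self, z):
        """Map latent z -> data x exactly."""
        zm = z * self.mask
        s, t = self._scale_shift(zm)
        return zm + (1 - self.mask) * ((z - t) * np.exp(-s))


class Flow:
    """Stack of coupling layers with alternating masks (toy example)."""

    def __init__(self, dim, n_layers=4):
        base = np.arange(dim) % 2
        self.layers = [
            AffineCoupling(dim, base if i % 2 == 0 else 1 - base, seed=i)
            for i in range(n_layers)
        ]

    def forward(self, x):
        total_log_det = np.zeros(x.shape[:-1])
        for layer in self.layers:
            x, log_det = layer.forward(x)
            total_log_det = total_log_det + log_det
        return x, total_log_det

    def inverse(self, z):
        for layer in reversed(self.layers):
            z = layer.inverse(z)
        return z

    def log_prob(self, x):
        """Change of variables: log p(x) = log N(z; 0, I) + log|det dz/dx|."""
        z, log_det = self.forward(x)
        d = x.shape[-1]
        log_base = -0.5 * (z ** 2).sum(axis=-1) - 0.5 * d * np.log(2 * np.pi)
        return log_base + log_det
```

Drawing `z` from a standard normal and calling `flow.inverse(z)` yields a sample in the toy pose space. The negative of `flow.log_prob(x)` could serve as a prior penalty on a pose estimate. The coupling design keeps both directions cheap, and it keeps the Jacobian determinant tractable.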