open access publication

Conference Paper, 2024

Back to Optimization: Diffusion-based Zero-Shot 3D Human Pose Estimation

2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), ISBN 979-8-3503-1892-0, Volume 00, Pages 6130-6140, 10.1109/wacv57701.2024.00603

Contributors

Jiang, Zhongyu 0000-0003-4462-6497 [1]
Zhou, Zhuoran [1]
Li, Lei (Corresponding author) [2]
Chai, Wenhao 0000-0003-2611-0008 [1]
Yang, Cheng-Yen [1]
Hwang, Jenq-Neng 0000-0002-8877-2421 [1]

Affiliations

  1. [1] University of Washington [NORA names: United States; America, North; OECD]
  2. [2] University of Copenhagen [NORA names: KU University of Copenhagen; University; Denmark; Europe, EU; Nordic; OECD]

Abstract

Learning-based methods have dominated 3D human pose estimation (HPE), performing significantly better than traditional optimization-based methods on most benchmarks. Nonetheless, 3D HPE in the wild remains the biggest challenge for learning-based models, whether they use 2D-3D lifting, image-to-3D, or diffusion-based methods, because the trained networks implicitly learn camera intrinsic parameters and domain-based 3D human pose distributions and estimate poses by statistical average. Optimization-based methods, on the other hand, estimate results case by case and can therefore predict more diverse and sophisticated human poses in the wild. Combining the advantages of optimization-based and learning-based methods, we propose the Zero-shot Diffusion-based Optimization (ZeDO) pipeline for 3D HPE to address cross-domain and in-the-wild 3D HPE. Our multi-hypothesis ZeDO achieves state-of-the-art (SOTA) performance on Human3.6M, with a minMPJPE of 51.4 mm, without training on any 2D-3D or image-3D pairs. Moreover, our single-hypothesis ZeDO achieves SOTA performance on the 3DPW dataset, with a PA-MPJPE of 40.3 mm in cross-dataset evaluation, outperforming even learning-based methods trained on 3DPW. Our code is available at https://github.com/ipl-uw/ZeDO-Release.
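
The abstract reports results as minMPJPE and PA-MPJPE. As a reading aid, the sketch below shows how these metrics are commonly defined: MPJPE is the mean Euclidean distance between predicted and ground-truth joints, and minMPJPE takes the lowest MPJPE over a set of hypotheses. It is a minimal illustration in plain Python, not code from the ZeDO repository; the names `mpjpe` and `min_mpjpe` and the toy poses are assumptions made here. PA-MPJPE applies the same distance after a rigid Procrustes alignment (rotation, translation, scale) of the prediction to the ground truth, and that alignment step is not shown.

```python
import math
from typing import Sequence

# A pose is a sequence of joints, each an (x, y, z) position in millimetres.
# This representation is an assumption for illustration only.
Pose = Sequence[Sequence[float]]


def mpjpe(pred: Pose, gt: Pose) -> float:
    """Mean per-joint position error: average Euclidean joint distance."""
    if len(pred) != len(gt) or len(gt) == 0:
        raise ValueError("poses must be non-empty and have the same joint count")
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(gt)


def min_mpjpe(hypotheses: Sequence[Pose], gt: Pose) -> float:
    """Multi-hypothesis score: the lowest MPJPE among all hypotheses."""
    if len(hypotheses) == 0:
        raise ValueError("at least one hypothesis is required")
    return min(mpjpe(h, gt) for h in hypotheses)


# Toy two-joint example (values are made up, not results from the paper).
gt_pose = [[0.0, 0.0, 0.0], [0.0, 0.0, 1000.0]]
hyp_a = [[3.0, 4.0, 0.0], [0.0, 0.0, 1000.0]]     # joint errors: 5 mm and 0 mm
hyp_b = [[0.0, 0.0, 10.0], [0.0, 0.0, 1010.0]]    # joint errors: 10 mm and 10 mm

score_a = mpjpe(hyp_a, gt_pose)
score_b = mpjpe(hyp_b, gt_pose)
best = min_mpjpe([hyp_a, hyp_b], gt_pose)
```

Because minMPJPE selects the best hypothesis using the ground truth, it measures whether any hypothesis is close to the true pose. It does not measure how well a method picks one hypothesis on its own.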

Keywords

HPE, PA-MPJPE, ZeDO, average, benchmarks, camera intrinsic parameters, case-by-case, code, cross-dataset evaluation, cross-domain, dataset, diffusion-based method, distribution, estimated pose, estimation, evaluation, human pose, human pose estimation, intrinsic parameters, learning-based methods, learning-based models, lift, method, model, network, optimization, optimization-based, optimization-based method, pairs, parameters, performance, pose, pose estimation, problem, state-of-the-art, state-of-the-art performance, statistical average, traditional optimization-based methods, trained network, training, wild

Data Provider: Digital Science