Abstract
Human pose estimation of vulnerable road users (VRUs) is an important perception task for autonomous vehicles: the estimated poses can be exploited for intention prediction to guide the vehicle's actions. Single-stage human pose estimation approaches promise simplicity and efficiency, but they have so far shown only mediocre results in 2D and have hardly been investigated in 3D in the autonomous driving domain. We tackle this challenge with the 2D single-stage human pose estimator KAPAO. In our evaluation on domain-specific 2D benchmark datasets, KAPAO achieves state-of-the-art performance, which motivates its extension to 3D. To overcome the lack of ground-truth VRU data for 3D pose estimation, we first extend the Waymo Open Dataset with additional 3D pseudo-labels. We create more than one million 3D poses, which we estimate using the dataset's exhaustive person bounding boxes and the associated LiDAR point clouds. Evaluating their quality, we report a mean per joint position error (MPJPE) of less than 10 cm. With access to large-scale domain-specific 3D pose data, we propose a 3D variant of KAPAO that additionally predicts the depth of each joint. We evaluate it on our extended Waymo Open Dataset and compare its performance with that of a LiDAR uplifting baseline. The proposed approach has low latency and produces plausible poses, but it struggles to estimate absolute depth precisely, particularly at large distances. We alleviate this limitation with a conditional LiDAR-based depth correction.
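To make the quantities in the abstract more concrete, the following Python sketch shows how the mean per joint position error (MPJPE) is commonly computed, how a 2D keypoint with a predicted depth can be lifted to 3D camera coordinates, and one possible shape of a conditional LiDAR-based depth correction. It is an illustration under stated assumptions, not the authors' implementation: the ideal pinhole camera model, the function names `mpjpe`, `backproject`, and `correct_depth`, and the `min_points` threshold are all hypothetical, and the correction rule in particular is only a guess at what "conditional" could mean.

```python
import math
import statistics
from typing import Sequence, Tuple

# A 3D joint position (x, y, z); units follow the input, e.g. metres.
Point3D = Tuple[float, float, float]


def mpjpe(pred: Sequence[Point3D], gt: Sequence[Point3D]) -> float:
    """Mean per joint position error: the average Euclidean distance
    between corresponding predicted and ground-truth joints."""
    if not pred or len(pred) != len(gt):
        raise ValueError("pred and gt must be non-empty and equally long")
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(pred)


def backproject(u: float, v: float, depth: float,
                fx: float, fy: float, cx: float, cy: float) -> Point3D:
    """Lift pixel (u, v) with a depth along the optical axis to camera
    coordinates, assuming an ideal pinhole camera without distortion."""
    return ((u - cx) * depth / fx, (v - cy) * depth / fy, depth)


def correct_depth(pred_depth: float, lidar_depths: Sequence[float],
                  min_points: int = 3) -> float:
    """Hypothetical conditional correction: replace the predicted joint
    depth by the median depth of LiDAR points projected near the joint,
    but only if at least `min_points` such points are available."""
    if len(lidar_depths) < min_points:
        return pred_depth  # too little LiDAR evidence: keep the prediction
    return statistics.median(lidar_depths)


if __name__ == "__main__":
    gt = [(0.0, 0.0, 10.0), (0.0, 1.0, 10.0)]
    pred = [(0.0, 0.0, 10.1), (0.0, 1.0, 9.9)]
    print(f"MPJPE: {mpjpe(pred, gt):.3f}")  # prints "MPJPE: 0.100"
    print(backproject(1960.0, 640.0, 5.0, 1000.0, 1000.0, 960.0, 640.0))
    print(correct_depth(20.0, [18.0, 18.5, 30.0]))  # prints 18.5
```

Taking the median rather than the mean keeps a single stray background or ground return from dominating the corrected depth; the point-count threshold is a placeholder that would need tuning on real data.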
Acknowledgement
This work was partly supported by the SmartProtect project (no. 879642), which is funded through the Austrian Research Promotion Agency (FFG) on behalf of the Austrian Ministry of Climate Action (BMK) via its Mobility of the Future funding program, and by the European Union's H2020 Fast Track to Innovation project SmartRCS (no. 971619).
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Windbacher, F., Hödlmoser, M., Gelautz, M. (2023). Single-Stage 3D Pose Estimation of Vulnerable Road Users Using Pseudo-Labels. In: Gade, R., Felsberg, M., Kämäräinen, JK. (eds) Image Analysis. SCIA 2023. Lecture Notes in Computer Science, vol 13886. Springer, Cham. https://doi.org/10.1007/978-3-031-31438-4_27
DOI: https://doi.org/10.1007/978-3-031-31438-4_27
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-31437-7
Online ISBN: 978-3-031-31438-4
eBook Packages: Computer Science, Computer Science (R0)