A Novel Convolutional Neural Network for Head Detection and Pose Estimation in Complex Environments from Single-Depth Images

Wang, Qi; Lei, Hang; Li, Gun; Wang, Xupeng; Chen, Lu

doi:10.1007/s12559-023-10209-5

Published: 18 November 2023

Volume 16, pages 2116–2129, (2024)

Cognitive Computation

Qi Wang¹, Hang Lei¹, Gun Li², Xupeng Wang¹ & Lu Chen²

Abstract

Computer vision based on neural networks is an important part of modern cognitive research. Two of its important applications, head detection and pose estimation, have made breakthrough progress in recent years. Compared with RGB sensors, depth cameras provide a reliable solution in unstable or poor lighting conditions, and an efficient pose estimation method relies on accurate head centre localization. This paper proposes HDPNet, a new convolutional neural network that performs complete head detection and pose estimation in complex environments from depth images alone. For head detection, HDPNet combines a convolutional classification network with the mean shift algorithm to achieve high-precision head centre localization. For pose estimation, a novel guidance network with L2-norm is introduced to constrain the regression of pose features, and soft labels are adopted to encode the probability distribution between the pose ranges. To verify the performance of HDPNet, a series of experiments was conducted on four challenging public datasets: Watch-n-patch, the Biwi Head Pose dataset, Pandora and ICT-3DHP. Compared with state-of-the-art methods, the IoU of head localization improved by 2.2% and the mean error in pose estimation was reduced by 0.1. HDPNet outperformed the latest methods and could be effectively applied in a real environment.
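
The abstract names three building blocks without giving their formulas: mean shift for head centre localization, soft labels over pose ranges, and IoU as the localization metric. The sketch below illustrates generic textbook versions of each, not the HDPNet implementation. The function names, the Gaussian kernel, the bandwidth, the softmax-over-distance encoding and the temperature are all assumptions made for illustration.

```python
import numpy as np


def mean_shift_mode(points, start, bandwidth=10.0, iters=50, tol=1e-3):
    """Move `start` towards the densest region of `points`.

    Assumption: a Gaussian kernel over 2D candidate head-centre positions.
    The paper's kernel and features are not stated in the abstract.
    """
    pts = np.asarray(points, dtype=float)
    centre = np.asarray(start, dtype=float)
    for _ in range(iters):
        d2 = ((pts - centre) ** 2).sum(axis=1)
        weights = np.exp(-d2 / (2.0 * bandwidth ** 2))
        new_centre = (weights[:, None] * pts).sum(axis=0) / weights.sum()
        if np.linalg.norm(new_centre - centre) < tol:
            return new_centre
        centre = new_centre
    return centre


def soft_label(angle_deg, bin_centres, temperature=10.0):
    """Encode an angle as a probability distribution over pose bins.

    Assumption: softmax over negative absolute distance to each bin centre,
    so bins near the true angle receive more probability mass.
    """
    centres = np.asarray(bin_centres, dtype=float)
    logits = -np.abs(centres - angle_deg) / temperature
    logits -= logits.max()  # subtract the maximum for numerical stability
    probs = np.exp(logits)
    return probs / probs.sum()


def box_iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


# Hypothetical data: four candidate centres around (50, 50) and one outlier.
candidates = [(48, 50), (50, 52), (52, 50), (50, 48), (120, 120)]
mode = mean_shift_mode(candidates, start=np.mean(candidates, axis=0))
yaw_label = soft_label(10.0, bin_centres=[-60, -30, 0, 30, 60])
overlap = box_iou((0, 0, 2, 2), (1, 1, 3, 3))
```

In this toy run the mean shift estimate settles on the cluster and ignores the outlier, which is why a mode-seeking step can be more robust than a plain average of candidate positions. The soft label puts most of its mass on the bin nearest the true angle while still assigning some probability to neighbouring bins.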

Data Availability

The data that support the findings of this study are available from the corresponding author upon reasonable request.

Funding

This research is funded by the National Natural Science Foundation of China (61802052), the Innovative Research Foundation of Ship General Performance (26422206), and the Sichuan Science and Technology Program (2023YFSY0040).

Author information

Authors and Affiliations

School of Information and Software Engineering, University of Electronic Science and Technology of China, Chengdu, 610054, China
Qi Wang, Hang Lei & Xupeng Wang
School of Aeronautics and Astronautics, University of Electronic Science and Technology of China, Chengdu, 611731, China
Gun Li & Lu Chen

Corresponding author

Correspondence to Gun Li.

Ethics declarations

Ethics Approval

This article does not contain any studies that used human participants or animals.

Conflict of Interest

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Wang, Q., Lei, H., Li, G. et al. A Novel Convolutional Neural Network for Head Detection and Pose Estimation in Complex Environments from Single-Depth Images. Cogn Comput 16, 2116–2129 (2024). https://doi.org/10.1007/s12559-023-10209-5

Received: 19 September 2022
Accepted: 01 October 2023
Published: 18 November 2023
Issue Date: July 2024
DOI: https://doi.org/10.1007/s12559-023-10209-5
