Abstract
Automatic pain intensity estimation from facial expression analysis has important applications in medicine and healthcare. Most existing works directly transfer typical face recognition models to the pain estimation task, which often performs poorly because pain expressions are spontaneous, with subtle facial variations. Pain estimation from facial video remains challenging because it requires modeling semantic facial parts and extracting fine-grained, dynamic features. In this study, we propose a hierarchical global and local transformer (HGLT) model for pain estimation from facial expression videos. HGLT consists of an image frame embedding subnetwork and a temporal embedding subnetwork for extracting spatio-temporal features. In the frame embedding subnetwork, we propose a multi-head local attention mechanism to extract the local fine-grained features related to the micro variations of pain, followed by a hierarchical self-attention pooling that integrates global and local features. In the temporal embedding subnetwork, a transformer encoder with temporal attention is proposed to model the temporal relationships among video frames and capture dynamic facial variations. A correlation loss is proposed to alleviate the long-tailed imbalance in the distribution of pain intensities. Our method is evaluated on the UNBC-McMaster Shoulder Pain, BioVid Heat Pain, and DAiSEE datasets. Experimental results indicate that it achieves competitive performance compared with state-of-the-art methods.
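The abstract does not spell out the correlation loss. As a hedged illustration only (this formulation is an assumption, not taken from the paper), one plausible variant is one minus the Pearson correlation between predicted and ground-truth pain intensities, which penalizes a mismatched trend rather than per-sample error and is therefore less dominated by the overrepresented low-intensity labels:

```python
import numpy as np

def correlation_loss(pred, target, eps=1e-8):
    """Hypothetical correlation-style loss: 1 - Pearson correlation
    between predicted and ground-truth pain intensities.
    Rewards predictions that follow the trend of the labels, which is
    less sensitive to a long-tailed label distribution than pure MSE."""
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    pred_c = pred - pred.mean()        # center predictions
    target_c = target - target.mean()  # center labels
    corr = (pred_c * target_c).sum() / (
        np.sqrt((pred_c ** 2).sum()) * np.sqrt((target_c ** 2).sum()) + eps
    )
    return 1.0 - corr

# Perfectly correlated predictions give a loss near 0;
# anti-correlated predictions give a loss near 2.
print(correlation_loss([0, 1, 2, 3], [0, 2, 4, 6]))  # ~0.0
print(correlation_loss([3, 2, 1, 0], [0, 1, 2, 3]))  # ~2.0
```

In practice such a term would typically be combined with a standard regression loss over a batch; the function name and formulation above are illustrative.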
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs10044-024-01302-y/MediaObjects/10044_2024_1302_Fig1_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs10044-024-01302-y/MediaObjects/10044_2024_1302_Fig2_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs10044-024-01302-y/MediaObjects/10044_2024_1302_Fig3_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs10044-024-01302-y/MediaObjects/10044_2024_1302_Fig4_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs10044-024-01302-y/MediaObjects/10044_2024_1302_Fig5_HTML.jpg)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs10044-024-01302-y/MediaObjects/10044_2024_1302_Fig6_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs10044-024-01302-y/MediaObjects/10044_2024_1302_Fig7_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs10044-024-01302-y/MediaObjects/10044_2024_1302_Fig8_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs10044-024-01302-y/MediaObjects/10044_2024_1302_Fig9_HTML.jpg)
Data availability
The datasets generated and/or analyzed during the current study are available at https://sites.pitt.edu/~emotion/um-spread.htm, https://www.nit.ovgu.de/BioVid.html, and https://people.iith.ac.in/vineethnb/resources/daisee/index.html.
References
Morone NE, Weiner DK (2013) Pain as the fifth vital sign: exposing the vital need for pain education. Clin Ther 35(11):1728–1732
Dehghani H, Tavangar H, Ghandehari A (2014) Validity and reliability of behavioral pain scale in patients with low level of consciousness due to head trauma hospitalized in intensive care unit. Arch Trauma Res 3(1)
Achterberg WP, Pieper MJ, Dalen-Kok AH, De Waal MW, Husebo BS, Lautenbacher S, Kunz M, Scherder EJ, Corbett A (2013) Pain management in patients with dementia. Clin Interv Aging 8:1471
Werner P, Al-Hamadi A, Niese R, Walter S, Gruss S, Traue HC (2014) Automatic pain recognition from video and biomedical signals. In: 2014 22nd International Conference on Pattern Recognition, pp. 4582–4587. IEEE
Bunk SF, Lautenbacher S, Rüsseler J, Müller K, Schultz J, Kunz M (2018) Does EEG activity during painful stimulation mirror more closely the noxious stimulus intensity or the subjective pain sensation? Somatosens Motor Res 35(3–4):192–198
Nickel MM, May ES, Tiemann L, Schmidt P, Postorino M, Dinh ST, Gross J, Ploner M (2017) Brain oscillations differentially encode noxious stimulus intensity and pain intensity. Neuroimage 148:141–147
Sheu E, Versloot J, Nader R, Kerr D, Craig KD (2011) Pain in the elderly: validity of facial expression components of observational measures. Clin J Pain 27(7):593–601
Ekman P, Friesen WV (1978) Facial action coding system. Environmental Psychology & Nonverbal Behavior
Kunz M, Meixner D, Lautenbacher S (2019) Facial muscle movements encoding pain-a systematic review. Pain 160(3):535–549
Prkachin KM, Solomon PE (2008) The structure, reliability and validity of pain expression: evidence from patients with shoulder pain. Pain 139(2):267–274
Lucey P, Cohn JF, Prkachin KM, Solomon PE, Matthews I (2011) Painful data: The UNBC-McMaster shoulder pain expression archive database. In: 2011 IEEE International Conference on Automatic Face & Gesture Recognition (FG), pp. 57–64. IEEE
Wang J, Sun H (2018) Pain intensity estimation using deep spatiotemporal and handcrafted features. IEICE Trans Inf Syst 101(6):1572–1580
Zafar Z, Khan NA (2014) Pain intensity evaluation through facial action units. In: 2014 22nd International Conference on Pattern Recognition, pp. 4696–4701. IEEE
Meng H, Bianchi-Berthouze N (2013) Affective state level recognition in naturalistic facial and vocal expressions. IEEE Trans Cybern 44(3):315–328
Zhao R, Gan Q, Wang S, Ji Q (2016) Facial expression intensity estimation using ordinal information. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3466–3474
Li H, Wang N, Yang X, Gao X (2022) CRS-CONT: a well-trained general encoder for facial expression analysis. IEEE Trans Image Process 31:4637–4650
Li H, Wang N, Yang X, Wang X, Gao X (2022) Towards semi-supervised deep facial expression recognition with an adaptive confidence margin. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4166–4175
Li H, Wang N, Ding X, Yang X, Gao X (2021) Adaptively learning facial expression representation via cf labels and distillation. IEEE Trans Image Process 30:2016–2028
Rodriguez P, Cucurull G, Gonzàlez J, Gonfaus JM, Nasrollahi K, Moeslund TB, Roca FX (2017) Deep pain: exploiting long short-term memory networks for facial expression classification. IEEE Trans Cybern
Tavakolian M, Hadid A (2018) Deep spatiotemporal representation of the face for automatic pain intensity estimation. In: 2018 24th International Conference on Pattern Recognition (ICPR), pp. 350–354. IEEE
Huang D, Xia Z, Li L, Ma Y (2023) Pain estimation with integrating global-wise and region-wise convolutional networks. IET Image Proc 17(3):637–648
Martinez DL, Rudovic O, Picard R (2017) Personalized automatic estimation of self-reported pain intensity from facial expressions. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 2318–2327. IEEE
Tavakolian M, Hadid A (2019) A spatiotemporal convolutional neural network for automatic pain intensity estimation from facial dynamics. Int J Comput Vision 127(10):1413–1425
Huang D, Xia Z, Mwesigye J, Feng X (2020) Pain-attentive network: a deep spatio-temporal attention model for pain estimation. Multimed Tools Appl 79(37):28329–28354
Huang D, Feng X, Zhang H, Yu Z, Peng J, Zhao G, Xia Z (2021) Spatio-temporal pain estimation network with measuring pseudo heart rate gain. IEEE Trans Multimed 24:3300–3313
Zhao Z, Liu Q (2021) Former-DFER: dynamic facial expression recognition transformer. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 1553–1561
Liao J, Hao Y, Zhou Z, Pan J, Liang Y (2024) Sequence-level affective level estimation based on pyramidal facial expression features. Pattern Recogn 145:109958
Praveen RG, Granger E, Cardinal P (2020) Deep weakly supervised domain adaptation for pain localization in videos. In: 2020 15th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2020), pp. 473–480. IEEE
Hu J, Shen L, Sun G (2018) Squeeze-and-excitation networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7132–7141
Wang Q, Wu T, Zheng H, Guo G (2020) Hierarchical pyramid diverse attention networks for face recognition. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 8326–8335
Walter S, Gruss S, Ehleiter H, Tan J, Traue HC, Werner P, Al-Hamadi A, Crawcour S, Andrade AO, Silva GM (2013) The BioVid heat pain database: data for the advancement and systematic validation of an automated pain recognition system. In: 2013 IEEE International Conference on Cybernetics (CYBCO), pp. 128–131. IEEE
Gupta A, D’Cunha A, Awasthi K, Balasubramanian V (2016) DAiSEE: towards user engagement recognition in the wild. arXiv preprint arXiv:1609.01885
Mehta NK, Prasad SS, Saurav S, Saini R, Singh S (2022) Three-dimensional DenseNet self-attention neural network for automatic detection of student’s engagement. Appl Intell 52(12):13803–13823
Cootes TF, Edwards GJ, Taylor CJ (1998) Active appearance models. In: European conference on computer vision, pp. 484–498. Springer
Cubuk ED, Zoph B, Shlens J, Le QV (2020) RandAugment: practical automated data augmentation with a reduced search space. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pp. 702–703
Zhong Z, Zheng L, Kang G, Li S, Yang Y (2020) Random erasing data augmentation. Proceedings of the AAAI conference on artificial intelligence 34:13001–13008
Jiang X, Zong Y, Zheng W, Tang C, Xia W, Lu C, Liu J (2020) DFEW: a large-scale database for recognizing dynamic facial expressions in the wild. In: Proceedings of the 28th ACM International Conference on Multimedia, pp. 2881–2889
Loshchilov I, Hutter F (2017) Decoupled weight decay regularization. arXiv preprint arXiv:1711.05101
Smith LN (2017) Cyclical learning rates for training neural networks. In: 2017 IEEE Winter Conference on Applications of Computer Vision (WACV), pp. 464–472. IEEE
Wang Y, Bilinski P, Bremond F, Dantcheva A (2020) ImaGINator: conditional spatio-temporal GAN for video generation. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pp. 1160–1169
Selvaraju RR, Cogswell M, Das A, Vedantam R, Parikh D, Batra D (2017) Grad-CAM: visual explanations from deep networks via gradient-based localization. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 618–626
Wang F, Xiang X, Liu C, Tran TD, Reiter A, Hager GD, Quon H, Cheng J, Yuille AL (2017) Regularizing face verification nets for pain intensity regression. In: 2017 IEEE International Conference on Image Processing (ICIP), pp. 1087–1091. IEEE
Huang Y, Qing L, Xu S, Wang L, Peng Y (2022) HybNet: a hybrid network structure for pain intensity estimation. The Visual Computer, 1–12
Melo WC, Granger E, Lopez MB (2024) Facial expression analysis using decomposed multiscale spatiotemporal networks. Expert Syst Appl 236:121276
Rajasekhar GP, Granger E, Cardinal P (2021) Deep domain adaptation with ordinal regression for pain assessment using weakly-labeled videos. Image Vis Comput 110:104167
Liao J, Liang Y, Pan J (2021) Deep facial spatiotemporal network for engagement prediction in online learning. Appl Intell 51(10):6609–6621
Funding
This research was funded in part by the National Natural Science Foundation of China (62171283), the National Key Research and Development Program of China (2022YFC2503305 and 2022YFC2503302), a Shanghai Jiao Tong University fund (YG2023QNB27), the Shanghai Municipal Science and Technology Major Project (2021SHZDZX0102), and the Fundamental Research Funds for the Central Universities.
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Ethical approval
This study was approved by the Ethics Committee of the UNBC-McMaster Shoulder Pain Expression Archive Database and BioVid Heat Pain Database after signing and returning an agreement form available from https://sites.pitt.edu/~emotion/um-spread.htm and https://www.nit.ovgu.de/BioVid.html. All participants provided written informed consent before taking part in the study.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Below is the link to the electronic supplementary material.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Liu, H., Xu, H., Qiu, J. et al. Hierarchical global and local transformer for pain estimation with facial expression videos. Pattern Anal Applic 27, 85 (2024). https://doi.org/10.1007/s10044-024-01302-y
Received:
Accepted:
Published:
DOI: https://doi.org/10.1007/s10044-024-01302-y