Hierarchical global and local transformer for pain estimation with facial expression videos

  • Original Article
  • Published in: Pattern Analysis and Applications

Abstract

Automatic pain intensity estimation from facial expressions has important applications in medicine and healthcare. Most existing works directly transfer typical face recognition models to the pain estimation task, which may not perform well because the facial expression of pain is spontaneous, with subtle facial variations. Pain estimation from facial video remains challenging because it relies on modeling semantic facial parts and extracting fine-grained, dynamic features. In this study, we propose a hierarchical global and local transformer (HGLT) model for pain estimation from facial expression videos. HGLT consists of an image frame embedding subnetwork and a temporal embedding subnetwork for extracting spatio-temporal features. In the frame embedding subnetwork, we propose a multi-head local attention mechanism to extract local fine-grained features related to the micro variations of pain, followed by hierarchical self-attention pooling to integrate the global and local features. In the temporal embedding subnetwork, a transformer encoder with temporal attention models the temporal relationships among video frames and captures dynamic facial variations. A correlation loss is proposed to alleviate the long-tailed imbalance in the distribution of pain intensities. Our proposed method is evaluated on the UNBC-McMaster Shoulder Pain, BioVid Heat Pain, and DAiSEE datasets. Experimental results indicate that our method achieves performance competitive with state-of-the-art methods.
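The abstract names a correlation loss for handling the long-tailed distribution of pain intensities but does not give its formulation. A common choice for such a loss is one minus the Pearson correlation between predicted and ground-truth intensities, which rewards preserving the ordering of intensities even when high-intensity samples are rare. The sketch below illustrates that idea only; the function name and exact form are assumptions, not the paper's definition.

```python
import numpy as np

def correlation_loss(pred, target, eps=1e-8):
    """1 - Pearson correlation between predicted and true intensities.

    Minimizing this encourages predictions that co-vary with the labels,
    which is less dominated by the frequent low-intensity samples than a
    plain per-sample regression loss. (Illustrative form, not the paper's.)
    """
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    pc = pred - pred.mean()      # center predictions
    tc = target - target.mean()  # center targets
    denom = np.sqrt((pc ** 2).sum()) * np.sqrt((tc ** 2).sum()) + eps
    corr = (pc * tc).sum() / denom
    return 1.0 - corr

# Predictions that are a positive linear rescaling of the labels are
# perfectly correlated, so the loss is (numerically) zero.
print(round(correlation_loss([0, 1, 2, 3], [0, 2, 4, 6]), 6))  # → 0.0
```

Because Pearson correlation is invariant to scale and shift, a loss of this form is typically combined with a magnitude term (e.g. MSE) in practice so that the absolute intensity values are also constrained.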


Data availability

The datasets generated and/or analyzed during the current study are available at https://sites.pitt.edu/~emotion/um-spread.htm, https://www.nit.ovgu.de/BioVid.html, and https://people.iith.ac.in/vineethnb/resources/daisee/index.html.


Funding

This research was funded in part by the National Natural Science Foundation of China (62171283), the National Key Research and Development Program of China (2022YFC2503305 and 2022YFC2503302), the Shanghai Jiao Tong University fund (YG2023QNB27), the Shanghai Municipal Science and Technology Major Project (2021SHZDZX0102), and the Fundamental Research Funds for the Central Universities.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Manhua Liu.

Ethics declarations

Ethical approval

This study was approved by the Ethics Committee of the UNBC-McMaster Shoulder Pain Expression Archive Database and the BioVid Heat Pain Database after signing and returning the agreement forms available from https://sites.pitt.edu/~emotion/um-spread.htm and https://www.nit.ovgu.de/BioVid.html. All participants provided written informed consent before taking part in the study.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file 1 (pdf 99 KB)

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article

Cite this article

Liu, H., Xu, H., Qiu, J. et al. Hierarchical global and local transformer for pain estimation with facial expression videos. Pattern Anal Applic 27, 85 (2024). https://doi.org/10.1007/s10044-024-01302-y
