Abstract
Automatic pain intensity estimation from facial expression analysis has important applications in medicine and healthcare. Most existing works directly transfer typical face recognition models to the pain estimation task, which often performs poorly because pain expressions are spontaneous, with subtle facial variations. Pain estimation from facial video remains challenging because it requires modeling semantic facial parts and extracting fine-grained, dynamic features. In this study, we propose a hierarchical global and local transformer (HGLT) model for pain estimation from facial expression videos. HGLT consists of an image frame embedding subnetwork and a temporal embedding subnetwork for extracting spatio-temporal features. In the frame embedding subnetwork, we propose a multi-head local attention mechanism to extract the local fine-grained features related to the micro variations of pain, followed by a hierarchical self-attention pooling that integrates global and local features. In the temporal embedding subnetwork, a transformer encoder with temporal attention is proposed to model the temporal relationships among video frames and capture dynamic facial variations. A correlation loss is proposed to alleviate the long-tailed imbalance in the distribution of pain intensities. Our method is evaluated on the UNBC-McMaster Shoulder Pain, BioVid Heat Pain, and DAiSEE datasets. Experimental results indicate that it achieves competitive performance compared with state-of-the-art methods.
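The abstract does not spell out the correlation loss. As a hedged illustration only (this formulation is an assumption, not taken from the paper), one plausible variant is one minus the Pearson correlation between predicted and ground-truth pain intensities, which penalizes a mismatched trend rather than per-sample error and is therefore less dominated by the overrepresented low-intensity labels:

```python
import numpy as np

def correlation_loss(pred, target, eps=1e-8):
    """Hypothetical correlation-style loss: 1 - Pearson correlation
    between predicted and ground-truth pain intensities.
    Rewards predictions that follow the trend of the labels, which is
    less sensitive to a long-tailed label distribution than pure MSE."""
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    pred_c = pred - pred.mean()        # center predictions
    target_c = target - target.mean()  # center labels
    corr = (pred_c * target_c).sum() / (
        np.sqrt((pred_c ** 2).sum()) * np.sqrt((target_c ** 2).sum()) + eps
    )
    return 1.0 - corr

# Perfectly correlated predictions give a loss near 0;
# anti-correlated predictions give a loss near 2.
print(correlation_loss([0, 1, 2, 3], [0, 2, 4, 6]))  # ~0.0
print(correlation_loss([3, 2, 1, 0], [0, 1, 2, 3]))  # ~2.0
```

In practice such a term would typically be combined with a standard regression loss over a batch; the function name and formulation above are illustrative.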
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs10044-024-01302-y/MediaObjects/10044_2024_1302_Fig1_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs10044-024-01302-y/MediaObjects/10044_2024_1302_Fig2_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs10044-024-01302-y/MediaObjects/10044_2024_1302_Fig3_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs10044-024-01302-y/MediaObjects/10044_2024_1302_Fig4_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs10044-024-01302-y/MediaObjects/10044_2024_1302_Fig5_HTML.jpg)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs10044-024-01302-y/MediaObjects/10044_2024_1302_Fig6_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs10044-024-01302-y/MediaObjects/10044_2024_1302_Fig7_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs10044-024-01302-y/MediaObjects/10044_2024_1302_Fig8_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs10044-024-01302-y/MediaObjects/10044_2024_1302_Fig9_HTML.jpg)
Data availability
The datasets generated and/or analyzed during the current study are available at https://sites.pitt.edu/~emotion/um-spread.htm, https://www.nit.ovgu.de/BioVid.html, and https://people.iith.ac.in/vineethnb/resources/daisee/index.html.
References
Morone NE, Weiner DK (2013) Pain as the fifth vital sign: exposing the vital need for pain education. Clin Ther 35(11):1728–1732
Dehghani H, Tavangar H, Ghandehari A (2014) Validity and reliability of behavioral pain scale in patients with low level of consciousness due to head trauma hospitalized in intensive care unit. Arch Trauma Res 3(1)
Achterberg WP, Pieper MJ, Dalen-Kok AH, De Waal MW, Husebo BS, Lautenbacher S, Kunz M, Scherder EJ, Corbett A (2013) Pain management in patients with dementia. Clin Interv Aging 8:1471
Werner P, Al-Hamadi A, Niese R, Walter S, Gruss S, Traue HC (2014) Automatic pain recognition from video and biomedical signals. In: 2014 22nd International Conference on Pattern Recognition, pp. 4582–4587. IEEE
Bunk SF, Lautenbacher S, Rüsseler J, Müller K, Schultz J, Kunz M (2018) Does EEG activity during painful stimulation mirror more closely the noxious stimulus intensity or the subjective pain sensation? Somatosens Motor Res 35(3–4):192–198
Nickel MM, May ES, Tiemann L, Schmidt P, Postorino M, Dinh ST, Gross J, Ploner M (2017) Brain oscillations differentially encode noxious stimulus intensity and pain intensity. Neuroimage 148:141–147
Sheu E, Versloot J, Nader R, Kerr D, Craig KD (2011) Pain in the elderly: validity of facial expression components of observational measures. Clin J Pain 27(7):593–601
Ekman P, Friesen WV (1978) Facial action coding system. Environmental Psychology & Nonverbal Behavior
Kunz M, Meixner D, Lautenbacher S (2019) Facial muscle movements encoding pain-a systematic review. Pain 160(3):535–549
Prkachin KM, Solomon PE (2008) The structure, reliability and validity of pain expression: evidence from patients with shoulder pain. Pain 139(2):267–274
Lucey P, Cohn JF, Prkachin KM, Solomon PE, Matthews I (2011) Painful data: The UNBC-McMaster shoulder pain expression archive database. In: 2011 IEEE International Conference on Automatic Face & Gesture Recognition (FG), pp. 57–64. IEEE
Wang J, Sun H (2018) Pain intensity estimation using deep spatiotemporal and handcrafted features. IEICE Trans Inf Syst 101(6):1572–1580
Zafar Z, Khan NA (2014) Pain intensity evaluation through facial action units. In: 2014 22nd International Conference on Pattern Recognition, pp. 4696–4701. IEEE
Meng H, Bianchi-Berthouze N (2013) Affective state level recognition in naturalistic facial and vocal expressions. IEEE Trans Cybern 44(3):315–328
Zhao R, Gan Q, Wang S, Ji Q (2016) Facial expression intensity estimation using ordinal information. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3466–3474
Li H, Wang N, Yang X, Gao X (2022) CRS-CONT: a well-trained general encoder for facial expression analysis. IEEE Trans Image Process 31:4637–4650
Li H, Wang N, Yang X, Wang X, Gao X (2022) Towards semi-supervised deep facial expression recognition with an adaptive confidence margin. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4166–4175
Li H, Wang N, Ding X, Yang X, Gao X (2021) Adaptively learning facial expression representation via cf labels and distillation. IEEE Trans Image Process 30:2016–2028
Rodriguez P, Cucurull G, Gonzàlez J, Gonfaus JM, Nasrollahi K, Moeslund TB, Roca FX (2017) Deep pain: exploiting long short-term memory networks for facial expression classification. IEEE Trans Cybern
Tavakolian M, Hadid A (2018) Deep spatiotemporal representation of the face for automatic pain intensity estimation. In: 2018 24th International Conference on Pattern Recognition (ICPR), pp. 350–354. IEEE
Huang D, Xia Z, Li L, Ma Y (2023) Pain estimation with integrating global-wise and region-wise convolutional networks. IET Image Proc 17(3):637–648
Martinez DL, Rudovic O, Picard R (2017) Personalized automatic estimation of self-reported pain intensity from facial expressions. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 2318–2327. IEEE
Tavakolian M, Hadid A (2019) A spatiotemporal convolutional neural network for automatic pain intensity estimation from facial dynamics. Int J Comput Vision 127(10):1413–1425
Huang D, Xia Z, Mwesigye J, Feng X (2020) Pain-attentive network: a deep spatio-temporal attention model for pain estimation. Multimed Tools Appl 79(37):28329–28354
Huang D, Feng X, Zhang H, Yu Z, Peng J, Zhao G, Xia Z (2021) Spatio-temporal pain estimation network with measuring pseudo heart rate gain. IEEE Trans Multimed 24:3300–3313
Zhao Z, Liu Q (2021) Former-DFER: dynamic facial expression recognition transformer. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 1553–1561
Liao J, Hao Y, Zhou Z, Pan J, Liang Y (2024) Sequence-level affective level estimation based on pyramidal facial expression features. Pattern Recogn 145:109958
Praveen RG, Granger E, Cardinal P (2020) Deep weakly supervised domain adaptation for pain localization in videos. In: 2020 15th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2020), pp. 473–480. IEEE
Hu J, Shen L, Sun G (2018) Squeeze-and-excitation networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7132–7141
Wang Q, Wu T, Zheng H, Guo G (2020) Hierarchical pyramid diverse attention networks for face recognition. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 8326–8335
Walter S, Gruss S, Ehleiter H, Tan J, Traue HC, Werner P, Al-Hamadi A, Crawcour S, Andrade AO, Silva GM (2013) The BioVid heat pain database: data for the advancement and systematic validation of an automated pain recognition system. In: 2013 IEEE International Conference on Cybernetics (CYBCO), pp. 128–131. IEEE
Gupta A, D’Cunha A, Awasthi K, Balasubramanian V (2016) DAiSEE: towards user engagement recognition in the wild. arXiv preprint arXiv:1609.01885
Mehta NK, Prasad SS, Saurav S, Saini R, Singh S (2022) Three-dimensional DenseNet self-attention neural network for automatic detection of student’s engagement. Appl Intell 52(12):13803–13823
Cootes TF, Edwards GJ, Taylor CJ (1998) Active appearance models. In: European conference on computer vision, pp. 484–498. Springer
Cubuk ED, Zoph B, Shlens J, Le QV (2020) RandAugment: practical automated data augmentation with a reduced search space. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pp. 702–703
Zhong Z, Zheng L, Kang G, Li S, Yang Y (2020) Random erasing data augmentation. Proceedings of the AAAI conference on artificial intelligence 34:13001–13008
Jiang X, Zong Y, Zheng W, Tang C, Xia W, Lu C, Liu J (2020) DFEW: a large-scale database for recognizing dynamic facial expressions in the wild. In: Proceedings of the 28th ACM International Conference on Multimedia, pp. 2881–2889
Loshchilov I, Hutter F (2017) Decoupled weight decay regularization. arXiv preprint arXiv:1711.05101
Smith LN (2017) Cyclical learning rates for training neural networks. In: 2017 IEEE Winter Conference on Applications of Computer Vision (WACV), pp. 464–472. IEEE
Wang Y, Bilinski P, Bremond F, Dantcheva A (2020) ImaGINator: conditional spatio-temporal GAN for video generation. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pp. 1160–1169
Selvaraju RR, Cogswell M, Das A, Vedantam R, Parikh D, Batra D (2017) Grad-CAM: visual explanations from deep networks via gradient-based localization. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 618–626
Wang F, Xiang X, Liu C, Tran TD, Reiter A, Hager GD, Quon H, Cheng J, Yuille AL (2017) Regularizing face verification nets for pain intensity regression. In: 2017 IEEE International Conference on Image Processing (ICIP), pp. 1087–1091. IEEE
Huang Y, Qing L, Xu S, Wang L, Peng Y (2022) HybNet: a hybrid network structure for pain intensity estimation. The Visual Computer, 1–12
Melo WC, Granger E, Lopez MB (2024) Facial expression analysis using decomposed multiscale spatiotemporal networks. Expert Syst Appl 236:121276
Rajasekhar GP, Granger E, Cardinal P (2021) Deep domain adaptation with ordinal regression for pain assessment using weakly-labeled videos. Image Vis Comput 110:104167
Liao J, Liang Y, Pan J (2021) Deep facial spatiotemporal network for engagement prediction in online learning. Appl Intell 51(10):6609–6621
Funding
This research was funded in part by the National Natural Science Foundation of China (62171283), the National Key Research and Development Program of China (2022YFC2503305 and 2022YFC2503302), a Shanghai Jiao Tong University fund (YG2023QNB27), the Shanghai Municipal Science and Technology Major Project (2021SHZDZX0102), and the Fundamental Research Funds for the Central Universities.
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Ethical approval
This study was approved by the Ethics Committee of the UNBC-McMaster Shoulder Pain Expression Archive Database and BioVid Heat Pain Database after signing and returning an agreement form available from https://sites.pitt.edu/~emotion/um-spread.htm and https://www.nit.ovgu.de/BioVid.html. All participants provided written informed consent before taking part in the study.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Below is the link to the electronic supplementary material.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Liu, H., Xu, H., Qiu, J. et al. Hierarchical global and local transformer for pain estimation with facial expression videos. Pattern Anal Applic 27, 85 (2024). https://doi.org/10.1007/s10044-024-01302-y
Received:
Accepted:
Published:
DOI: https://doi.org/10.1007/s10044-024-01302-y