Abstract
In the field of facial expression recognition (FER), two main trends exist: data-driven FER and feature-driven FER. The former focuses on data problems (e.g., sample imbalance and multimodal fusion), while the latter explores facial expression features. Because we regard feature-driven FER as the more important of the two, we aim to mine facial features more deeply and propose an expression recognition model based on local–global information reasoning and landmark spatial distributions. Specifically, to reason over local–global information, we design a Res18-LG module that combines multiple attention mechanisms with a modified residual module. In addition, to take the spatial topology of facial landmarks into account, we introduce a topological relationship graph of landmarks and a two-layer graph neural network that extract spatial distribution features. Finally, experimental results on the FERPlus and RAF-DB datasets demonstrate that our model outperforms state-of-the-art methods.
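The abstract describes the landmark branch only at a high level. As a rough illustration of the general idea, the following is a minimal sketch, not the authors' implementation: it builds a landmark graph and passes the landmark coordinates through a two-layer graph convolution using the standard GCN propagation rule of Kipf and Welling. Several details are assumptions not taken from the paper: linking each landmark to its k = 4 nearest neighbours as the "topological relationship graph", using 68 landmarks, using normalized (x, y) coordinates as node features, mean-pooling the node outputs into one feature vector, and using random matrices in place of learned weights. The helper names (`normalize_landmarks`, `knn_adjacency`, `normalize_adjacency`, `two_layer_gcn`) are hypothetical.

```python
import numpy as np


def normalize_landmarks(pts: np.ndarray) -> np.ndarray:
    """Centre landmark coordinates and scale them into [-1, 1]."""
    centred = pts - pts.mean(axis=0)
    scale = np.abs(centred).max()
    return centred / (scale if scale > 0 else 1.0)


def knn_adjacency(pts: np.ndarray, k: int = 4) -> np.ndarray:
    """Undirected 0/1 adjacency linking each landmark to its k nearest neighbours.

    Hypothetical topology: the paper's landmark graph may be built differently.
    """
    n = pts.shape[0]
    dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)            # never pick a node as its own neighbour
    nbrs = np.argsort(dist, axis=1)[:, :k]    # k closest landmarks per row
    adj = np.zeros((n, n))
    adj[np.repeat(np.arange(n), k), nbrs.ravel()] = 1.0
    return np.maximum(adj, adj.T)             # symmetrise the k-NN relation


def normalize_adjacency(adj: np.ndarray) -> np.ndarray:
    """Return D^{-1/2} (A + I) D^{-1/2}, the GCN propagation matrix."""
    a_hat = adj + np.eye(adj.shape[0])        # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def two_layer_gcn(x, a_norm, w1, w2) -> np.ndarray:
    """Two graph-convolution layers, then mean pooling to one feature vector."""
    h = np.maximum(a_norm @ x @ w1, 0.0)      # layer 1 with ReLU
    h = a_norm @ h @ w2                       # layer 2 (no activation)
    return h.mean(axis=0)                     # graph-level spatial-distribution feature


# Usage with stand-in data: 68 random points play the role of detected landmarks,
# and random matrices play the role of learned weights.
rng = np.random.default_rng(0)
pts = normalize_landmarks(rng.uniform(0.0, 112.0, size=(68, 2)))
adj = knn_adjacency(pts, k=4)
a_norm = normalize_adjacency(adj)
w1 = rng.normal(scale=0.1, size=(2, 32))
w2 = rng.normal(scale=0.1, size=(32, 64))
feature = two_layer_gcn(pts, a_norm, w1, w2)
print(feature.shape)  # (64,)
```

The symmetric normalization keeps the largest eigenvalue of the propagation matrix at 1, so stacking two layers does not blow up feature magnitudes. How the paper fuses such a landmark feature with the Res18-LG appearance features is not stated in the abstract.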
Data availability
Data sharing is not applicable to this article as no datasets were generated or analyzed during the current study.
Acknowledgements
This work was supported by the Sichuan Science and Technology Program under Grant 2023YFS0195.
Ethics declarations
Conflict of interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence their work reported in this paper.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Xiong, K., Qing, L., Li, L. et al. Facial expression recognition based on local–global information reasoning and spatial distribution of landmark features. Vis Comput (2024). https://doi.org/10.1007/s00371-024-03345-y
Accepted:
Published:
DOI: https://doi.org/10.1007/s00371-024-03345-y