Self-supervised facial expression recognition with fine-grained feature selection

  • Research
  • The Visual Computer

Abstract

Facial expression recognition (FER) has significant practical value in real-world scenarios such as human–computer interaction, fatigue driving detection, and learning engagement analysis. However, acquiring large-scale, high-quality annotated facial expression datasets is challenging because of the inherent ambiguity of facial images and privacy concerns. This paper therefore introduces a self-supervised facial expression recognition method based on masked image modeling. The method learns multi-level facial feature representations without expensive labels and achieves strong facial expression recognition performance through further fine-grained feature selection. Specifically, we propose the multi-level feature selector (MFS), which comprises two components: the multi-level feature combiner and the feature selector. During the pre-training stage, the multi-level feature combiner integrates multi-level features, addressing the vision transformer’s deficiencies in capturing high-frequency facial semantics. In the fine-tuning stage, the feature selector automatically identifies highly discriminative regions and extracts fine-grained features. We then use graph convolutional networks to further mine the latent connections among the fine-grained features, deriving an integrated feature with enhanced discriminative capability. This fine-grained facial feature selection mitigates the performance degradation caused by inter-class similarities and intra-class variations. Experimental results on the RAF-DB, AffectNet, and FER+ datasets demonstrate that our approach significantly outperforms other self-supervised methods in recognition performance and closely approaches state-of-the-art supervised methods. The code is available at https://github.com/Greysahy/MFS.
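
The abstract names two generic building blocks, masked image modeling and selection of discriminative regions, without giving their details. The following is a minimal sketch of both ideas under stated assumptions, not the authors' implementation: the function names `random_patch_mask` and `select_top_k`, the 75% mask ratio, the 14 × 14 patch grid, and ranking patches by a scalar score are all illustrative choices that do not come from the paper.

```python
import random


def random_patch_mask(num_patches, mask_ratio=0.75, seed=None):
    """Split patch indices into (visible, masked) lists.

    Masked image modeling hides a random subset of patches and trains
    the network to reconstruct them from the visible ones. The default
    ratio is an assumption, not a value taken from the paper.
    """
    if not 0.0 <= mask_ratio < 1.0:
        raise ValueError("mask_ratio must be in [0, 1)")
    rng = random.Random(seed)
    indices = list(range(num_patches))
    rng.shuffle(indices)
    num_masked = int(num_patches * mask_ratio)
    masked = sorted(indices[:num_masked])
    visible = sorted(indices[num_masked:])
    return visible, masked


def select_top_k(scores, k):
    """Return the indices of the k highest-scoring patches.

    Hypothetical stand-in for a feature selector: it assumes each patch
    already has a scalar discriminativeness score.
    """
    if not 0 < k <= len(scores):
        raise ValueError("k must be between 1 and len(scores)")
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


# Usage: a 14 x 14 patch grid (assumed) gives 196 patches.
visible, masked = random_patch_mask(196, mask_ratio=0.75, seed=0)
top_patches = select_top_k([0.1, 0.9, 0.3, 0.7], k=2)
```

In a full pipeline, only the visible patches would be fed to the encoder during pre-training, and the selected patches would be passed on to a later stage such as the graph convolutional network mentioned in the abstract.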

Data availability

The datasets used in this study, RAF-DB, AffectNet, and FER+ (FERPlus), can be accessed at the following URLs. RAF-DB: http://www.whdeng.cn/raf/model1.html. AffectNet: http://mohammadmahoor.com/affectnet. FER+: https://github.com/Microsoft/FERPlus

Funding

This work was supported by the Humanities and Social Science Fund of the Ministry of Education of the People’s Republic of China (22YJAZH036).

Author information

Contributions

Heng-Yu An contributed to investigation, conceptualization, methodology, software, and writing—original draft; Rui-Sheng Jia contributed to supervision, methodology, writing—review & editing, and funding acquisition.

Corresponding author

Correspondence to Rui-Sheng Jia.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

An, HY., Jia, RS. Self-supervised facial expression recognition with fine-grained feature selection. Vis Comput (2024). https://doi.org/10.1007/s00371-024-03322-5

  • DOI: https://doi.org/10.1007/s00371-024-03322-5
