Abstract
Facial Expression Recognition (FER) is at the heart of Human–Computer Interaction (HCI) and has received considerable attention in computer vision. We present a novel attention-based deep neural network for recognizing facial expressions from images. First, the eye-pair, mouth, and face regions are cropped and independently passed through the pre-trained Xception network to obtain deep representations. These descriptors need not have the same influence on recognizing an expression; depending on the type of expression, some of them may deserve more attention than others. We therefore incorporate an attention mechanism into the model that automatically learns how much attention to pay to each descriptor. The attention-weighted features obtained from the three regions are then fused by the proposed Cross Average Pooling (CAP) layers. The resulting cross average pooled soft attention yields compact and discriminative representations of facial images, which leads to more accurate identification of facial expressions. The proposed approach is evaluated on two benchmark datasets (JAFFE and CK+), and the experimental results show that it outperforms existing models, with accuracies of 97.67% and 97.46% on JAFFE and CK+, respectively.
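The abstract does not give the attention formula or the exact definition of the CAP layers, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes that each region descriptor is scored with a hypothetical learned vector `w` through `tanh`, that the scores are softmax-normalised across the three regions (soft attention over descriptors), and that CAP acts as an attention-weighted element-wise average across the region axis. The names `attention_weights` and `cross_average_pool` are hypothetical. Toy 4-dimensional vectors stand in for the Xception descriptors, which are 2048-dimensional when globally average pooled.

```python
import math
from typing import Dict, List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(descriptors: Dict[str, Vector], w: Vector) -> Dict[str, float]:
    """Soft attention over regions (assumed form, hypothetical helper).

    Each region descriptor h_r is scored as tanh(w . h_r) with a hypothetical
    learned vector w; the scores are softmax-normalised across regions, so the
    weights are positive and sum to 1.
    """
    names = list(descriptors)
    scores = [
        math.tanh(sum(wi * hi for wi, hi in zip(w, descriptors[name])))
        for name in names
    ]
    return dict(zip(names, softmax(scores)))


def cross_average_pool(descriptors: Dict[str, Vector], weights: Dict[str, float]) -> Vector:
    """Assumed CAP: attention-weighted element-wise average across regions.

    Because the weights sum to 1, each output element is a weighted average of
    that element over the regions; with equal weights this reduces to plain
    average pooling across the region axis. Output length equals input length.
    """
    names = list(descriptors)
    dim = len(descriptors[names[0]])
    return [sum(weights[name] * descriptors[name][i] for name in names) for i in range(dim)]


if __name__ == "__main__":
    # Toy vectors standing in for the Xception descriptors of each crop.
    regions = {
        "eye_pair": [1.0, 0.0, 2.0, 0.0],
        "mouth": [0.0, 3.0, 0.0, 1.0],
        "face": [2.0, 0.0, 1.0, 1.0],
    }
    w = [0.5, -0.2, 0.1, 0.3]  # hypothetical attention parameters
    alpha = attention_weights(regions, w)
    fused = cross_average_pool(regions, alpha)
    print(alpha)
    print(fused)
```

In a trained model, `w` would presumably be learned jointly with the classifier, and the fused vector would feed a classification head; both are outside this sketch. The design keeps the fused representation the same size as a single descriptor, which matches the abstract's emphasis on a compact representation.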
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs40031-022-00746-2/MediaObjects/40031_2022_746_Fig1_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs40031-022-00746-2/MediaObjects/40031_2022_746_Fig2_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs40031-022-00746-2/MediaObjects/40031_2022_746_Fig3_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs40031-022-00746-2/MediaObjects/40031_2022_746_Fig4_HTML.png)
Funding
No Funding
Ethics declarations
Conflict of interest
The authors declare that there is no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Bodapati, J.D., Naik, D.S.B., Suvarna, B. et al. A Deep Learning Framework with Cross Pooled Soft Attention for Facial Expression Recognition. J. Inst. Eng. India Ser. B 103, 1395–1405 (2022). https://doi.org/10.1007/s40031-022-00746-2
DOI: https://doi.org/10.1007/s40031-022-00746-2