A two-branch hand gesture recognition approach combining atrous convolution and attention mechanism

  • Original article
The Visual Computer

Abstract

Hand gesture recognition is an important research field in computer vision. To address low hand gesture recognition accuracy, this paper proposes two modules built on atrous convolution: a Multi-Scale Fusion (MSF) module, which extracts multi-scale features at different receptive fields, and a Light-Weight Multi-Scale (LWMS) module, which can be regarded as an enhanced and expanded convolutional operation. Based on these two modules, we design a Hand Gesture Recognition Approach (HGRA), an end-to-end CNN-based framework with two branches. One branch combines U-Net with a Multi-Scale Attention module to segment the hand gesture from complex backgrounds; the segmentation result is then used to extract shape features. The other branch extracts visual features such as appearance and color. The shape and visual features from the two branches are integrated to perform hand gesture recognition. Experimental results on the OUHANDS and HGR1 gesture datasets show that the proposed method achieves competitive performance in both hand gesture segmentation and recognition.
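
The idea of extracting features at different receptive fields with atrous convolution can be illustrated with a minimal one-dimensional sketch. This is not the MSF or LWMS module itself, whose internal design is not given in the abstract; the function names, the shared kernel, and the choice to stack branch outputs per position are assumptions made purely for illustration. A kernel with k taps spaced `rate` samples apart spans k + (k - 1)(rate - 1) input samples, so the same number of weights covers a wider context as the rate grows.

```python
def effective_kernel_size(kernel_size, rate):
    """Number of input samples spanned by a kernel whose taps are `rate` apart."""
    return kernel_size + (kernel_size - 1) * (rate - 1)


def atrous_conv1d(signal, kernel, rate):
    """1-D atrous (dilated) convolution with zero padding.

    Computed as cross-correlation (no kernel flip), the convention used by
    most deep learning libraries. Output length equals input length.
    """
    half_span = (effective_kernel_size(len(kernel), rate) - 1) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, weight in enumerate(kernel):
            idx = i - half_span + j * rate  # taps are spaced `rate` apart
            if 0 <= idx < len(signal):      # samples outside the signal count as zero
                acc += weight * signal[idx]
        out.append(acc)
    return out


def multi_scale_features(signal, kernel, rates):
    """Hypothetical fusion step: apply the kernel at several rates and
    stack the branch outputs per position (one value per rate)."""
    branches = [atrous_conv1d(signal, kernel, r) for r in rates]
    return [list(values) for values in zip(*branches)]


if __name__ == "__main__":
    impulse = [0, 0, 0, 0, 1, 0, 0, 0, 0]
    kernel = [1.0, 1.0, 1.0]
    for rate in (1, 2, 3):
        print(rate, effective_kernel_size(len(kernel), rate),
              atrous_conv1d(impulse, kernel, rate))
```

Feeding an impulse through the three rates shows the response spreading over 3, 5, and 7 samples while the kernel keeps three weights, which is the property a multi-scale module relies on. In a real network the branches would use separately learned 2-D kernels, and the fusion would typically be a channel-wise concatenation followed by further convolution.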

Funding

This work was supported in part by the National Natural Science Foundation of China [grant number 61379065] and the Natural Science Foundation of Hebei Province, China [grant number F2019203285].

Author information

Corresponding authors

Correspondence to Shi Wang or Shihui Zhang.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Wang, S., Zhang, S., Zhang, X. et al. A two-branch hand gesture recognition approach combining atrous convolution and attention mechanism. Vis Comput 39, 4487–4500 (2023). https://doi.org/10.1007/s00371-022-02602-2

  • DOI: https://doi.org/10.1007/s00371-022-02602-2
