Scene text detection by adaptive feature selection with text scale-aware loss

Applied Intelligence

Abstract

Since convolutional neural networks (CNNs) were first applied to scene text detection, detection accuracy has improved considerably. However, because of the limited receptive fields of regular CNNs and the large scale variation of text in images, current text detection methods may still fail on more challenging text instances, such as arbitrarily shaped text and extremely small text. In this paper, we propose a new segmentation-based scene text detector equipped with deformable convolution and global channel attention. To detect text of arbitrary shape, our method replaces traditional convolutions with deformable convolutions, whose sampling locations are deformed by augmented offsets so that they can better adapt to the shape of the text, especially curved text. To obtain more representative text features, an Adaptive Feature Selection module is introduced to better exploit text content through global channel attention. In addition, a scale-aware loss, which adjusts the weights of text instances according to their size, is formulated to address the text scale variation problem. Experiments on several standard benchmarks, including ICDAR2015, SCUT-CTW1500, ICDAR2017-MLT and MSRA-TD500, verify the superiority of the proposed method.
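The abstract names two mechanisms without giving their formulas: global channel attention and a loss that reweights text instances by size. The following is only a minimal sketch of what such components could look like, not the formulation used in the paper. The inverse-area weighting, the exponent `gamma`, the squeeze-and-excitation-style gating, and all function names are assumptions made for illustration.

```python
import math


def scale_aware_weights(areas, gamma=0.5):
    """Return one weight per text instance; smaller areas get larger weights.

    Assumption: weight is proportional to (1 / area) ** gamma, where `gamma`
    is a hypothetical exponent. Areas must be positive. Weights are
    normalised so that they average to 1.
    """
    if not areas:
        return []
    raw = [(1.0 / area) ** gamma for area in areas]
    mean_raw = sum(raw) / len(raw)
    return [r / mean_raw for r in raw]


def weighted_bce(probs, labels, instance_ids, weights, eps=1e-7):
    """Pixel-wise binary cross-entropy with per-instance weights.

    `instance_ids[i]` is the index of the text instance that pixel i belongs
    to, or -1 for background (background pixels use weight 1).
    """
    total, norm = 0.0, 0.0
    for p, y, k in zip(probs, labels, instance_ids):
        w = weights[k] if k >= 0 else 1.0
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -w * (y * math.log(p) + (1 - y) * math.log(1.0 - p))
        norm += w
    return total / norm


def channel_attention(features, w1, b1, w2, b2):
    """Rescale each channel by a gate computed from global average pooling.

    `features` is a list of C channels, each a flat list of pixel values.
    `w1`/`b1` and `w2`/`b2` are the weights and biases of two small fully
    connected layers (hypothetical parameters, normally learned).
    """
    # Squeeze: one global descriptor per channel.
    squeeze = [sum(ch) / len(ch) for ch in features]
    # Excite: FC -> ReLU -> FC -> sigmoid gives one gate in (0, 1) per channel.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeeze)) + b)
              for row, b in zip(w1, b1)]
    gates = [1.0 / (1.0 + math.exp(-(sum(w * h for w, h in zip(row, hidden)) + b)))
             for row, b in zip(w2, b2)]
    return [[g * v for v in ch] for g, ch in zip(gates, features)]
```

Under these assumptions, a small instance of 100 pixels receives a larger weight than a 10,000-pixel instance, so errors on small text contribute more to the loss. The attention gate lets the network suppress or keep whole channels based on image-wide statistics.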

Notes

  1. https://rrc.cvc.uab.es/?ch=8

Acknowledgements

This research is supported by the National Natural Science Foundation of China (No. 61972180).

Author information

Corresponding author

Correspondence to Qin Wu.

Ethics declarations

Conflict of Interests

The authors declare that they have no conflict of interest.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Wu, Q., Luo, W., Chai, Z. et al. Scene text detection by adaptive feature selection with text scale-aware loss. Appl Intell 52, 514–529 (2022). https://doi.org/10.1007/s10489-021-02331-4

  • DOI: https://doi.org/10.1007/s10489-021-02331-4
