A Deep Learning Approach for Robust, Multi-oriented, and Curved Text Detection

Cognitive Computation

Abstract

Automatic text localization and segmentation in natural environments containing vertical or curved text are core components of numerous tasks, such as vehicle identification, self-driving cars, and conveying significant information from real scenes to visually impaired people. However, text in real environments appears at a wide range of angles, shapes, sizes, and colors, which makes it difficult to detect. In this paper, a new framework based on a convolutional neural network (CNN) is introduced to detect text efficiently even in the presence of complex backgrounds. The framework owes its results to a new inception layer and an improved ReLU layer. First, four new m.ReLU layers are employed to explore low-level visual features. The new m.ReLU building block and inception layer are optimized to capture as much vital information as possible. The effect of stacking inception layers (kernels of dimension 3 × 3 or larger) is explored, and it is demonstrated that this strategy captures text of widely varying sizes more successfully than a linear chain of convolution layers (Conv layers). The suggested text detection algorithm is evaluated on four well-known databases, namely ICDAR 2013, ICDAR 2015, ICDAR 2017, and ICDAR 2019. The text detection results on these databases, with a highest recall of 94.2%, precision of 95.6%, and F-score of 94.8%, illustrate that the developed strategy outperforms state-of-the-art frameworks.
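
The abstract reports recall, precision, and F-score but does not state how they are computed. As a minimal sketch, assuming the standard definitions used in detection benchmarks (F-score as the harmonic mean of precision and recall), the three metrics can be derived from counts of matched and unmatched detections. The function name `detection_scores` and the counts below are hypothetical and are not taken from the paper; how a detection is matched to a ground-truth region is outside the scope of this sketch.

```python
def detection_scores(true_positives: int, false_positives: int, false_negatives: int):
    """Return (recall, precision, f_score) for one set of detection counts.

    Assumes the standard definitions:
      precision = TP / (TP + FP)
      recall    = TP / (TP + FN)
      f_score   = harmonic mean of precision and recall
    """
    detected = true_positives + false_positives       # all regions the detector returned
    ground_truth = true_positives + false_negatives   # all annotated text regions
    precision = true_positives / detected if detected else 0.0
    recall = true_positives / ground_truth if ground_truth else 0.0
    if precision + recall == 0:
        return recall, precision, 0.0
    f_score = 2 * precision * recall / (precision + recall)
    return recall, precision, f_score


# Hypothetical counts for illustration only (not results from the paper).
recall, precision, f_score = detection_scores(
    true_positives=90, false_positives=10, false_negatives=30
)
print(f"recall={recall:.3f} precision={precision:.3f} f_score={f_score:.3f}")
```

Because the F-score is a harmonic mean, it is pulled toward the lower of the two values, so a detector cannot reach a high F-score by maximizing precision or recall alone.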

Data Availability

ICDAR2013, ICDAR2015, ICDAR2017, and ICDAR2019 datasets were used in this study.

Funding

This publication has emanated from research supported in part by a grant from Science Foundation Ireland under grant number 18/CRT/6183, and is supported by the ADAPT Centre for Digital Content Technology, which is funded under the SFI Research Centres Programme (grant 13/RC/2106/_P2), and by the Lero SFI Centre for Software (grant 13/RC/2094/_P2), and is co-funded under the European Regional Development Fund. For the purpose of Open Access, the author has applied a CC BY public copyright license to any Author Accepted Manuscript version arising from this submission.

Author information

Corresponding author

Correspondence to Erfan Babaee Tirkolaee.

Ethics declarations

Ethics Approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Informed Consent

Not applicable.

Conflict of Interest

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Ranjbarzadeh, R., Jafarzadeh Ghoushchi, S., Anari, S. et al. A Deep Learning Approach for Robust, Multi-oriented, and Curved Text Detection. Cogn Comput 16, 1979–1991 (2024). https://doi.org/10.1007/s12559-022-10072-w

  • DOI: https://doi.org/10.1007/s12559-022-10072-w
