Abstract
Text localization in digital images has many applications, such as route finding in autonomous driving, postal services, container identification in ports, and robotic navigation in urban environments. Methods such as Stroke Width Transform (SWT), Local Adaptive Thresholding (LAT), and Maximally Stable Extremal Regions (MSER) have been studied by many researchers. Each of these methods addresses specific challenges such as text orientation, font variation, text size, and non-uniform illumination. Deep learning methods, although highly accurate, are not optimal in terms of computation time and the amount of data required for training. Our approach uses the SWT, LAT, and MSER methods as a pre-processing step, together with some improvement techniques, to deal with the challenges above simultaneously and improve the text localization result. The proposed approach combines Maximally Stable Extremal Regions, multilevel binarization, and non-maximum suppression. On the ICDAR 2013 dataset, it yields reasonable results compared with non-deep state-of-the-art algorithms and with deep learning-based methods: recall, precision, and f-measure are 0.7442, 0.9116, and 0.8195, respectively.
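As a quick sanity check on the reported scores, the sketch below recomputes the f-measure from the reported precision and recall. It assumes the f-measure is the standard harmonic mean of the two (F1); the abstract does not state the exact aggregation used by the evaluation protocol, and the function name `f_measure` is a hypothetical helper introduced here for illustration.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """F-beta score; beta = 1 gives the harmonic mean of precision and recall.

    Assumption: the f-measure in the abstract is the usual F1 score.
    """
    if precision < 0 or recall < 0:
        raise ValueError("precision and recall must be non-negative")
    denominator = beta ** 2 * precision + recall
    if denominator == 0:
        # Both scores are zero: define the f-measure as zero.
        return 0.0
    return (1 + beta ** 2) * precision * recall / denominator


# Values reported in the abstract for the ICDAR 2013 dataset.
reported_precision = 0.9116
reported_recall = 0.7442
reported_f = 0.8195

f1 = f_measure(reported_precision, reported_recall)
print(f"recomputed f-measure: {f1:.4f} (reported: {reported_f:.4f})")
```

Under this assumption the recomputed value agrees with the reported 0.8195 to within rounding of the four-digit inputs. It also shows that recall, the lower of the two scores, pulls the f-measure below the arithmetic mean of precision and recall.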
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11042-022-13179-2/MediaObjects/11042_2022_13179_Fig1_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11042-022-13179-2/MediaObjects/11042_2022_13179_Fig2_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11042-022-13179-2/MediaObjects/11042_2022_13179_Fig3_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11042-022-13179-2/MediaObjects/11042_2022_13179_Fig4_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11042-022-13179-2/MediaObjects/11042_2022_13179_Fig5_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11042-022-13179-2/MediaObjects/11042_2022_13179_Fig6_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11042-022-13179-2/MediaObjects/11042_2022_13179_Fig7_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11042-022-13179-2/MediaObjects/11042_2022_13179_Fig8_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11042-022-13179-2/MediaObjects/11042_2022_13179_Fig9_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11042-022-13179-2/MediaObjects/11042_2022_13179_Fig10_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11042-022-13179-2/MediaObjects/11042_2022_13179_Fig11_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11042-022-13179-2/MediaObjects/11042_2022_13179_Fig12_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11042-022-13179-2/MediaObjects/11042_2022_13179_Fig13_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11042-022-13179-2/MediaObjects/11042_2022_13179_Fig14_HTML.png)
Funding
The author(s) received no financial support for this article’s research, authorship, and publication.
Ethics declarations
Conflict of interest
The author(s) declared no potential conflicts of interest concerning this article’s research, authorship, and publication.
Cite this article
Akoushideh, A., Rasoulnejad, S.M.F. & Shahbahrami, A. Text localization in digital images using a hybrid method. Multimed Tools Appl 81, 34047–34066 (2022). https://doi.org/10.1007/s11042-022-13179-2
DOI: https://doi.org/10.1007/s11042-022-13179-2