Line, Word, and Character Segmentation from Bangla Handwritten Text—A Precursor Toward Bangla HOCR

Advanced Computing and Systems for Security

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 666)

Abstract

The basic function of optical character recognition (OCR) is to recognize text in document images and convert it into digitally editable text. Beyond this, OCR has other potential uses in document image processing, such as automatic document sorting and writer identification/verification. At present, commercially available OCR systems exist mostly for Roman script. Developing an unconstrained offline handwritten character recognition system remains one of the most challenging tasks for the research community. The problem becomes harder for Indic scripts like Bangla, which contains more than 280 modified and compound characters in addition to its isolated characters. For recognizing a handwritten document, the most convenient approach is to segment the text into characters or character parts, so line-, word-, and character-level segmentation plays a vital role in the development of such a system. In this paper, a scheme for tri-level segmentation (line, word, and character) is presented. Encouraging segmentation results are achieved on a set of 50 handwritten text documents.
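
The abstract does not describe how the chapter's segmentation scheme works, so the following is only a generic illustration of what line- and word-level segmentation mean, not the authors' method. It is a minimal sketch using projection profiles, assuming a binarized page stored as a list of rows with 1 for ink and 0 for background. The function names and the `min_gap` parameter are hypothetical.

```python
from typing import List, Tuple

Image = List[List[int]]  # assumed format: rows of 0 (background) / 1 (ink)


def runs_of_ink(profile: List[int], min_gap: int = 1) -> List[Tuple[int, int]]:
    """Return (start, end) spans, end exclusive, where the profile is non-zero.

    Spans separated by fewer than `min_gap` empty positions are merged.
    """
    runs: List[Tuple[int, int]] = []
    start = None
    for i, value in enumerate(profile):
        if value > 0 and start is None:
            start = i
        elif value == 0 and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(profile)))

    merged: List[Tuple[int, int]] = []
    for s, e in runs:
        if merged and s - merged[-1][1] < min_gap:
            merged[-1] = (merged[-1][0], e)
        else:
            merged.append((s, e))
    return merged


def segment_lines(image: Image) -> List[Tuple[int, int]]:
    """Split a page into text lines using the horizontal projection (ink per row)."""
    return runs_of_ink([sum(row) for row in image])


def segment_words(image: Image, line: Tuple[int, int], min_gap: int = 2) -> List[Tuple[int, int]]:
    """Split one text line into words using the vertical projection (ink per column)."""
    top, bottom = line
    width = len(image[0])
    profile = [sum(image[r][c] for r in range(top, bottom)) for c in range(width)]
    return runs_of_ink(profile, min_gap)


# Toy page: two text lines; the first contains two "words".
page = [
    [0, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 1, 1, 0],
    [1, 1, 0, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0, 0, 0],
]
lines = segment_lines(page)             # [(1, 3), (4, 5)]
words = segment_words(page, lines[0])   # [(0, 2), (5, 7)]
```

Plain projection profiles assume reasonably straight, non-overlapping lines, which unconstrained handwriting often violates. Character-level segmentation in Bangla is harder still, because the headline (matra) joins the characters of a word, so a vertical projection alone generally does not separate them.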



Acknowledgements

Two of the authors, Ms. Payel Rakshit and Mr. Chayan Halder, are thankful to the Department of Science and Technology (DST) for its support in the form of INSPIRE fellowships.

Author information


Corresponding author

Correspondence to Payel Rakshit.


Copyright information

© 2018 Springer Nature Singapore Pte Ltd.

About this chapter


Cite this chapter

Rakshit, P., Halder, C., Ghosh, S., Roy, K. (2018). Line, Word, and Character Segmentation from Bangla Handwritten Text—A Precursor Toward Bangla HOCR. In: Chaki, R., Cortesi, A., Saeed, K., Chaki, N. (eds) Advanced Computing and Systems for Security. Advances in Intelligent Systems and Computing, vol 666. Springer, Singapore. https://doi.org/10.1007/978-981-10-8180-4_7


  • DOI: https://doi.org/10.1007/978-981-10-8180-4_7

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-10-8179-8

  • Online ISBN: 978-981-10-8180-4

  • eBook Packages: Engineering, Engineering (R0)
