Log in

Devanagari ancient documents recognition using statistical feature extraction techniques

  • Published:
Sādhanā Aims and scope Submit manuscript

Abstract

Devanagari ancient document recognition process is drawing a lot of consideration from researchers nowadays. These ancient documents contain a wealth of knowledge. However, these documents are not available to all because of their fragile condition. A Devanagari ancient manuscript recognition system is designed for digital archiving. This system includes image binarization, character segmentation and recognition phases. It incorporates automatic recognition of scanned and segmented characters. Segmented characters may include basic characters (vowels and consonants), modifiers (matras) and various compound characters (characters formed by joining more than one basic characters). In this paper, handwritten Devanagari ancient manuscripts recognition system has been presented using statistical features extraction techniques. In feature extraction phase, intersection points, open endpoints, centroid, horizontal peak extent and vertical peak extent features are extracted. For classification, Convolutional Neural Network, Neural Network, Multilayer Perceptron, RBF-SVM and random forest techniques are considered in this work. Various feature extraction and classification techniques are considered and compared to the recognition of basic characters segmented from Devanagari ancient manuscripts. A data set, of 6152 pre-segmented samples of Devanagari ancient documents, is considered for experimental work. Authors have achieved 88.95% recognition accuracy using a combination of all features and a combination of all classifiers considered in this work by a simple majority voting scheme.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Subscribe and save

Springer+ Basic
EUR 32.99 /Month
  • Get 10 units per month
  • Download Article/Chapter or Ebook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Figure 1
Figure 2
Figure 3
Figure 4
Figure 5

Similar content being viewed by others

References

  1. Shah K R and Badgujar D D 2013 Devnagari handwritten character recognition (DHCR) for ancient documents: a review. In: Proceedings of the 2013 IEEE Conference on Information and Communication Technology. 656–660

  2. Sarkar R, Malakar S, Das N, Basu S and Nasipuri M 2010 A script independent technique for extraction of characters from handwritten word images. Int. J. Comput. Appl. 1(23): 83–88

    Article  Google Scholar 

  3. Kleber F, Sablatnig R, Gau M and Miklas H 2008 Ancient document analysis based on text line extraction. In: Proceedings of the 19th International Conference on Pattern Recognition. 1–4

  4. Bansal V and Sinha R M K 2001 A complete OCR for printed Hindi text in Devanagari script. In: Proceedings of the 6th International Conference on Document Analysis and Recognition. 800–804

  5. Kim M S, Jang M D, Choi H L, Rhee T H, Kim J H and Kwag H K 2004 Digitalizing scheme of handwritten Hanja historical documents. In: Proceedings of the First International Workshop on Document Image Analysis for Libraries. 321–327

  6. Sousa J M C, Pinto J R C, Ribeiro C S and Gil J M 2005 Ancient document recognition using fuzzy methods. In: Proceedings of the IEEE International Conference on Fuzzy Systems. 833–836

  7. Cecotti H and Belaid A 2005 Hybrid OCR combination approach complemented by a specialized ICR applied on ancient documents. In: Proceedings of the 8th International Conference on Document Analysis and Recognition. 1045–1049

  8. Diem M and Sablatnig R 2009 Recognition of degraded handwritten characters using local features. In: Proceedings of the 10th International Conference on Document Analysis and Recognition. 221–225

  9. Raghuraj S, Yadav C S, Verma P and Yadav V 2010 Optical Character Recognition (OCR) for printed Devanagari script using artificial neural network. Int. J.Computer Science & Communication 1(1): 91–95

    Google Scholar 

  10. Holambe A N, Thool R C and Jagade S M 2011 A brief review and survey of feature extraction methods for Devnagari OCR. In: Proceedings of the 9th International Conference on ICT and Knowledge Engineering. 99–104

  11. Yadav D, Sánchez-Cuadrado S and Morato J 2013 OCR for Hindi language using a neural network approach. J. Inf. Process. Syst. 9(1): 117–140

    Article  Google Scholar 

  12. Yunxue S Y, Wang C and **ao B 2015 A character image restoration method for unconstrained handwritten Chinese character recognition. Int. J. Doc. Anal. Recognit. 18(1): 73–86

    Article  Google Scholar 

  13. Katiyar G and Mehfuz S 2016 A hybrid recognition system for off-line handwritten characters. SpringerPlus 5: 1–18

    Article  Google Scholar 

  14. Belhe S, Paulzagade C, Deshmukh A, Jetley S and Mehrotra K 2012 Hindi handwritten word recognition using HMM and symbol tree. In: Proceedings of the Workshop on Document Analysis and Recognition (DAR). 9–14

  15. Lehal G S and Singh C 1999 Feature extraction and classification for OCR of Gurmukhi script. Vivek 12(2): 2–12

    Google Scholar 

  16. Kumar M, Sharma R K and **dal M K 2013 A novel feature extraction technique for offline handwritten Gurmukhi character recognition. IETE J. Res. 59(6): 687–692

    Article  Google Scholar 

  17. Kumar M, Sharma R K and **dal M K 2014 Efficient feature extraction techniques for offline handwritten Gurmukhi character recognition. Natl. Acad. Sci. Lett. 37(4): 381–391

    Article  Google Scholar 

  18. Kumar M, Sharma R K and **dal M K 2018 Character and numeral recognition for non-Indic and Indic scripts: a survey. Artif. Intell. Rev. https://doi.org/10.1007/s10462-017-9607-x

  19. Kumar M, **dal M K and Sharma R K 2012 Offline handwritten Gurmukhi character recognition: study of different features and classifiers combinations. In: Proceedings of the International Workshop on Document Analysis and Recognition, IIT Bombay. 94–99

  20. Elleuch M, Maalej R and Kherallah M 2016 A new design based-SVM of the CNN classifier architecture with dropout for offline Arabic handwritten recognition. Procedia Computer Science 80: 1712–1723

    Article  Google Scholar 

  21. Lecun Y, Bottou L, Bengio Y and Haffner P 1998 Gradient-based learning applied to document recognition. Proc. IEEE 86(11): 2278–2324

    Article  Google Scholar 

  22. Niu X Y, **a L Y, Wang T X and Zhang X Y 2010 Application of BP-ANN and LS-SVM to discrimination of rice origin based on trace metals. Proc. Int. Conf. Mach. Learn. Cybern. 3: 1426–1430

    Google Scholar 

  23. Zhang Y, Liu B and Yang F 2016 Differential evolution based selective ensemble of extreme learning machine. In: IEEE Trustcom/Bgdatase/ispa. 1327–1333

  24. Kumar M, Sharma R K and **dal M K 2014 A novel hierarchical technique for offline handwritten Gurmukhi character recognition. Natl. Acad. Sci. Lett. 37(6): 567–572

    Article  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Munish Kumar.

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Narang, S., **dal, M.K. & Kumar, M. Devanagari ancient documents recognition using statistical feature extraction techniques. Sādhanā 44, 141 (2019). https://doi.org/10.1007/s12046-019-1126-9

Download citation

  • Received:

  • Revised:

  • Accepted:

  • Published:

  • DOI: https://doi.org/10.1007/s12046-019-1126-9

Keywords

Navigation