Abstract
Super-resolving a low-resolution (LR) document image can not only enhance the visual quality and readability of the text, but also improve optical character recognition (OCR) accuracy. However, beyond the inherently ill-posed nature of the image super-resolution (SR) problem, recovering the finer details of text at large upscaling factors while suppressing noise and artifacts at the same time remains a challenging task, especially for low-quality document images. Therefore, to boost OCR accuracy, we propose in this paper a generative adversarial network (GAN) based framework that consists of an SR image generator and a document image quality discriminator. To obtain high-quality SR document images, multiple losses are designed to encourage the generator to learn the structural properties of text. Meanwhile, the quality discriminator is trained with a relativistic loss function. With the proposed framework, the resulting SR document images not only preserve texture details but also have their background noise removed, which yields better OCR performance on public databases. The source code and pre-trained models are available at https://gitlab.com/xujun.peng/doc-super-resolution.
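The abstract says only that the quality discriminator is trained with "a relativistic loss function" and does not give the exact formulation. The sketch below assumes the relativistic average standard GAN (RaSGAN) variant from Jolicoeur-Martineau's "The relativistic discriminator: a key element missing from standard GAN" (ICLR 2019). In that formulation, each real score is compared against the average fake score, and each fake score against the average real score. The function names `ragan_losses` and `softplus` are hypothetical and are not taken from the authors' repository. In the full framework, the generator term would be only one of several losses, and the abstract does not specify their weights.

```python
import numpy as np


def softplus(x):
    # log(1 + exp(x)), computed stably via logaddexp
    return np.logaddexp(0.0, x)


def ragan_losses(c_real, c_fake):
    """Relativistic average GAN losses (standard / sigmoid variant).

    Assumption: c_real and c_fake are raw (pre-sigmoid) discriminator
    scores C(x) for a batch of real high-resolution images and a batch
    of generated SR images. Returns (discriminator_loss, generator_loss).
    """
    c_real = np.asarray(c_real, dtype=np.float64)
    c_fake = np.asarray(c_fake, dtype=np.float64)
    # Each sample is judged relative to the *average* score of the other set.
    d_real = c_real - c_fake.mean()  # how much more "real" than the average fake
    d_fake = c_fake - c_real.mean()  # how much more "real" than the average real
    # Identities used: -log(sigmoid(z)) == softplus(-z)
    #                  -log(1 - sigmoid(z)) == softplus(z)
    d_loss = softplus(-d_real).mean() + softplus(d_fake).mean()
    # The generator swaps the roles: fakes should look more real than reals.
    g_loss = softplus(-d_fake).mean() + softplus(d_real).mean()
    return d_loss, g_loss
```

If the discriminator cannot separate the two sets (all scores equal), both losses equal 2·log 2. If real scores are far above fake scores, the discriminator loss approaches zero while the generator loss grows. Because of the relativistic form, the generator receives gradients from both real and generated samples, not only from its own outputs.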
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Peng, X., Wang, C. (2020). Building Super-Resolution Image Generator for OCR Accuracy Improvement. In: Bai, X., Karatzas, D., Lopresti, D. (eds) Document Analysis Systems. DAS 2020. Lecture Notes in Computer Science(), vol 12116. Springer, Cham. https://doi.org/10.1007/978-3-030-57058-3_11
DOI: https://doi.org/10.1007/978-3-030-57058-3_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-57057-6
Online ISBN: 978-3-030-57058-3
eBook Packages: Computer Science, Computer Science (R0)