Building Super-Resolution Image Generator for OCR Accuracy Improvement

Peng, Xujun; Wang, Chao

doi:10.1007/978-3-030-57058-3_11

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12116)

Included in the following conference series:

International Workshop on Document Analysis Systems

Abstract

Super-resolving a low-resolution (LR) document image can not only enhance the visual quality and readability of the text but also improve optical character recognition (OCR) accuracy. However, apart from the ill-posed nature of the image super-resolution (SR) problem, preserving the fine details of text at large upscaling factors while suppressing noise and artifacts remains a challenging task, especially for low-quality document images. To boost OCR accuracy, we therefore propose a framework based on a generative adversarial network (GAN), in which an SR image generator and a document image quality discriminator are constructed. To obtain high-quality SR document images, multiple losses are designed to encourage the generator to learn the structural properties of text, while the quality discriminator is trained with a relativistic loss function. The SR document images produced by the proposed framework both preserve texture details and remove background noise, and they achieve better OCR performance on public databases. The source code and pre-trained models are available at https://gitlab.com/xujun.peng/doc-super-resolution.
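The abstract says the quality discriminator is trained with a relativistic loss but does not give its form. As a rough illustration only, the sketch below assumes the relativistic average formulation of Jolicoeur-Martineau (ICLR 2019) combined with a standard sigmoid cross-entropy. The chapter's exact variant, its loss weighting, and the generator's other losses are not stated here, so the helper names (`log_sigmoid`, `ragan_losses`) and their inputs (raw, pre-sigmoid critic scores for a batch of real high-resolution and generated SR images) are hypothetical and should not be read as the authors' code.

```python
import math


def log_sigmoid(z):
    """Numerically stable log(sigmoid(z)); splits on sign to avoid exp overflow."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def mean(xs):
    return sum(xs) / len(xs)


def ragan_losses(c_real, c_fake):
    """Relativistic average GAN losses (assumed form, sigmoid cross-entropy).

    c_real: raw critic scores C(x) for real high-resolution images.
    c_fake: raw critic scores C(G(z)) for generated super-resolved images.
    Returns (discriminator_loss, generator_loss).
    """
    real_avg, fake_avg = mean(c_real), mean(c_fake)
    # How much more realistic each real image looks than the average fake one.
    rel_real = [c - fake_avg for c in c_real]
    # How much more realistic each fake image looks than the average real one.
    rel_fake = [c - real_avg for c in c_fake]

    # Discriminator: push sigmoid(rel_real) -> 1 and sigmoid(rel_fake) -> 0,
    # using log(1 - sigmoid(z)) == log_sigmoid(-z).
    d_loss = -(mean([log_sigmoid(r) for r in rel_real])
               + mean([log_sigmoid(-f) for f in rel_fake]))
    # Generator: the same objective with the real/fake targets swapped.
    g_loss = -(mean([log_sigmoid(f) for f in rel_fake])
               + mean([log_sigmoid(-r) for r in rel_real]))
    return d_loss, g_loss


# Example: a critic that scores real and generated images identically.
d, g = ragan_losses([0.0, 0.0], [0.0, 0.0])
print(round(d, 4), round(g, 4))  # 1.3863 1.3863
```

When the critic cannot tell real from generated images, both losses equal 2 ln 2. The relativistic design scores each image relative to the average of the opposite class rather than in isolation, so the generator also receives gradient through the real images. In an actual training loop the same arithmetic would be applied to framework tensors rather than Python lists.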

Similar content being viewed by others

Improving Text Image Super-Resolution Using Optimal Transport

Scene Text Image Super-Resolution in the Wild

Application of Super Resolution for Optical Character Recognition in Low Quality Images

Author information

Authors and Affiliations

ISI, University of Southern California, Marina del Rey, CA, USA
Xujun Peng
LinkedMed Co., Ltd., Spreadtrum Center, Zuchongzhi Road, Shanghai, China
Chao Wang

Corresponding author

Correspondence to Xujun Peng.

Editor information

Editors and Affiliations

Huazhong University of Science and Technology, Wuhan, China
Xiang Bai
Autonomous University of Barcelona, Barcelona, Spain
Dimosthenis Karatzas
Lehigh University, Bethlehem, PA, USA
Daniel Lopresti

About this paper

Cite this paper

Peng, X., Wang, C. (2020). Building Super-Resolution Image Generator for OCR Accuracy Improvement. In: Bai, X., Karatzas, D., Lopresti, D. (eds) Document Analysis Systems. DAS 2020. Lecture Notes in Computer Science(), vol 12116. Springer, Cham. https://doi.org/10.1007/978-3-030-57058-3_11

DOI: https://doi.org/10.1007/978-3-030-57058-3_11
Published: 14 August 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-57057-6
Online ISBN: 978-3-030-57058-3
eBook Packages: Computer Science, Computer Science (R0)

Societies and partnerships

The International Association for Pattern Recognition
