Abstract
Written text is a cornerstone of everyday communication. Letters, however, are often obscured, blurred, erased, or obstructed, which can lead to misreading and unintended meanings. In this study, we present a three-phase solution to this problem. In the first phase, we apply a Deep Learning-based text detection and recognition method that uses the Text-Block technique to locate textual blocks. In the second phase, we combine a database with an ontology to reconstruct unclear words. In the final phase, the recovered word is rendered as a 3D object in Augmented Reality using the Vuforia engine. This visualization helps visually impaired individuals who would otherwise misread the word. To validate the approach, we compared our text detection and recognition methods against state-of-the-art techniques, achieving the highest precision among the compared methods. We also administered a questionnaire to a cohort of visually impaired participants, evaluating the solution on user experience, satisfaction, efficiency, and effectiveness. The survey results support the quality and efficacy of the proposed methodology.
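The second phase, reconstructing an unclear word from stored vocabulary, can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the recognizer marks each unreadable character with `?` and that candidate words come from a flat word list rather than a database and ontology. The names `recover_word` and `LEXICON` are hypothetical.

```python
import re


def recover_word(partial, lexicon, unknown="?"):
    """Return the lexicon words consistent with a partially recognized word.

    `partial` contains `unknown` wherever a character could not be read
    (an assumed convention, not taken from the paper).
    """
    # Each unreadable character may match any single character;
    # readable characters must match exactly (case-insensitively).
    pattern = "".join("." if ch == unknown else re.escape(ch) for ch in partial)
    regex = re.compile(pattern, re.IGNORECASE)
    return [word for word in lexicon if regex.fullmatch(word)]


# Hypothetical vocabulary standing in for the database of known words.
LEXICON = ["house", "horse", "mouse", "hours"]

candidates = recover_word("ho?se", LEXICON)
print(candidates)
```

When several candidates remain, as with `ho?se` here, additional context is needed to choose among them; in the described system that role would fall to the ontology.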
Acknowledgements
This research project was funded by the Deanship of Scientific Research, Princess Nourah bint Abdulrahman University, through the Program of Research Project Funding After Publication, grant No. (43-PRFA-P-50).
Contributions
All authors of this study developed the system and wrote and reviewed the manuscript.
Imene OUALI: Coding, Writing the original draft, Performing experiments, Analyzing the data, Software, Visualization, Data curation, Editing.
Mohamed BEN HALIMA: Supervision, Methodology, Validation, Conception and design, Investigation, Formal analysis, and Drafting of the manuscript.
Nesrine MASMOUDI: Review, Investigation, Resources, Funding, and QM commented on subsequent versions of the manuscript.
Manel AYADI: Review, Investigation, Material preparation, Funding, and QM commented on subsequent versions of the manuscript.
Latifa ALMUQREN: Review, Resources, Material preparation, Funding, and QM commented on subsequent versions of the manuscript.
Ali WALI: Supervision, Validation, Investigation, Project administration, Conceptualization, Contributing to the study, and Designing the study.
All authors have read and approved the final submitted manuscript.
Ethics declarations
Ethics approval
Not applicable
Research involving human and animal participants
Not applicable
About this article
Cite this article
Ouali, I., Ben Halima, M., Masmoudi, N. et al. Text recuperated using ontology with stable marriage optimization technique and text visualization using AR. Multimed Tools Appl (2024). https://doi.org/10.1007/s11042-024-18795-8
DOI: https://doi.org/10.1007/s11042-024-18795-8