Abstract
Data is stored in either structured or unstructured databases. Because databases are ubiquitous, retrieving and processing this information can be tedious. Retrieving information from a database requires basic knowledge of a query language such as SQL, DMX, or QUEL. Most people, however, are unfamiliar with these languages and find it difficult to write queries because they do not know the required structure and format. Queries also vary with the database used and with how the results need to be displayed. This problem can be addressed with an intelligent database system (IDBS) that has natural language processing (NLP) capability. An intelligent natural language interface to a database (NLIDB) allows users to query the database in their own spoken language. This paper describes an intelligent NLIDB that takes English-language queries as input and converts them into the corresponding SQL queries for retrieving information. The NLP procedures recognize the tokens and predict which clauses, including SELECT, WHERE, and FROM, can be generated. A query translation algorithm maps the identified tokens to SQL tokens, and a template is used to generate the SQL query. When the query translator fails, a query predictor based on maximum entropy generates the SQL query. The model was trained with generated queries and different combinations of chunk tags, and their restraints were predicted. The proposed NLP technique is implemented in the maximum entropy model, which either predicts SQL templates or generates SQL queries. This technique yields 100% correct results for the template-based system. For the prediction module, the system returns the most probable result that matches the user query. The system consistently generates accurate results for a natural language query in template mode or SQL query mode.
Easy retrieval of data from a large database using a local language is highly relevant. Adding other languages and training the model on them is left as future work.
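The template-based path described in the abstract (recognize tokens, map them to SQL tokens, fill a template) can be illustrated with a minimal sketch. It is not the paper's implementation: the lexicon, the table and column names, the single `TEMPLATE` string, and the `translate` function are all hypothetical and chosen only for illustration, and the maximum entropy predictor is not modeled (the function simply returns `None` where such a fallback would take over).

```python
import re

# Hypothetical lexicon mapping English words to SQL identifiers.
# The paper's actual vocabulary and schema are not given in the abstract.
COLUMN_WORDS = {"name": "name", "names": "name",
                "salary": "salary", "salaries": "salary", "age": "age"}
TABLE_WORDS = {"employee": "employees", "employees": "employees",
               "student": "students", "students": "students"}
OPERATOR_PHRASES = [("greater than", ">"), ("more than", ">"),
                    ("less than", "<"), ("equal to", "=")]

# One illustrative template with SELECT, FROM, and an optional WHERE clause.
TEMPLATE = "SELECT {columns} FROM {table}{where};"


def translate(question):
    """Map an English question to SQL, or return None if no template fits."""
    text = question.lower()

    # FROM clause: first word that names a known table.
    words = re.findall(r"[a-z]+", text)
    table = next((TABLE_WORDS[w] for w in words if w in TABLE_WORDS), None)
    if table is None:
        return None  # a fallback predictor would be consulted here

    # WHERE clause: "<column> [is] <operator phrase> <number>".
    where, head = "", text
    for phrase, op in OPERATOR_PHRASES:
        pattern = r"(\w+)\s+(?:is\s+)?" + re.escape(phrase) + r"\s+(\d+)"
        match = re.search(pattern, text)
        if match:
            column = COLUMN_WORDS.get(match.group(1))
            if column is None:
                return None
            # Only digits are accepted as values, so nothing needs quoting.
            where = f" WHERE {column} {op} {match.group(2)}"
            head = text[:match.start()]
            break

    # SELECT clause: known column words that precede the condition.
    selected = [COLUMN_WORDS[w] for w in re.findall(r"[a-z]+", head)
                if w in COLUMN_WORDS]
    columns = ", ".join(dict.fromkeys(selected)) or "*"

    return TEMPLATE.format(columns=columns, table=table, where=where)


if __name__ == "__main__":
    print(translate("Show the name and salary of employees "
                    "whose age is greater than 30"))
    print(translate("List all students"))
```

A question that names no known table makes `translate` return `None`; in the system the abstract describes, that is the case in which the query predictor would generate the SQL query instead.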
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs42044-021-00095-1/MediaObjects/42044_2021_95_Fig1_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs42044-021-00095-1/MediaObjects/42044_2021_95_Fig2_HTML.png)
Ethics declarations
Conflict of Interest
The author declares that there is no conflict of interest.
Ethical approval
This article does not contain any studies with human participants or animals performed by the author.
Availability of data and material
The data sets can be accessed upon request. A sample data set is available at http://mirworks.in/downloads.php
Code availability
http://mirworks.in/downloads.php
Informed consent
This article does not contain any studies with human participants or animals performed by the author.
About this article
Cite this article
Chandra, S.S.V. An intelligent natural language query processor for a relational database. Iran J Comput Sci 5, 109–115 (2022). https://doi.org/10.1007/s42044-021-00095-1
DOI: https://doi.org/10.1007/s42044-021-00095-1