Abstract
The performance of metadata-dependent NL2SQL models degrades seriously when the metadata is incomplete or distorted. To address this problem, we propose a metadata compensation approach that represents the question, the SQL query, data cell value relevance, and the incomplete schema data as a global metadata graph, and applies knowledge graph reasoning to complete this graph. The global metadata graph is a multi-graph, so we propose an improved TransR model that represents it by integrating the contributions of the multiple relationships between two nodes. Based on the compensated metadata graph, we construct an end-to-end framework and a preprocessing-based improvement framework to accommodate different metadata-dependent NL2SQL systems. The new models are evaluated on the Spider dataset with artificially simulated partial deficiency of metadata relations or metadata distortion. In addition to ablation comparisons, the new models are compared with several existing approaches and demonstrate improved performance.
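As a rough illustration of scoring a node pair that is connected by several relations at once, the sketch below computes the standard TransR energy for each relation and then combines the results. The combination rule (a plain mean over the parallel edges), the relation names `foreign_key` and `value_match`, and all function names are assumptions made for this example; the abstract does not specify how the improved model integrates the contributions.

```python
from typing import Dict, List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def project(matrix: Matrix, entity: Vector) -> Vector:
    """Map an entity vector into a relation-specific space: M_r * e."""
    return [sum(m * e for m, e in zip(row, entity)) for row in matrix]


def transr_score(head: Vector, tail: Vector, matrix: Matrix, rel: Vector) -> float:
    """Standard TransR energy ||M_r h + r - M_r t||^2 (lower means more plausible)."""
    h_r = project(matrix, head)
    t_r = project(matrix, tail)
    return sum((h + r - t) ** 2 for h, r, t in zip(h_r, rel, t_r))


def multi_relation_score(
    head: Vector,
    tail: Vector,
    relations: Sequence[str],
    matrices: Dict[str, Matrix],
    rel_vectors: Dict[str, Vector],
) -> float:
    """Combine the energies of all parallel edges between two nodes.

    ASSUMPTION: a plain mean is used here purely for illustration; it is
    not taken from the paper.
    """
    if not relations:
        raise ValueError("at least one relation is required")
    scores = [
        transr_score(head, tail, matrices[name], rel_vectors[name])
        for name in relations
    ]
    return sum(scores) / len(scores)


# Hypothetical toy data: 2-d entities, 2-d relation spaces, identity projections.
identity: Matrix = [[1.0, 0.0], [0.0, 1.0]]
matrices = {"foreign_key": identity, "value_match": identity}
rel_vectors = {"foreign_key": [0.0, 1.0], "value_match": [0.0, 0.0]}

head_node: Vector = [1.0, 0.0]
tail_node: Vector = [1.0, 1.0]

combined = multi_relation_score(
    head_node, tail_node, ["foreign_key", "value_match"], matrices, rel_vectors
)
print(combined)
```

In this toy setup the `foreign_key` translation fits the pair exactly while `value_match` does not, so the combined energy lies between the two per-relation energies. A learned weighting over relations would be a natural alternative to the mean, but that is likewise a guess rather than the published design.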
Availability of data and materials
Not applicable.
Code Availability
Not applicable.
Acknowledgements
This work is supported by the Sichuan Province Science and Technology Support Program (Nos. 2020YFS0090, 2020ZHCG0078, 2021YFN0117, 2022YFS0135) and the Medico-Engineering Cooperation Funds from the University of Electronic Science and Technology of China (No. ZYGX2021YGLH011).
Author information
Contributions
Conceptualization: Jie Lin; Methodology: Jie Lin, JiYan Li; Formal analysis and investigation: Jie Lin, YuLong Liang, JiYan Li, Yi Bai, Yong Wang; Software: JiYan Li, YuLong Liang; Writing - original draft preparation: Jie Lin, JiYan Li, YuLong Liang, Yi Bai, Yong Wang; Writing - review and editing: Jie Lin, YuLong Liang, JiYan Li, Yi Bai, Yong Wang; Funding acquisition: Jie Lin; Validation: Yi Bai, YuLong Liang, Yong Wang.
Ethics declarations
Conflict of interest
The authors declare that they have no competing interests.
Ethics approval
Not applicable.
Consent to participate
Not applicable.
Consent for publication
Not applicable.
About this article
Cite this article
Lin, J., Liang, Y., Li, J. et al. NL2SQL with partial missing metadata based on multi-view metadata graph compensation and reasoning. Appl Intell 54, 1511–1524 (2024). https://doi.org/10.1007/s10489-023-05221-z
DOI: https://doi.org/10.1007/s10489-023-05221-z