NL2SQL with partial missing metadata based on multi-view metadata graph compensation and reasoning

Applied Intelligence

Abstract

The performance of metadata-dependent NL2SQL models degrades seriously when the metadata is incomplete or distorted. To address this problem, we propose a metadata compensation approach that represents the question, together with the SQL query, data cell value relevance, and incomplete schema data, as a global metadata graph, and applies knowledge graph reasoning to complete that graph. The global metadata graph is a multi-graph. We propose an improved TransR model that represents this multi-graph by integrating the contributions of the multiple relationships between two nodes. Based on the compensated metadata graph, we construct two frameworks, an end-to-end one and a preprocessing one, to suit different metadata-dependent NL2SQL systems. The new models are evaluated on the Spider dataset with artificially simulated partial metadata relation deficiency or metadata distortion. In addition to ablation studies, the new models are compared with several existing approaches and demonstrate improved performance.
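To make the multi-graph idea concrete, the sketch below scores a node pair that is linked by more than one relation. It uses the standard TransR score, ||M_r h + r - M_r t||^2, and then combines the per-relation scores with a weighted mean. The weighted mean, the function names, the relation names, and the toy vectors are all assumptions made for illustration; the abstract does not specify how the improved model integrates multiple relationships, so this should not be read as the authors' formulation.

```python
from typing import Dict, List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def project(m: Matrix, v: Sequence[float]) -> Vector:
    """Map an entity vector into a relation space (matrix-vector product)."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def transr_score(h: Vector, t: Vector, r: Vector, m_r: Matrix) -> float:
    """Standard TransR score ||M_r h + r - M_r t||^2; lower means more plausible."""
    h_r, t_r = project(m_r, h), project(m_r, t)
    return sum((a + b - c) ** 2 for a, b, c in zip(h_r, r, t_r))


def multi_relation_score(h: Vector, t: Vector, relations: Dict[str, dict]) -> float:
    """Hypothetical aggregation: weighted mean of the per-relation TransR scores."""
    total_weight = sum(rel["weight"] for rel in relations.values())
    weighted = sum(
        rel["weight"] * transr_score(h, t, rel["r"], rel["M"])
        for rel in relations.values()
    )
    return weighted / total_weight


# Toy example: two nodes joined by two relations (names are illustrative only).
identity = [[1.0, 0.0], [0.0, 1.0]]
head, tail = [1.0, 0.0], [1.0, 1.0]
relations = {
    "schema_link": {"r": [0.0, 1.0], "M": identity, "weight": 1.0},
    "value_match": {"r": [1.0, 0.0], "M": identity, "weight": 1.0},
}
score = multi_relation_score(head, tail, relations)
```

In this toy setup the first relation fits the pair exactly and the second does not, so the aggregate sits between the two individual scores. A learned model would train the embeddings, projection matrices, and possibly the weights rather than fixing them by hand.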

Availability of data and materials

Not applicable.

Code Availability

Not applicable.

Acknowledgements

This work is supported by the Sichuan Province Science and Technology Support Program (Nos. 2020YFS0090, 2020ZHCG0078, 2021YFN0117, 2022YFS0135) and the Medico-Engineering Cooperation Funds from the University of Electronic Science and Technology of China (No. ZYGX2021YGLH011).

Author information

Contributions

Conceptualization: Jie Lin; Methodology: Jie Lin, JiYan Li; Formal analysis and investigation: Jie Lin, YuLong Liang, JiYan Li, Yi Bai, Yong Wang; Software: JiYan Li, YuLong Liang; Writing - original draft preparation: Jie Lin, JiYan Li, YuLong Liang, Yi Bai, Yong Wang; Writing - review and editing: Jie Lin, YuLong Liang, JiYan Li, Yi Bai, Yong Wang; Funding acquisition: Jie Lin; Validation: Yi Bai, YuLong Liang, Yong Wang.

Corresponding author

Correspondence to Jie Lin.

Ethics declarations

Conflict of interest

The authors declare that they have no competing interests.

Ethics approval

Not applicable.

Consent to participate

Not applicable.

Consent for publication

Not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Lin, J., Liang, Y., Li, J. et al. NL2SQL with partial missing metadata based on multi-view metadata graph compensation and reasoning. Appl Intell 54, 1511–1524 (2024). https://doi.org/10.1007/s10489-023-05221-z

  • DOI: https://doi.org/10.1007/s10489-023-05221-z
