Abstract
This study addresses the challenge of detecting semantic column types in relational tables, a key task in many real-world applications. While language models such as BERT have improved prediction accuracy, their input token limits restrict how much intra-table and inter-table information they can process at the same time. We propose a novel approach that uses Graph Neural Networks (GNNs) to model intra-table dependencies, allowing language models to focus on inter-table information. Our proposed method not only outperforms existing state-of-the-art algorithms but also offers novel insights into the utility and functionality of various GNN types for semantic type detection. The code is available at https://github.com/hoseinzadeehsan/GAIT
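To make the idea of modeling intra-table dependencies with a graph more concrete, the following is a minimal sketch and not the authors' GAIT implementation. It assumes each column of a table is a node, that all columns of the same table are connected to each other, and that each node starts from per-column class scores produced by some upstream model. The abstract does not specify the graph construction or the GNN variant, so the function `message_passing_step`, the `self_weight` parameter, the type labels, and the score values are all hypothetical. A real GNN layer would use learned weights rather than the fixed mean aggregation used here.

```python
import math


def softmax(scores):
    """Convert raw scores to probabilities (numerically stable)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def message_passing_step(node_feats, self_weight=0.5):
    """One unlearned mean-aggregation step over a fully connected column graph.

    Each column (node) mixes its own feature vector with the mean of the
    other columns' vectors. This is an illustrative stand-in for a GNN layer,
    not a trained model.
    """
    n = len(node_feats)
    if n == 1:
        # A single-column table has no neighbours to aggregate from.
        return [list(node_feats[0])]
    dim = len(node_feats[0])
    updated = []
    for i in range(n):
        neighbour_mean = [
            sum(node_feats[j][d] for j in range(n) if j != i) / (n - 1)
            for d in range(dim)
        ]
        updated.append([
            self_weight * node_feats[i][d] + (1 - self_weight) * neighbour_mean[d]
            for d in range(dim)
        ])
    return updated


# Hypothetical semantic types and per-column scores for one three-column table.
types = ["city", "country", "name"]
column_scores = [
    [2.0, 0.1, 0.3],
    [0.2, 1.8, 0.1],
    [0.9, 0.2, 1.0],
]

refined = message_passing_step(column_scores)
probabilities = [softmax(row) for row in refined]
predictions = [types[row.index(max(row))] for row in probabilities]
print(predictions)
```

The point of the sketch is only that a column's final prediction depends on the other columns in the same table, so the table context does not have to be packed into the language model's token budget.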
Acknowledgement
The work of Ke Wang is supported in part by a discovery grant from Natural Sciences and Engineering Research Council of Canada.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Hoseinzade, E., Wang, K. (2024). Graph Neural Network Approach to Semantic Type Detection in Tables. In: Yang, DN., Xie, X., Tseng, V.S., Pei, J., Huang, JW., Lin, J.CW. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2024. Lecture Notes in Computer Science, vol 14650. Springer, Singapore. https://doi.org/10.1007/978-981-97-2266-2_10
DOI: https://doi.org/10.1007/978-981-97-2266-2_10
Publisher Name: Springer, Singapore
Print ISBN: 978-981-97-2265-5
Online ISBN: 978-981-97-2266-2
eBook Packages: Computer Science, Computer Science (R0)