QA-Matcher: Unsupervised Entity Matching Using a Question Answering Model

Hayashi, Shogo; Dong, Yuyang; Oyamada, Masafumi

doi:10.1007/978-3-031-33383-5_14

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13938)

Included in the following conference series:

Pacific-Asia Conference on Knowledge Discovery and Data Mining

Abstract

Entity matching (EM) is a fundamental task in data integration: identifying records that refer to the same real-world entity. Unsupervised EM is often preferred in real-world applications because labeling data is labor-intensive. However, existing unsupervised methods do not always perform well, because the assumptions they rely on may not hold for tasks in different domains. In this paper, we propose QA-Matcher, an unsupervised EM model that is domain-agnostic and does not require any particular assumptions. Our idea is to frame EM as question answering (QA) by utilizing a trained QA model. Specifically, we generate a question that asks which record has the characteristics of a particular record, together with a passage that describes other records. We then use the trained QA model to answer the question over the passage, and predict as a match the record pair formed by the record in the question and the record given as the answer. QA-Matcher leverages the power of a QA model to represent the semantics of various types of entities, allowing it to identify identical entities in a QA-like fashion. In extensive experiments on 16 real-world datasets, we demonstrate that QA-Matcher outperforms unsupervised EM methods and is competitive with supervised methods.
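The question-and-passage framing described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the question and passage templates, the function names (`serialize`, `build_question`, `build_passage`, `match`), the score threshold, and above all `toy_qa`, a token-overlap stand-in that occupies the slot where a trained extractive QA model would go.

```python
import re
from typing import Callable, Dict, List, Optional, Tuple

Record = Dict[str, str]

def serialize(record: Record) -> str:
    """Flatten a record into 'attribute is value' clauses (hypothetical template)."""
    return ", ".join(f"{key} is {value}" for key, value in record.items())

def build_question(query: Record) -> str:
    """Ask which record has the characteristics of the query record."""
    return f"Which record has the following characteristics: {serialize(query)}?"

def build_passage(candidates: List[Record]) -> Tuple[str, List[Tuple[int, int]]]:
    """Describe the other records in one passage; also return each record's character span."""
    parts, spans, pos = [], [], 0
    for i, record in enumerate(candidates):
        sentence = f"Record {i}: {serialize(record)}."
        spans.append((pos, pos + len(sentence)))
        parts.append(sentence)
        pos += len(sentence) + 1  # +1 for the space that joins sentences
    return " ".join(parts), spans

def toy_qa(question: str, passage: str) -> Tuple[int, float]:
    """Stand-in for a trained QA model (assumption): returns the start offset and
    Jaccard token overlap of the passage sentence most similar to the question."""
    q_tokens = set(re.findall(r"\w+", question.lower()))
    best_start, best_score = 0, 0.0
    for m in re.finditer(r"Record \d+: .*?\.(?= Record \d+: |$)", passage):
        s_tokens = set(re.findall(r"\w+", m.group().lower()))
        score = len(q_tokens & s_tokens) / len(q_tokens | s_tokens)
        if score > best_score:
            best_start, best_score = m.start(), score
    return best_start, best_score

def match(query: Record, candidates: List[Record],
          qa: Callable[[str, str], Tuple[int, float]] = toy_qa,
          threshold: float = 0.3) -> Optional[int]:
    """Return the index of the candidate predicted to match `query`, or None.
    The threshold is a hypothetical way to allow a 'no match' outcome."""
    question = build_question(query)
    passage, spans = build_passage(candidates)
    start, score = qa(question, passage)
    if score < threshold:
        return None
    for i, (lo, hi) in enumerate(spans):
        if lo <= start < hi:  # map the answer offset back to a candidate record
            return i
    return None

if __name__ == "__main__":
    query = {"title": "iPhone 13 Pro 128GB", "brand": "Apple"}
    candidates = [
        {"name": "Galaxy S21 5G", "maker": "Samsung"},
        {"name": "Apple iPhone 13 Pro (128GB)", "maker": "Apple"},
    ]
    print(build_question(query))
    print(match(query, candidates))
```

The `qa` parameter is kept as a plain callable taking a question and a passage, so a real extractive QA model could be substituted for `toy_qa` as long as it returns an answer start offset and a score.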

S. Hayashi: This work was conducted while the author was affiliated with NEC.



Author information

Authors and Affiliations

BizReach, Inc., Tokyo, Japan
Shogo Hayashi
NEC Corporation, Tokyo, Japan
Yuyang Dong & Masafumi Oyamada

Corresponding author

Correspondence to Shogo Hayashi.

Editor information

Editors and Affiliations

Kyoto University, Kyoto, Japan
Hisashi Kashima
IBM Research, Thomas J. Watson Research Center, Yorktown Heights, NY, USA
Tsuyoshi Ide
National Chiao Tung University, Hsinchu, Taiwan
Wen-Chih Peng

About this paper

Cite this paper

Hayashi, S., Dong, Y., Oyamada, M. (2023). QA-Matcher: Unsupervised Entity Matching Using a Question Answering Model. In: Kashima, H., Ide, T., Peng, WC. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2023. Lecture Notes in Computer Science, vol 13938. Springer, Cham. https://doi.org/10.1007/978-3-031-33383-5_14

DOI: https://doi.org/10.1007/978-3-031-33383-5_14
Published: 26 May 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-33382-8
Online ISBN: 978-3-031-33383-5
eBook Packages: Computer Science, Computer Science (R0)
