Abstract
RDF2vec and other techniques provide embedding vectors for knowledge graphs. While we have used a simple node classification task so far, this chapter introduces a few datasets and three common benchmarks for embedding methods (SW4ML, GEval, and DLCC) and shows how to use them to compare different variants of RDF2vec. The novel DLCC benchmark allows us to take a closer look at what RDF2vec vectors actually represent and to analyze what proximity in the vector space means for them.
Notes
- 1.
- 2.
- 3.
- 4.
- 5. The desired size classes can be configured in the framework.
- 6. Since the classification tasks are balanced, random guessing would yield an accuracy of 0.5.
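The 0.5 baseline in the last note can be checked with a small simulation. The sketch below is illustrative only and is not part of any benchmark framework: it assumes a binary task with equally many positive and negative examples, and the function names, sample size, and seed are hypothetical choices made for this example.

```python
import random


def random_guess_accuracy(n_per_class: int = 5000, seed: int = 42) -> float:
    """Accuracy of uniform random guessing on a balanced binary task."""
    rng = random.Random(seed)
    # Balanced labels: the same number of positives (1) and negatives (0).
    labels = [1] * n_per_class + [0] * n_per_class
    # A guesser that ignores its input and picks a class uniformly at random.
    guesses = [rng.randint(0, 1) for _ in labels]
    correct = sum(guess == label for guess, label in zip(guesses, labels))
    return correct / len(labels)


def constant_guess_accuracy(n_per_class: int = 5000) -> float:
    """Accuracy of always predicting the positive class on the same task."""
    labels = [1] * n_per_class + [0] * n_per_class
    return sum(label == 1 for label in labels) / len(labels)


if __name__ == "__main__":
    print(f"random guessing:   {random_guess_accuracy():.3f}")
    print(f"constant guessing: {constant_guess_accuracy():.3f}")
```

Under this assumption, both the random and the constant guesser land at or very close to 0.5, so a classifier trained on embedding vectors only shows signal when its accuracy clearly exceeds that value.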
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this chapter
Cite this chapter
Paulheim, H., Ristoski, P., Portisch, J. (2023). Benchmarking Knowledge Graph Embeddings. In: Embedding Knowledge Graphs with RDF2vec. Synthesis Lectures on Data, Semantics, and Knowledge. Springer, Cham. https://doi.org/10.1007/978-3-031-30387-6_3
DOI: https://doi.org/10.1007/978-3-031-30387-6_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-30386-9
Online ISBN: 978-3-031-30387-6
eBook Packages: Synthesis Collection of Technology (R0), eBColl Synthesis Collection 12