Abstract
Depending on the problem at hand, one might think of different tweaks to the RDF2vec algorithm, many of which have been discussed in the past. Those tweaks address various steps of the pipeline. Reasoners have been used to preprocess the knowledge graph and add implicit knowledge. Different walk strategies have been proposed, ranging from injecting edge weights, to biasing the walks towards higher or lower degree nodes, to changing the structure of the extracted walks completely. Moreover, the embedding creation itself has been analyzed by using different variants of the word2vec word embedding method. In this chapter, we introduce a few of those approaches and highlight their advantages and shortcomings.
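To make the idea of biasing walks towards higher or lower degree nodes more concrete, the following is a minimal sketch and not the implementation used in jRDF2vec, pyRDF2vec, or the cited studies. It assumes the graph is given as a list of (subject, predicate, object) triples, uses the out-degree as the degree measure, and applies a +1 smoothing so that leaf nodes keep a non-zero weight. The names `build_adjacency` and `biased_walk` and the toy triples are hypothetical.

```python
import random
from collections import defaultdict


def build_adjacency(triples):
    """Map each subject to its outgoing (predicate, object) edges."""
    adjacency = defaultdict(list)
    for s, p, o in triples:
        adjacency[s].append((p, o))
    return adjacency


def degree(adjacency, node):
    """Out-degree of a node (0 for nodes without outgoing edges)."""
    return len(adjacency.get(node, []))


def biased_walk(adjacency, start, depth, prefer_high_degree=True, rng=None):
    """Extract one walk of at most `depth` hops, biased by node degree.

    Assumption: the probability of following an edge is proportional to
    (out-degree of the target + 1), or to its inverse when
    prefer_high_degree is False.
    """
    rng = rng or random.Random()
    walk = [start]
    current = start
    for _ in range(depth):
        edges = adjacency.get(current, [])
        if not edges:
            break  # dead end: the walk is shorter than requested
        weights = [degree(adjacency, o) + 1 for _, o in edges]
        if not prefer_high_degree:
            weights = [1.0 / w for w in weights]
        p, o = rng.choices(edges, weights=weights, k=1)[0]
        walk.extend([p, o])
        current = o
    return walk


# Toy graph with hypothetical entity and predicate names.
toy_triples = [
    ("a", "p", "b"), ("a", "q", "c"),
    ("b", "r", "d"), ("b", "r", "e"),
    ("c", "r", "d"),
]
toy_adjacency = build_adjacency(toy_triples)
print(biased_walk(toy_adjacency, "a", depth=2, rng=random.Random(42)))
```

Setting `prefer_high_degree=False` inverts the weights, so the same function covers both directions of the bias; an unbiased walk corresponds to giving all edges equal weight.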
Notes
- 1.
- 2.
- 3.
The figure also shows non-Wikipedia pages as click sources. Those are ignored when computing the edge weights.
- 4.
The paper reports on a preliminary study and therefore uses only the cities, movies, and albums classification and regression tasks. Hence, the results are not directly comparable to the results in the previous section.
- 5.
Since neither jRDF2vec nor pyRDF2vec allows for incorporating external weights, the authors of the study have used their own proprietary implementation of RDF2vec for the study. Therefore, no code example is given here. The implementation used for the experiments can be found at https://github.com/ataweel55/RDF2VEC.
- 6.
Please note that both examples are strongly simplified for the sake of illustration. In practice, the representation of a word or an entity is learned from a multitude of sentences or walks, rather than a single sentence or walk.
- 7.
In this task, it is important to return an entity of the right class, e.g., for solving Berlin is to Germany as Paris is to ?, the result must be from the class Country.
- 8.
The snippet assumes that the jRDF2vec JAR has been downloaded and placed in the same directory. It is further assumed that the compiled wang2vec project has been placed in the same directory. For an extensive user guide with examples, we refer the reader to the GitHub repository: https://github.com/dwslab/jRDF2Vec.
- 9.
It is important to note that although they can be derived from standard walks, it is usually much faster to generate those walks directly instead of generating standard walks first and then deriving the e-walks and p-walks.
- 10.
The latter entity is unrelated to Mannheim. However, in the DBpedia graph, one of the few statements about this entity is that it is different from Peter Kurz, who, in turn, is related to Mannheim. As a result, a large fraction of multi-hop walks starting in the entity Peter Kurze contain the entity Mannheim and other entities related to Mannheim, so that it ultimately ends up close to Mannheim in the vector space. This anecdotal example shows that explicit negative information (here: an entity not being related to another entity) is not picked up very well by RDF2vec, and even has a contrary effect.
- 11.
Note that the exact syntax of the code might change once this becomes an official feature.
- 12.
The corresponding walk type option for p-walks would be EXPERIMENTAL_MID_EDGE_WALKS_DUPLICATE_FREE.
- 13.
Caveat: the accuracies of jRDF2vec and pyRDF2vec should not be compared directly, because small differences may also be explained by subtly different implementations and/or different random seeds.
- 14.
Note that while this holds in theory, real-world knowledge graphs often pose practical scalability challenges to existing reasoners (Heist and Paulheim 2021).
- 15.
- 16.
- 17.
- 18.
See Chap. 5.
- 19.
Materialization is usually done externally as a preprocessing step and hence not included in the table.
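Note 9 states that e-walks and p-walks can be derived from standard walks. The following is a minimal sketch of such a derivation, under two assumptions that are not spelled out in the notes: a standard walk is stored as a list alternating entities (even positions) and predicates (odd positions), and, following the usual definitions, an e-walk keeps only the entities while a p-walk keeps the predicates plus the focus entity. The function names and the toy walk are hypothetical. As the note points out, generating these walks directly is usually much faster in practice.

```python
def to_e_walk(walk):
    """Keep only the entities (even positions) of a standard walk."""
    return walk[0::2]


def to_p_walk(walk, focus_index):
    """Keep the predicates (odd positions) plus the focus entity.

    focus_index is the position of the focus entity in the standard walk;
    it must point at an entity, i.e., an even position within the walk.
    """
    if focus_index % 2 != 0 or not 0 <= focus_index < len(walk):
        raise ValueError("focus_index must be an even position inside the walk")
    return [tok for i, tok in enumerate(walk) if i % 2 == 1 or i == focus_index]


# Toy standard walk with the focus entity "e2" in the middle (hypothetical names).
standard_walk = ["e1", "p1", "e2", "p2", "e3"]
print(to_e_walk(standard_walk))
print(to_p_walk(standard_walk, focus_index=2))
```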
References
Abburu S (2012) A survey on ontology reasoners and comparison. Int J Comput Appl 57(17)
Al Taweel A, Paulheim H (2020) Towards exploiting implicit human feedback for improving rdf2vec embeddings. In: CEUR workshop proceedings, RWTH, vol 2635, pp 1–10
Allahyari M, Pouriyeh S, Assefi M, Safaei S, Trippe ED, Gutierrez JB, Kochut K (2017) A brief survey of text mining: classification, clustering and extraction techniques. arXiv:1707.02919
Blondel VD, Guillaume JL, Lambiotte R, Lefebvre E (2008) Fast unfolding of communities in large networks. J Stat Mech: Theory Exper 10:P10008
Bordes A, Usunier N, Garcia-Duran A, Weston J, Yakhnenko O (2013) Translating embeddings for modeling multi-relational data. In: Advances in neural information processing systems 26
Brin S, Page L (1998) The anatomy of a large-scale hypertextual web search engine. Comput Netw ISDN Syst 30(1):107–117
Cochez M, Ristoski P, Ponzetto SP, Paulheim H (2017) Biased graph walks for rdf graph embeddings. In: Proceedings of the 7th international conference on web intelligence, mining and semantics, pp 1–12
Comrie B (1989) Language universals and linguistic typology: syntax and morphology. University of Chicago Press
Fortunato S (2010) Community detection in graphs. Phys Rep 486(3–5):75–174
Gangemi A, Guarino N, Masolo C, Oltramari A (2003) Sweetening wordnet with dolce. AI Mag 24(3):13–13
Gangemi A, Mika P (2003) Understanding the semantic web through descriptions and situations. In: OTM confederated international conferences “On the move to meaningful internet systems”. Springer, pp 689–706
Heist N, Paulheim H (2021) The caligraph ontology as a challenge for owl reasoners. In: SemREC 2021: semantic reasoning evaluation challenge 2021, pp 21–31
Iana A, Paulheim H (2020) More is not always better: the negative impact of a-box materialization on rdf2vec knowledge graph embeddings. In: CIKM (Workshops)
Ivanov S, Burnaev E (2018) Anonymous walk embeddings. arXiv:1805.11921
Lehmann J (2009) Dl-learner: learning concepts in description logics. J Mach Learn Res 10:2639–2642
Ling W, Dyer C, Black AW, Trancoso I (2015a) Two/too simple adaptations of word2vec for syntax problems. In: NAACL HLT 2015, ACL, pp 1299–1304
Lin Y, Liu Z, Sun M, Liu Y, Zhu X (2015b) Learning entity and relation embeddings for knowledge graph completion. In: Twenty-ninth AAAI conference on artificial intelligence
Newman DA (2014) Missing data: five practical guidelines. Org Res Methods 17(4):372–411
Paulheim H (2017) Knowledge graph refinement: a survey of approaches and evaluation methods. Semant Web 8(3):489–508
Paulheim H, Gangemi A (2015) Serving dbpedia with dolce–more than just adding a cherry on top. In: International semantic web conference. Springer, pp 180–196
Perozzi B, Kulkarni V, Chen H, Skiena S (2017) Don’t walk, skip! online learning of multi-scale network embeddings. In: Proceedings of the 2017 IEEE/ACM international conference on advances in social networks analysis and mining 2017, pp 258–265
Portisch J, Paulheim H (2021) Putting rdf2vec in order. In: International semantic web conference, posters and demonstrations
Portisch J, Paulheim H (2022) Walk this way! entity walks and property walks for rdf2vec. In: Extended semantic web conference 2022, posters and demonstrations
Schlötterer J, Wehking M, Rizi FS, Granitzer M (2019) Investigating extensions to random walk based graph embedding. In: 2019 IEEE international conference on cognitive computing (ICCC), IEEE, pp 81–89
Sirin E, Parsia B, Grau BC, Kalyanpur A, Katz Y (2007) Pellet: a practical owl-dl reasoner. J Web Semant 5(2):51–53
Steenwinckel B, Vandewiele G, Bonte P, Weyns M, Paulheim H, Ristoski P, Turck FD, Ongenae F (2021) Walk extraction strategies for node embeddings with rdf2vec in knowledge graphs. In: International conference on database and expert systems applications. Springer, pp 70–80
Thalhammer A, Rettinger A (2016) PageRank on wikipedia: towards general importance scores for entities. In: The semantic web: ESWC 2016 satellite events. Springer International Publishing, Crete, Greece, pp 227–240
van Erp M, Mendes P, Paulheim H, Ilievski F, Plu J, Rizzo G, Waitelonis J (2016) Evaluating entity linking: an analysis of current benchmark datasets and a roadmap for doing a better job. In: 10th international conference on language resources and evaluation (LREC)
Vandewiele G, Steenwinckel B, Ongenae F, De Turck F (2019) Inducing a decision tree with discriminative paths to classify entities in a knowledge graph. In: SEPDA2019, the 4th international workshop on semantics-powered data mining and analytics, pp 1–6
Vrandečić D, Krötzsch M (2014) Wikidata: a free collaborative knowledge base. Commun ACM 57(10):78–85. http://dx.doi.org/10.1145/2629489
Wang Z, Zhang J, Feng J, Chen Z (2014) Knowledge graph embedding by translating on hyperplanes. In: Proceedings of the AAAI conference on artificial intelligence, vol 28
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this chapter
Cite this chapter
Paulheim, H., Ristoski, P., Portisch, J. (2023). Tweaking RDF2vec. In: Embedding Knowledge Graphs with RDF2vec. Synthesis Lectures on Data, Semantics, and Knowledge. Springer, Cham. https://doi.org/10.1007/978-3-031-30387-6_4
DOI: https://doi.org/10.1007/978-3-031-30387-6_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-30386-9
Online ISBN: 978-3-031-30387-6
eBook Packages: Synthesis Collection of Technology (R0), eBColl Synthesis Collection 12