Tweaking RDF2vec

Paulheim, Heiko; Ristoski, Petar; Portisch, Jan

doi:10.1007/978-3-031-30387-6_4


Part of the book series: Synthesis Lectures on Data, Semantics, and Knowledge (SLDSK)


Abstract

Depending on the problem at hand, one might think of different tweaks to the RDF2vec algorithm, many of which have been discussed in the past. Those tweaks encompass various steps of the pipeline: reasoners have been used to preprocess the knowledge graph and add implicit knowledge. Various ways of changing the walk strategy have been proposed, ranging from injecting edge weights, to biasing the walks towards higher- or lower-degree nodes, to changing the structure of the extracted walks completely. Moreover, the embedding creation step itself has been analyzed by using different variants of the word2vec word embedding method. In this chapter, we introduce a few of those approaches and highlight their advantages and shortcomings.
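The materialization step mentioned above can be illustrated without a full OWL reasoner. The following is a minimal sketch, not the tooling used in the chapter: it assumes the graph is a Python set of (subject, predicate, object) string triples and that the symmetric and inverse properties are listed by hand (all property names are hypothetical toy data). It applies only two rules, (s p o) implies (o p s) for a symmetric p, and (s p o) implies (o q s) for p inverse to q, repeating them until no new triples appear.

```python
# Minimal sketch of A-box materialization for two OWL-style rules only:
# symmetric properties and inverse properties. Not a general reasoner;
# the property lists passed in are assumptions supplied by the caller.

Triple = tuple[str, str, str]


def materialize(triples: set[Triple],
                symmetric: set[str],
                inverse_of: dict[str, str]) -> set[Triple]:
    """Return `triples` plus every triple implied by symmetry or inverses."""
    # If p is the inverse of q, then q is also the inverse of p.
    inverses = dict(inverse_of)
    inverses.update({q: p for p, q in inverse_of.items()})

    result = set(triples)
    frontier = set(triples)
    # Re-apply the rules to newly derived triples until a fixpoint is reached,
    # so that chains such as "inverse, then symmetric" are also covered.
    while frontier:
        derived: set[Triple] = set()
        for s, p, o in frontier:
            if p in symmetric:
                derived.add((o, p, s))
            if p in inverses:
                derived.add((o, inverses[p], s))
        frontier = derived - result
        result |= frontier
    return result


# Toy example with hypothetical property names.
kg = {("Mannheim", "partnerCity", "Swansea"),
      ("Mannheim", "hasMayor", "Peter_Kurz")}
closed = materialize(kg, symmetric={"partnerCity"},
                     inverse_of={"hasMayor": "mayorOf"})
print(sorted(closed - kg))
# prints [('Peter_Kurz', 'mayorOf', 'Mannheim'), ('Swansea', 'partnerCity', 'Mannheim')]
```

The loop terminates because only finitely many triples can be formed from the entities and predicates already in the graph. Real-world graphs need a proper reasoner and run into the scalability issues discussed by Heist and Paulheim (2021).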
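The walk-level tweaks can be sketched as a weighted random walk in which each outgoing edge is chosen with probability proportional to a weight. The sketch below assumes the graph is a set of triples and uses the degree of the target node, or its inverse, as the weight to bias walks towards higher- or lower-degree nodes; externally supplied edge weights could be plugged in through the same `weight` parameter. All function names are hypothetical, and this is not the jRDF2vec or pyRDF2vec code (the pyRDF2vec sampler page in note 1 documents that library's own strategies).

```python
# Minimal sketch of weighted (biased) random walks over a set of triples.
# The weight functions are illustrative assumptions, not the exact
# strategies of any specific RDF2vec implementation.
import random
from collections import defaultdict
from typing import Callable

Triple = tuple[str, str, str]
WeightFn = Callable[[str, str, str], float]


def build_adjacency(triples: set[Triple]) -> dict[str, list[tuple[str, str]]]:
    """Map each subject to its outgoing (predicate, object) edges."""
    adj: dict[str, list[tuple[str, str]]] = defaultdict(list)
    for s, p, o in sorted(triples):  # sorted for a reproducible edge order
        adj[s].append((p, o))
    return adj


def degree_weight(triples: set[Triple], prefer_high: bool) -> WeightFn:
    """Weight an edge by the degree of its target node, or by its inverse."""
    degree: dict[str, int] = defaultdict(int)
    for s, _, o in triples:
        degree[s] += 1
        degree[o] += 1
    if prefer_high:
        return lambda s, p, o: float(degree[o])
    return lambda s, p, o: 1.0 / degree[o]


def biased_walk(adj, start: str, depth: int, weight: WeightFn,
                rng: random.Random) -> list[str]:
    """Walk up to `depth` hops, picking each edge with probability ~ weight."""
    walk, node = [start], start
    for _ in range(depth):
        edges = adj.get(node)
        if not edges:  # dead end: stop early
            break
        weights = [weight(node, p, o) for p, o in edges]
        p, o = rng.choices(edges, weights=weights, k=1)[0]
        walk += [p, o]
        node = o
    return walk


kg = {("A", "p", "B"), ("A", "q", "C"),
      ("B", "r", "D"), ("B", "r", "E"), ("B", "r", "F")}
adj = build_adjacency(kg)
rng = random.Random(42)
high = degree_weight(kg, prefer_high=True)
walks = [biased_walk(adj, "A", 2, high, rng) for _ in range(1000)]
# B has degree 4 and C degree 1, so the expected count is 800 of 1000.
print(sum(w[2] == "B" for w in walks))
```

Passing `prefer_high=False` flips the bias, so that from A the low-degree node C is chosen about four times as often as B.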
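On the embedding side, the order-aware variant used with wang2vec (note 8), which builds on the structured skip-gram of Ling et al. (2015a), distinguishes context tokens by their relative position. The sketch below only generates training pairs from a walk, with and without the relative offset, to show which information plain skip-gram discards; the training itself, where structured skip-gram uses a separate output layer per position, is not shown. The function name and example walks are hypothetical.

```python
# Sketch: skip-gram training pairs from one walk, optionally keeping the
# relative position of each context token (as in structured skip-gram).


def skipgram_pairs(walk: list[str], window: int, positional: bool) -> list[tuple]:
    """Return (center, context) or (center, context, offset) pairs."""
    pairs = []
    for i, center in enumerate(walk):
        for offset in range(-window, window + 1):
            j = i + offset
            if offset == 0 or not 0 <= j < len(walk):
                continue  # skip the center token and out-of-range positions
            if positional:
                pairs.append((center, walk[j], offset))
            else:
                pairs.append((center, walk[j]))
    return pairs


walk_1 = ["Peter_Kurz", "mayorOf", "Mannheim"]
walk_2 = ["Mannheim", "hasMayor", "Peter_Kurz"]
for walk in (walk_1, walk_2):
    print(skipgram_pairs(walk, 2, positional=False))
    print(skipgram_pairs(walk, 2, positional=True))
```

In both walks, plain skip-gram yields the pair (Mannheim, Peter_Kurz); only the positional variant records that Peter_Kurz occurs two positions before Mannheim in the first walk and two positions after it in the second.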


Notes

1.
https://pyrdf2vec.readthedocs.io/en/latest/api/pyrdf2vec.samplers.html.
2.
https://meta.wikimedia.org/wiki/Research:Wikipedia_clickstream.
3.
The figure also shows non-Wikipedia pages as click sources. Those are ignored when computing the edge weights.
4.
The paper reports on a preliminary study and therefore uses only the cities, movies, and albums classification and regression tasks. Hence, the results are not directly comparable to those in the previous section.
5.
Since neither jRDF2vec nor pyRDF2vec allows for incorporating external weights, the authors of the study have used their own proprietary implementation of RDF2vec for the study. Therefore, no code example is given here. The implementation used for the experiments can be found at https://github.com/ataweel55/RDF2VEC.
6.
Please note that both examples are strongly simplified for the sake of illustration. In practice, the representation of a word or an entity is learned from a multitude of sentences or walks, rather than a single sentence or walk.
7.
In this task, it is important to return an entity of the right class, e.g., for solving "Berlin is to Germany as Paris is to ?", the result must be from the class Country.
8.
The snippet assumes that the jRDF2vec JAR has been downloaded and placed in the same directory. It is further assumed that the compiled wang2vec project has been placed in the same directory. For an extensive user guide with examples, we refer the reader to the GitHub repository: https://github.com/dwslab/jRDF2Vec.
9.
It is important to note that although e-walks and p-walks can be derived from standard walks, it is usually much faster to generate them directly instead of first generating standard walks and then deriving the e-walks and p-walks from them.
10.
That latter entity is unrelated to Mannheim; however, in the DBpedia graph, one of the few statements about this entity is that it is different from Peter Kurz, who, in turn, is related to Mannheim. As a consequence, a large fraction of the multi-hop walks starting in the entity Peter Kurze contain the entity Mannheim and other entities related to Mannheim, so that Peter Kurze ultimately ends up close to Mannheim in the vector space. This anecdotal example shows that explicit negative information (here: an entity not being related to another entity) is not picked up well by RDF2vec, and even has the opposite effect.
11.
Note that the exact syntax of the code might change once this becomes an official feature.
12.
The corresponding walk type option for p-walks would be EXPERIMENTAL_MID_EDGE_WALKS_DUPLICATE_FREE.
13.
Caveat: the accuracies of jRDF2vec and pyRDF2vec should not be compared directly, because smaller differences may also be explained by subtly different implementations and/or different random seeds.
14.
Note that while this holds in theory, real-world knowledge graphs often pose practical scalability challenges to existing reasoners (Heist and Paulheim 2021).
15.
https://www.wikidata.org/wiki/Q21510862.
16.
https://www.wikidata.org/wiki/Q18647515.
17.
https://www.wikidata.org/wiki/Property:P1696.
18.
See Chap. 5.
19.
Materialization is usually done externally as a preprocessing step and hence not included in the table.
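The clickstream-based edge weights of notes 2 and 3 can be sketched as follows. This assumes the tab-separated layout of the public Wikipedia clickstream dumps (columns prev, curr, type, n) and treats every prev value beginning with "other-" as a non-Wikipedia click source, which is ignored as described in note 3. Each remaining click count is normalized by the total number of clicks leaving its source page, giving a transition probability. How the study mapped such weights onto DBpedia edges is not reproduced here; the function name and the sample rows (including their counts) are made up.

```python
# Sketch: turn Wikipedia clickstream counts into normalized edge weights.
# Assumes the dump's tab-separated columns prev, curr, type, n.
import csv
from collections import defaultdict
from typing import Iterable


def clickstream_weights(lines: Iterable[str]) -> dict[tuple[str, str], float]:
    counts: dict[str, dict[str, int]] = defaultdict(dict)
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for prev, curr, _link_type, n in reader:
        if prev.startswith("other-"):
            continue  # non-Wikipedia source (search engine, external site, ...)
        counts[prev][curr] = counts[prev].get(curr, 0) + int(n)

    weights: dict[tuple[str, str], float] = {}
    for prev, targets in counts.items():
        total = sum(targets.values())
        for curr, clicks in targets.items():
            weights[(prev, curr)] = clicks / total  # share of outgoing clicks
    return weights


rows = [  # made-up sample rows in the assumed dump format
    "Mannheim\tSAP_SE\tlink\t30",
    "Mannheim\tHeidelberg\tlink\t10",
    "other-search\tMannheim\texternal\t500",
]
print(clickstream_weights(rows))
# prints {('Mannheim', 'SAP_SE'): 0.75, ('Mannheim', 'Heidelberg'): 0.25}
```

Normalizing per source page turns the weights into a probability distribution over outgoing links, which is the form a weighted random walk needs.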
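The class restriction in the analogy task of note 7 can be sketched with the common vector-offset formulation: for "a is to b as c is to ?", compute v(b) - v(a) + v(c) and return the most cosine-similar entity of the required class, excluding a, b, and c. The three-dimensional vectors and class assignments below are invented for illustration and are not RDF2vec output; the function names are hypothetical.

```python
# Sketch: analogy solving with a class filter on the candidate answers.
import math


def cosine(u: list[float], v: list[float]) -> float:
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v)))


def solve_analogy(vectors, classes, a, b, c, target_class=None):
    """Answer 'a is to b as c is to ?' via the offset v(b) - v(a) + v(c)."""
    query = [vb - va + vc for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    best, best_sim = None, -math.inf
    for entity, vec in vectors.items():
        if entity in (a, b, c):
            continue  # the question entities are never valid answers
        if target_class is not None and target_class not in classes[entity]:
            continue  # enforce the required class, e.g. Country
        sim = cosine(query, vec)
        if sim > best_sim:
            best, best_sim = entity, sim
    return best


vectors = {  # toy 3-d vectors, invented for this example
    "Berlin": [1.0, 0.0, 0.0], "Germany": [1.0, 1.0, 0.0],
    "Paris": [0.0, 0.0, 1.0], "France": [0.0, 0.9, 1.0],
    "Lyon": [0.0, 1.0, 1.0],
}
classes = {"Berlin": {"City"}, "Paris": {"City"}, "Lyon": {"City"},
           "Germany": {"Country"}, "France": {"Country"}}
print(solve_analogy(vectors, classes, "Berlin", "Germany", "Paris"))             # Lyon
print(solve_analogy(vectors, classes, "Berlin", "Germany", "Paris", "Country"))  # France
```

Without the filter, the nearest vector in this toy space is the city Lyon; requiring the class Country yields France.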
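Note 9 states that e-walks and p-walks can be derived from standard walks, even though generating them directly is faster. Assuming, as we read the definitions of Portisch and Paulheim (2022), that an e-walk keeps only the entities of a walk and a p-walk keeps only its predicates plus the focus entity, the derivation can be sketched as below. The walk is assumed to alternate entity, predicate, entity, and so on, so entities sit at even positions; the function names and the example walk (including the predicate names) are hypothetical. In jRDF2vec, p-walks are generated directly via the walk type option named in note 12.

```python
# Sketch: deriving e-walks and p-walks from a standard RDF2vec walk.
# Assumes the walk alternates entity, predicate, entity, ... (entities at
# even indices) and that `focus` is the index of the walk's focus entity.


def e_walk(walk: list[str]) -> list[str]:
    """Keep only the entities."""
    return walk[0::2]


def p_walk(walk: list[str], focus: int) -> list[str]:
    """Keep only the predicates, plus the focus entity at its position."""
    if focus % 2 != 0:
        raise ValueError("focus must be the index of an entity")
    return [tok for i, tok in enumerate(walk) if i % 2 == 1 or i == focus]


walk = ["Peter_Kurz", "mayorOf", "Mannheim", "country", "Germany"]
print(e_walk(walk))     # ['Peter_Kurz', 'Mannheim', 'Germany']
print(p_walk(walk, 2))  # ['mayorOf', 'Mannheim', 'country']
```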

References

Abburu S (2012) A survey on ontology reasoners and comparison. Int J Comput Appl 57(17)
Al Taweel A, Paulheim H (2020) Towards exploiting implicit human feedback for improving rdf2vec embeddings. In: CEUR workshop proceedings, RWTH, vol 2635, pp 1–10
Allahyari M, Pouriyeh S, Assefi M, Safaei S, Trippe ED, Gutierrez JB, Kochut K (2017) A brief survey of text mining: classification, clustering and extraction techniques. arXiv:1707.02919
Blondel VD, Guillaume JL, Lambiotte R, Lefebvre E (2008) Fast unfolding of communities in large networks. J Stat Mech: Theory Exper 10:P10008
Bordes A, Usunier N, Garcia-Duran A, Weston J, Yakhnenko O (2013) Translating embeddings for modeling multi-relational data. In: Advances in neural information processing systems 26
Brin S, Page L (1998) The anatomy of a large-scale hypertextual web search engine. Comput Netw ISDN Syst 30(1):107–117
Cochez M, Ristoski P, Ponzetto SP, Paulheim H (2017) Biased graph walks for rdf graph embeddings. In: Proceedings of the 7th international conference on web intelligence, mining and semantics, pp 1–12
Comrie B (1989) Language universals and linguistic typology: syntax and morphology. University of Chicago Press
Fortunato S (2010) Community detection in graphs. Phys Rep 486(3–5):75–174
Gangemi A, Guarino N, Masolo C, Oltramari A (2003) Sweetening wordnet with dolce. AI Mag 24(3):13–13
Gangemi A, Mika P (2003) Understanding the semantic web through descriptions and situations. In: OTM confederated international conferences “On the move to meaningful internet systems”. Springer, pp 689–706
Heist N, Paulheim H (2021) The caligraph ontology as a challenge for owl reasoners. In: SemREC 2021: semantic reasoning evaluation challenge 2021, pp 21–31
Iana A, Paulheim H (2020) More is not always better: the negative impact of a-box materialization on rdf2vec knowledge graph embeddings. In: CIKM (Workshops)
Ivanov S, Burnaev E (2018) Anonymous walk embeddings. arXiv:1805.11921
Lehmann J (2009) Dl-learner: learning concepts in description logics. J Mach Learn Res 10:2639–2642
Ling W, Dyer C, Black AW, Trancoso I (2015a) Two/too simple adaptations of word2vec for syntax problems. In: NAACL HLT 2015, ACL, pp 1299–1304
Lin Y, Liu Z, Sun M, Liu Y, Zhu X (2015b) Learning entity and relation embeddings for knowledge graph completion. In: Twenty-ninth AAAI conference on artificial intelligence
Newman DA (2014) Missing data: five practical guidelines. Org Res Methods 17(4):372–411
Paulheim H (2017) Knowledge graph refinement: a survey of approaches and evaluation methods. Semant Web 8(3):489–508
Paulheim H, Gangemi A (2015) Serving dbpedia with dolce–more than just adding a cherry on top. In: International semantic web conference. Springer, pp 180–196
Perozzi B, Kulkarni V, Chen H, Skiena S (2017) Don’t walk, skip! online learning of multi-scale network embeddings. In: Proceedings of the 2017 IEEE/ACM international conference on advances in social networks analysis and mining 2017, pp 258–265
Portisch J, Paulheim H (2021) Putting rdf2vec in order. In: International semantic web conference, posters and demonstrations
Portisch J, Paulheim H (2022) Walk this way! entity walks and property walks for rdf2vec. In: Extended semantic web conference 2022, posters and demonstrations
Schlötterer J, Wehking M, Rizi FS, Granitzer M (2019) Investigating extensions to random walk based graph embedding. In: 2019 IEEE international conference on cognitive computing (ICCC), IEEE, pp 81–89
Sirin E, Parsia B, Grau BC, Kalyanpur A, Katz Y (2007) Pellet: a practical owl-dl reasoner. J Web Semant 5(2):51–53
Steenwinckel B, Vandewiele G, Bonte P, Weyns M, Paulheim H, Ristoski P, Turck FD, Ongenae F (2021) Walk extraction strategies for node embeddings with rdf2vec in knowledge graphs. In: International conference on database and expert systems applications. Springer, pp 70–80
Thalhammer A, Rettinger A (2016) PageRank on wikipedia: towards general importance scores for entities. The semantic web: ESWC 2016 satellite events. Springer International Publishing, Crete, Greece, pp 227–240
van Erp M, Mendes P, Paulheim H, Ilievski F, Plu J, Rizzo G, Waitelonis J (2016) Evaluating entity linking: an analysis of current benchmark datasets and a roadmap for doing a better job. In: 10th international conference on language resources and evaluation (LREC)
Vandewiele G, Steenwinckel B, Ongenae F, De Turck F (2019) Inducing a decision tree with discriminative paths to classify entities in a knowledge graph. In: SEPDA2019, the 4th international workshop on semantics-powered data mining and analytics, pp 1–6
Vrandečić D, Krötzsch M (2014) Wikidata: a free collaborative knowledge base. Commun ACM 57(10):78–85. http://dx.doi.org/10.1145/2629489
Wang Z, Zhang J, Feng J, Chen Z (2014) Knowledge graph embedding by translating on hyperplanes. In: Proceedings of the AAAI conference on artificial intelligence, vol 28


Author information

Authors and Affiliations

University of Mannheim, Mannheim, Germany
Heiko Paulheim
eBay (United States), San Jose, CA, USA
Petar Ristoski
SAP SE, Walldorf, Germany
Jan Portisch


Corresponding author

Correspondence to Heiko Paulheim.


About this chapter

Cite this chapter

Paulheim, H., Ristoski, P., Portisch, J. (2023). Tweaking RDF2vec. In: Embedding Knowledge Graphs with RDF2vec. Synthesis Lectures on Data, Semantics, and Knowledge. Springer, Cham. https://doi.org/10.1007/978-3-031-30387-6_4


DOI: https://doi.org/10.1007/978-3-031-30387-6_4
Published: 04 June 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-30386-9
Online ISBN: 978-3-031-30387-6
eBook Packages: Synthesis Collection of Technology (R0), eBColl Synthesis Collection 12
