Tweaking RDF2vec

Embedding Knowledge Graphs with RDF2vec

Abstract

Depending on the problem at hand, different tweaks to the RDF2vec algorithm may be useful, and many of them have been discussed in the past. Those tweaks address various steps of the pipeline. Reasoners have been used to preprocess the knowledge graph and add implicit knowledge. Different modifications of the walk strategy have been proposed, ranging from injecting edge weights, to biasing the walks towards higher or lower degree nodes, to changing the structure of the extracted walks completely. Moreover, the embedding creation itself has been analyzed by using different variants of the word2vec word embedding method. In this chapter, we introduce a few of those approaches and highlight their advantages and shortcomings.
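To make the idea of edge-weighted walks concrete, the following is a minimal sketch and not the implementation used in the chapter. It assumes a hypothetical in-memory graph given as a dictionary that maps each node to a list of `(predicate, object, weight)` tuples with positive weights; the function name `weighted_walk` is likewise chosen for illustration only.

```python
import random


def weighted_walk(graph, start, depth, rng=None):
    """Extract one walk of up to `depth` hops, choosing each outgoing
    edge with probability proportional to its weight.

    `graph` is a hypothetical structure: node -> list of
    (predicate, object, weight) tuples with positive weights.
    """
    rng = rng or random.Random()
    walk = [start]
    node = start
    for _ in range(depth):
        edges = graph.get(node, [])
        if not edges:
            break  # dead end: the walk stops early
        weights = [weight for _, _, weight in edges]
        predicate, target, _ = rng.choices(edges, weights=weights, k=1)[0]
        walk.extend([predicate, target])
        node = target
    return walk
```

With uniform weights, this reduces to an unbiased random walk; any bias (for example, towards higher or lower degree nodes) would be expressed purely through the weights assigned to the edges.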

Notes

  1. https://pyrdf2vec.readthedocs.io/en/latest/api/pyrdf2vec.samplers.html.

  2. https://meta.wikimedia.org/wiki/Research:Wikipedia_clickstream.

  3. The figure also shows non-Wikipedia pages as click sources. Those are ignored when computing the edge weights.

  4. The paper reports on a preliminary study and therefore uses only the cities, movies, and albums classification and regression tasks. As a consequence, the results are not directly comparable to the results in the previous section.

  5. Since neither jRDF2vec nor pyRDF2vec allows for incorporating external weights, the authors of the study have used their own implementation of RDF2vec. Therefore, no code example is given here. The implementation used for the experiments can be found at https://github.com/ataweel55/RDF2VEC.

  6. Please note that both examples are strongly simplified for the sake of illustration. In practice, the representation of a word or an entity is learned from a multitude of sentences or walks, rather than a single sentence or walk.

  7. In this task, it is important to return an entity of the right class, e.g., for solving Berlin is to Germany as Paris is to ?, the result must be from the class Country.

  8. The snippet assumes that the jRDF2vec JAR has been downloaded and placed in the same directory. It is further assumed that the compiled wang2vec project has been placed in the same directory. For an extensive user guide with examples, we refer the reader to the GitHub repository: https://github.com/dwslab/jRDF2Vec.

  9. Although e-walks and p-walks can be derived from standard walks, it is usually much faster to generate them directly than to generate standard walks first and then derive the e-walks and p-walks from them.

  10. That latter entity is unrelated to Mannheim. However, in the DBpedia graph, one of the few statements about this entity is that it is different from Peter Kurz, who, in turn, is related to Mannheim. As a result, a large fraction of the multi-hop walks starting in the entity Peter Kurze contain the entity Mannheim and other entities related to Mannheim, so that it ultimately ends up close to Mannheim in the vector space. This anecdotal example shows that explicit negative information (here: an entity not being related to another entity) is not picked up well by RDF2vec, and can even have the opposite effect.

  11. Note that the exact syntax of the code might change once this becomes an official feature.

  12. The corresponding walk type option for p-walks would be EXPERIMENTAL_MID_EDGE_WALKS_DUPLICATE_FREE.

  13. Caveat: the accuracies of jRDF2vec and pyRDF2vec should not be compared directly, because small differences may also be explained by subtly different implementations and/or different random seeds.

  14. Note that while this holds in theory, real-world knowledge graphs often pose practical scalability challenges to existing reasoners (Heist and Paulheim 2021).

  15. https://www.wikidata.org/wiki/Q21510862.

  16. https://www.wikidata.org/wiki/Q18647515.

  17. https://www.wikidata.org/wiki/Property:P1696.

  18. See Chap. 5.

  19. Materialization is usually done externally as a preprocessing step and hence not included in the table.
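The derivation of e-walks and p-walks from standard walks, mentioned in note 9, can be illustrated with a short sketch. It assumes that a standard walk is stored as a list alternating between entities (even positions) and predicates (odd positions), and that the position of the entity of interest is known; the function names and the example walk are hypothetical and only serve illustration. An e-walk keeps only the entities, while a p-walk keeps only the predicates plus the entity of interest.

```python
def to_e_walk(walk):
    """Keep only the entities, assumed to sit at even positions."""
    return walk[::2]


def to_p_walk(walk, focus_index):
    """Keep the predicates (odd positions) and the entity of interest."""
    if focus_index % 2 != 0:
        raise ValueError("focus_index must point at an entity (even position)")
    return [
        token
        for position, token in enumerate(walk)
        if position % 2 == 1 or position == focus_index
    ]


# Hypothetical standard walk with the entity of interest in the middle.
standard_walk = ["Hamburg", "country", "Germany", "leader", "Angela_Merkel"]
e_walk = to_e_walk(standard_walk)
p_walk = to_p_walk(standard_walk, focus_index=2)
```

As the note points out, this post-processing route is mainly of didactic value; generating the reduced walks directly avoids materializing the standard walks first.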

References

  • Abburu S (2012) A survey on ontology reasoners and comparison. Int J Comput Appl 57(17)

  • Al Taweel A, Paulheim H (2020) Towards exploiting implicit human feedback for improving rdf2vec embeddings. In: CEUR workshop proceedings, RWTH, vol 2635, pp 1–10

  • Allahyari M, Pouriyeh S, Assefi M, Safaei S, Trippe ED, Gutierrez JB, Kochut K (2017) A brief survey of text mining: classification, clustering and extraction techniques. arXiv:1707.02919

  • Blondel VD, Guillaume JL, Lambiotte R, Lefebvre E (2008) Fast unfolding of communities in large networks. J Stat Mech: Theory Exper 10:P10008

  • Bordes A, Usunier N, Garcia-Duran A, Weston J, Yakhnenko O (2013) Translating embeddings for modeling multi-relational data. In: Advances in neural information processing systems 26

  • Brin S, Page L (1998) The anatomy of a large-scale hypertextual web search engine. Comput Netw ISDN Syst 30(1):107–117

  • Cochez M, Ristoski P, Ponzetto SP, Paulheim H (2017) Biased graph walks for rdf graph embeddings. In: Proceedings of the 7th international conference on web intelligence, mining and semantics, pp 1–12

  • Comrie B (1989) Language universals and linguistic typology: syntax and morphology. University of Chicago Press

  • Fortunato S (2010) Community detection in graphs. Phys Rep 486(3–5):75–174

  • Gangemi A, Guarino N, Masolo C, Oltramari A (2003) Sweetening wordnet with dolce. AI Mag 24(3):13–13

  • Gangemi A, Mika P (2003) Understanding the semantic web through descriptions and situations. In: OTM confederated international conferences “On the move to meaningful internet systems”. Springer, pp 689–706

  • Heist N, Paulheim H (2021) The caligraph ontology as a challenge for owl reasoners. In: SemREC 2021: semantic reasoning evaluation challenge 2021, pp 21–31

  • Iana A, Paulheim H (2020) More is not always better: the negative impact of a-box materialization on rdf2vec knowledge graph embeddings. In: CIKM (Workshops)

  • Ivanov S, Burnaev E (2018) Anonymous walk embeddings. arXiv:1805.11921

  • Lehmann J (2009) Dl-learner: learning concepts in description logics. J Mach Learn Res 10:2639–2642

  • Ling W, Dyer C, Black AW, Trancoso I (2015a) Two/too simple adaptations of word2vec for syntax problems. In: NAACL HLT 2015, ACL, pp 1299–1304

  • Lin Y, Liu Z, Sun M, Liu Y, Zhu X (2015b) Learning entity and relation embeddings for knowledge graph completion. In: Twenty-ninth AAAI conference on artificial intelligence

  • Newman DA (2014) Missing data: five practical guidelines. Org Res Methods 17(4):372–411

  • Paulheim H (2017) Knowledge graph refinement: a survey of approaches and evaluation methods. Semant Web 8(3):489–508

  • Paulheim H, Gangemi A (2015) Serving dbpedia with dolce–more than just adding a cherry on top. In: International semantic web conference. Springer, pp 180–196

  • Perozzi B, Kulkarni V, Chen H, Skiena S (2017) Don’t walk, skip! online learning of multi-scale network embeddings. In: Proceedings of the 2017 IEEE/ACM international conference on advances in social networks analysis and mining 2017, pp 258–265

  • Portisch J, Paulheim H (2021) Putting rdf2vec in order. In: International semantic web conference, posters and demonstrations

  • Portisch J, Paulheim H (2022) Walk this way! entity walks and property walks for rdf2vec. In: Extended semantic web conference 2022, posters and demonstrations

  • Schlötterer J, Wehking M, Rizi FS, Granitzer M (2019) Investigating extensions to random walk based graph embedding. In: 2019 IEEE international conference on cognitive computing (ICCC), IEEE, pp 81–89

  • Sirin E, Parsia B, Grau BC, Kalyanpur A, Katz Y (2007) Pellet: a practical owl-dl reasoner. J Web Semant 5(2):51–53

  • Steenwinckel B, Vandewiele G, Bonte P, Weyns M, Paulheim H, Ristoski P, Turck FD, Ongenae F (2021) Walk extraction strategies for node embeddings with rdf2vec in knowledge graphs. In: International conference on database and expert systems applications. Springer, pp 70–80

  • Thalhammer A, Rettinger A (2016) PageRank on wikipedia: towards general importance scores for entities. The semantic web: ESWC 2016 satellite events. Springer International Publishing, Crete, Greece, pp 227–240

  • van Erp M, Mendes P, Paulheim H, Ilievski F, Plu J, Rizzo G, Waitelonis J (2016) Evaluating entity linking: an analysis of current benchmark datasets and a roadmap for doing a better job. In: 10th international conference on language resources and evaluation (LREC)

  • Vandewiele G, Steenwinckel B, Ongenae F, De Turck F (2019) Inducing a decision tree with discriminative paths to classify entities in a knowledge graph. In: SEPDA2019, the 4th international workshop on semantics-powered data mining and analytics, pp 1–6

  • Vrandečić D, Krötzsch M (2014) Wikidata: a free collaborative knowledge base. Commun ACM 57(10):78–85. http://dx.doi.org/10.1145/2629489

  • Wang Z, Zhang J, Feng J, Chen Z (2014) Knowledge graph embedding by translating on hyperplanes. In: Proceedings of the AAAI conference on artificial intelligence, vol 28

Author information

Corresponding author

Correspondence to Heiko Paulheim.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this chapter

Cite this chapter

Paulheim, H., Ristoski, P., Portisch, J. (2023). Tweaking RDF2vec. In: Embedding Knowledge Graphs with RDF2vec. Synthesis Lectures on Data, Semantics, and Knowledge. Springer, Cham. https://doi.org/10.1007/978-3-031-30387-6_4
