Framework-Based Scale-Out RDF Systems

Wylot, Marcin; Sakr, Sherif

doi:10.1007/978-3-319-77525-8_225

Synonyms

Hadoop-based RDF query processors; Spark-based RDF query processors

Definitions

RDF, the Resource Description Framework, is recognized as a de facto standard for describing resources in a semi-structured manner. In particular, RDF is a graph-based format that allows named links between resources to be defined in the form of triples (subject, predicate, object), also called statements. A statement expresses a relationship (defined by the predicate) between resources (the subject and the object). The relationship is directional: it always points from subject to object. The same resource can appear in multiple triples in the same or different roles, e.g., as a subject in one triple and as a predicate or an object in another. This ability allows multiple connections to be defined between triples and hence creates a connected graph of data. Such a graph can be represented as nodes that stand for the resources and edges capturing the relationships between the...
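
The triple model described above can be illustrated with a minimal Python sketch. It is not taken from the entry: the `Triple` type, the helper functions, and all `ex:` resource names are hypothetical and chosen only for illustration. It shows how one resource can act as an object in one statement and as a subject in another, which is what links the statements into a connected, directed graph.

```python
from collections import defaultdict
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF statement: a directed, named link from subject to object."""
    subject: str
    predicate: str
    obj: str


# Hypothetical example data; the "ex:" names are invented for illustration.
TRIPLES = [
    Triple("ex:alice", "ex:knows", "ex:bob"),
    Triple("ex:bob", "ex:worksAt", "ex:acme"),
    Triple("ex:acme", "ex:locatedIn", "ex:berlin"),
]


def build_graph(triples):
    """Map each subject to its outgoing (predicate, object) edges."""
    graph = defaultdict(list)
    for t in triples:
        graph[t.subject].append((t.predicate, t.obj))
    return dict(graph)


def roles(resource, triples):
    """Return the set of roles a resource plays across all triples."""
    found = set()
    for t in triples:
        if t.subject == resource:
            found.add("subject")
        if t.predicate == resource:
            found.add("predicate")
        if t.obj == resource:
            found.add("object")
    return found


GRAPH = build_graph(TRIPLES)
```

In this sketch, `roles("ex:bob", TRIPLES)` reports that `ex:bob` is the object of the first statement and the subject of the second, so the two statements share a node. Edges are stored only in the subject-to-object direction, mirroring the directionality of RDF statements.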

Author information

Authors and Affiliations

ODS, TU Berlin, Berlin, Germany
Marcin Wylot
Institute of Computer Science, University of Tartu, Tartu, Estonia
Sherif Sakr

Corresponding authors

Correspondence to Marcin Wylot or Sherif Sakr.

Editor information

Editors and Affiliations

Institute of Computer Science, University of Tartu, Tartu, Estonia
Sherif Sakr
School of Information Technologies, Sydney University, Sydney, Australia
Albert Y. Zomaya

About this entry

Cite this entry

Wylot, M., Sakr, S. (2019). Framework-Based Scale-Out RDF Systems. In: Sakr, S., Zomaya, A.Y. (eds) Encyclopedia of Big Data Technologies. Springer, Cham. https://doi.org/10.1007/978-3-319-77525-8_225

DOI: https://doi.org/10.1007/978-3-319-77525-8_225
Published: 20 February 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-77524-1
Online ISBN: 978-3-319-77525-8
eBook Packages: Computer Science; Reference Module Computer Science and Engineering