Abstract
Efficient management of RDF data is an important prerequisite for realizing the Semantic Web vision. Performance and scalability issues are becoming increasingly pressing as Semantic Web technology is applied to real-world applications. In this paper, we examine the reasons why current data management solutions for RDF data scale poorly, and explore the fundamental scalability limitations of these approaches. We review the state of the art for improving performance of RDF databases and consider a recent suggestion, “property tables”. We then discuss, on both practical and empirical grounds, why this solution has undesirable features. As an improvement, we propose an alternative solution: vertically partitioning the RDF data. We compare the performance of vertical partitioning with prior art on queries generated by a Web-based RDF browser over a large-scale (more than 50 million triples) catalog of library data. Our results show that a vertically partitioned schema achieves similar performance to the property table technique while being much simpler to design. Further, if a column-oriented DBMS (a database architected specially for the vertically partitioned case) is used instead of a row-oriented DBMS, another order of magnitude performance improvement is observed, with query times dropping from minutes to several seconds. Encouraged by these results, we describe the architecture of SW-Store, a new DBMS we are actively building that implements these techniques to achieve high-performance RDF data management.
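As a rough illustration of the vertical partitioning idea named in the abstract, the following minimal Python sketch splits a list of triples into one two-column (subject, object) table per property. The sample triples, the function names `vertically_partition` and `subjects_with`, and the choice to sort each table by subject are all assumptions made for illustration; this is not the SW-Store implementation.

```python
from collections import defaultdict

# Hypothetical sample triples (subject, property, object);
# these are illustrative and not taken from the library catalog dataset.
triples = [
    ("book1", "title", "RDF Basics"),
    ("book1", "author", "Smith"),
    ("book2", "title", "Column Stores"),
    ("book2", "author", "Jones"),
    ("book2", "language", "en"),
]


def vertically_partition(triples):
    """Group triples into one (subject, object) table per property."""
    tables = defaultdict(list)
    for subject, prop, obj in triples:
        tables[prop].append((subject, obj))
    # Assumption: keep each table sorted by subject so that tables
    # could be combined on subject with a simple merge.
    for rows in tables.values():
        rows.sort()
    return dict(tables)


def subjects_with(tables, prop, value):
    """Return subjects whose `prop` equals `value`.

    Only the table for `prop` is scanned; other properties are never read.
    """
    return [s for s, o in tables.get(prop, []) if o == value]


tables = vertically_partition(triples)
```

In this sketch a lookup such as `subjects_with(tables, "author", "Jones")` touches only the `author` table, which is the intuition behind pairing this layout with a column-oriented DBMS.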
Cite this article
Abadi, D.J., Marcus, A., Madden, S.R. et al. SW-Store: a vertically partitioned DBMS for Semantic Web data management. The VLDB Journal 18, 385–406 (2009). https://doi.org/10.1007/s00778-008-0125-y
DOI: https://doi.org/10.1007/s00778-008-0125-y