Versatile XQuery Processing in MapReduce

  • Conference paper
Advances in Databases and Information Systems (ADBIS 2013)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8133)

Abstract

The MapReduce (MR) framework has become a standard tool for performing large batch computations—usually of aggregative nature—in parallel over a cluster of commodity machines. A significant share of typical MR jobs involves standard database-style queries, where it becomes cumbersome to specify map and reduce functions from scratch. To overcome this burden, higher-level languages such as HiveQL, PigLatin, and JAQL have been proposed to allow the automatic generation of MR jobs from declarative queries. We identify two major problems of these existing solutions: (i) they introduce new query languages and implement systems from scratch for the sole purpose of expressing MR jobs; and (ii) despite solving some of the major limitations of SQL, they still lack the flexibility required by big data applications. We propose BrackitMR, an approach based on the XQuery language with extended JSON support. XQuery not only is an established query language, but also has a more expressive data model and more powerful language constructs, enabling a much greater degree of flexibility. From a system design perspective, we extend an existing single-node query processor, Brackit, adding MR as a distributed coordination layer. Such heavy reuse of the standard query processor not only provides performance, but also allows for a more elegant design which transparently integrates MR processing into a generic query engine.
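To make concrete why specifying map and reduce functions from scratch is cumbersome for database-style queries, the following minimal sketch hand-codes a single group-by aggregation ("total amount per customer") as a map function, a reduce function, and a small single-process driver standing in for the MR runtime. It is not BrackitMR code and does not reflect how the paper compiles queries; the records, field names, and the helpers `map_fn`, `reduce_fn`, and `run_mapreduce` are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical JSON-like input records (not taken from the paper).
ORDERS = [
    {"customer": "alice", "amount": 30.0},
    {"customer": "bob", "amount": 12.5},
    {"customer": "alice", "amount": 7.5},
]


def map_fn(record):
    """Emit one (key, value) pair per order, keyed by customer."""
    yield record["customer"], record["amount"]


def reduce_fn(key, values):
    """Aggregate all values that share the same key."""
    return key, sum(values)


def run_mapreduce(records, mapper, reducer):
    """Single-process stand-in for an MR runtime: map, shuffle, reduce."""
    groups = defaultdict(list)
    for key, value in chain.from_iterable(mapper(r) for r in records):
        groups[key].append(value)  # the "shuffle" step groups values by key
    return dict(reducer(k, vs) for k, vs in sorted(groups.items()))


totals = run_mapreduce(ORDERS, map_fn, reduce_fn)
print(totals)  # {'alice': 37.5, 'bob': 12.5}
```

A declarative language expresses the same computation as one grouping clause with a sum, and the higher-level systems named in the abstract aim to derive the map and reduce stages from such a query automatically.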

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Sauer, C., Bächle, S., Härder, T. (2013). Versatile XQuery Processing in MapReduce. In: Catania, B., Guerrini, G., Pokorný, J. (eds) Advances in Databases and Information Systems. ADBIS 2013. Lecture Notes in Computer Science, vol 8133. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40683-6_16

  • DOI: https://doi.org/10.1007/978-3-642-40683-6_16

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-40682-9

  • Online ISBN: 978-3-642-40683-6

  • eBook Packages: Computer Science, Computer Science (R0)
