Spark Parameter Tuning via Trial-and-Error

Petridis, Panagiotis; Gounaris, Anastasios; Torres, Jordi

doi:10.1007/978-3-319-47898-2_24

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 529)

Included in the following conference series:

INNS Conference on Big Data

Abstract

Spark has been established as an attractive platform for big data analysis, since it manages to hide most of the complexities related to parallelism, fault tolerance and cluster configuration from developers. However, this comes at the expense of having over 150 configurable parameters, the impact of which cannot be exhaustively examined because the number of possible combinations grows exponentially. In this work, we investigate the impact of the most important tunable Spark parameters on application performance and guide developers on how to change the default values. We conduct a series of experiments and offer a trial-and-error methodology for tuning parameters in arbitrary applications, based on evidence from a very small number of experimental runs. We test our methodology in three case studies, where we achieve speedups of more than 10 times.
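The abstract does not spell out the procedure itself, so the following is only a generic illustration of what a trial-and-error search over configuration values might look like, not the authors' methodology. The function names `greedy_tune` and `fake_run_job`, the 5% acceptance threshold, and all timings are hypothetical; the configuration keys are standard Spark properties. In practice `run_job` would submit the real application and return its measured runtime.

```python
from typing import Callable, Dict, List, Tuple

Config = Dict[str, str]


def greedy_tune(
    run_job: Callable[[Config], float],
    baseline: Config,
    candidates: List[Tuple[str, str]],
    min_gain: float = 0.05,
) -> Tuple[Config, float]:
    """Try one candidate change at a time and keep it only if it helps.

    A change is kept when the runtime drops by more than `min_gain`
    (a hypothetical threshold) relative to the best runtime seen so far.
    Uses len(candidates) + 1 runs in total.
    """
    best_cfg = dict(baseline)
    best_time = run_job(best_cfg)
    for key, value in candidates:
        trial = dict(best_cfg)
        trial[key] = value
        elapsed = run_job(trial)
        if elapsed < best_time * (1.0 - min_gain):
            best_cfg, best_time = trial, elapsed
    return best_cfg, best_time


def fake_run_job(cfg: Config) -> float:
    """Synthetic stand-in for a timed run; the numbers are made up."""
    seconds = 100.0
    if cfg.get("spark.serializer", "").endswith("KryoSerializer"):
        seconds -= 20.0
    if cfg.get("spark.shuffle.compress") == "false":
        seconds += 15.0
    return seconds


baseline = {
    "spark.serializer": "org.apache.spark.serializer.JavaSerializer",
    "spark.shuffle.compress": "true",
}
candidates = [
    ("spark.serializer", "org.apache.spark.serializer.KryoSerializer"),
    ("spark.shuffle.compress", "false"),
]

tuned, seconds = greedy_tune(fake_run_job, baseline, candidates)
print(tuned, seconds)
```

With the synthetic cost function, the serializer change is kept and the shuffle-compression change is rejected. The point of the sketch is that the number of runs grows linearly with the number of candidate changes rather than exponentially with the number of parameters.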

Author information

Authors and Affiliations

Department of Informatics, Aristotle University of Thessaloniki, Thessaloniki, Greece
Panagiotis Petridis & Anastasios Gounaris
Computer Architecture Department, Technical University of Catalonia, Barcelona, Spain
Jordi Torres

Corresponding author

Correspondence to Anastasios Gounaris.

Editor information

Editors and Affiliations

School of Computing and Communications, Lancaster University, Lancaster, United Kingdom
Plamen Angelov
Data Engineering Lab, Dept. of Informatics, Aristotle University of Thessaloniki, Thessaloniki, Greece
Yannis Manolopoulos
Lab of Forest Informatics (FiLAB), Democritus University of Thrace, Orestiada, Greece
Lazaros Iliadis
WPC Information Systems Faculty, Arizona State University, Tempe, Arizona, USA
Asim Roy
Electrical Engineering Dept, (ICA), Pontifical Catholic Univ of Rio de Janei, Rio de Janeiro, Rio de Janeiro, Brazil
Marley Vellasco

About this paper

Cite this paper

Petridis, P., Gounaris, A., Torres, J. (2017). Spark Parameter Tuning via Trial-and-Error. In: Angelov, P., Manolopoulos, Y., Iliadis, L., Roy, A., Vellasco, M. (eds) Advances in Big Data. INNS 2016. Advances in Intelligent Systems and Computing, vol 529. Springer, Cham. https://doi.org/10.1007/978-3-319-47898-2_24

DOI: https://doi.org/10.1007/978-3-319-47898-2_24
Published: 08 October 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-47897-5
Online ISBN: 978-3-319-47898-2
eBook Packages: Engineering, Engineering (R0)
