Scalability analysis of AVX-512 extensions

Cebrian, Juan M.; Natvig, Lasse; Jahre, Magnus

doi:10.1007/s11227-019-02840-7

Published: 23 April 2019

Volume 76, pages 2082–2097, (2020)

The Journal of Supercomputing

Abstract

Energy efficiency below a specific thermal design power (TDP) has become the main design goal for microprocessors across all market segments. How best to use the available transistors within the TDP remains an open problem. Parallelism is the foundation for reaching exascale performance. While instruction-level and thread-level parallelism are embraced by developers, data-level parallelism (e.g. single-instruction multiple-data execution) is usually underutilized despite its huge potential. Companies are pushing the size of vector registers to double every 4 years. Intel’s AVX-512 (512-bit registers) and ARM’s SVE (up to 2048-bit registers) are examples of this trend. In this paper, we perform a scalability and energy efficiency analysis of AVX-512 using the ParVec benchmark suite. ParVec is extended to support AVX-512 as well as the newest versions of the GCC compiler. We use Intel’s Top-Down model to show the main bottlenecks of the architecture for each studied benchmark. Results show that the performance and energy improvements depend greatly on the fraction of code that can be vectorized. Energy improvements over scalar code in a single-thread environment range from \(2\times\) for Streamcluster (worst) to \(8\times\) for Blackscholes (best).
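
The dependence on the vectorizable fraction can be illustrated with an idealized Amdahl-style estimate. The sketch below is not the model used in the paper; it assumes that the vectorized part of the run time speeds up by exactly the number of vector lanes (16 single-precision values in a 512-bit register) and ignores memory bandwidth, frequency changes and instruction overheads. The function name `vector_speedup` and the constant `LANES_FP32_AVX512` are hypothetical names chosen for illustration.

```python
def vector_speedup(vector_fraction: float, lanes: int) -> float:
    """Ideal speedup over scalar code when `vector_fraction` of the scalar
    run time is accelerated by a factor of `lanes` and the rest is unchanged.

    Assumption: perfect lane utilization, no memory or frequency effects.
    """
    if not 0.0 <= vector_fraction <= 1.0:
        raise ValueError("vector_fraction must be in [0, 1]")
    if lanes < 1:
        raise ValueError("lanes must be >= 1")
    scalar_part = 1.0 - vector_fraction
    vector_part = vector_fraction / lanes
    return 1.0 / (scalar_part + vector_part)


# A 512-bit register holds 512 / 32 = 16 single-precision floats.
LANES_FP32_AVX512 = 512 // 32

if __name__ == "__main__":
    # Illustrative fractions only; these are not measurements from the paper.
    for fraction in (0.5, 0.9, 0.99):
        speedup = vector_speedup(fraction, LANES_FP32_AVX512)
        print(f"vectorizable fraction {fraction:.2f}: ideal speedup {speedup:.2f}x")
```

Under these assumptions, even a code that is 90% vectorizable stays well below the 16-lane upper bound, which is consistent with the spread between benchmarks reported above.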

Notes

Micro-operations.
Set of macros used to generate the intrinsics code using the C pre-processor.
Model Specific Registers.

Author information

Authors and Affiliations

University of Murcia, Murcia, Spain
Juan M. Cebrian
Norwegian University of Science and Technology (NTNU), Trondheim, Norway
Lasse Natvig & Magnus Jahre

Corresponding author

Correspondence to Juan M. Cebrian.

About this article

Cite this article

Cebrian, J.M., Natvig, L. & Jahre, M. Scalability analysis of AVX-512 extensions. J Supercomput 76, 2082–2097 (2020). https://doi.org/10.1007/s11227-019-02840-7

Published: 23 April 2019
Issue Date: March 2020
DOI: https://doi.org/10.1007/s11227-019-02840-7
