Performance Evaluation of OSCAR Multi-target Automatic Parallelizing Compiler on Intel, AMD, Arm and RISC-V Multicores

Magnussen, Birk Martin; Kawasumi, Tohma; Mikami, Hiroki; Kimura, Keiji; Kasahara, Hironori

doi:10.1007/978-3-030-99372-6_4

Birk Martin Magnussen (ORCID: orcid.org/0000-0003-2429-9994), Tohma Kawasumi, Hiroki Mikami, Keiji Kimura & Hironori Kasahara

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13181)

Included in the following conference series:

International Workshop on Languages and Compilers for Parallel Computing


Abstract

As the number of shared-memory multicore processor architectures grows, automatic parallelizing compilers need to support multiple architectures. The OSCAR (Optimally Scheduled Advanced Multiprocessor) automatic parallelizing compiler can parallelize many kinds of sequential programs, including scientific, embedded real-time, and multimedia applications. The OSCAR compiler's features include coarse-grain task parallelization with earliest execution condition analysis, which analyzes both data and control dependencies; data locality optimizations over different loop nests with data dependencies; and the ability to generate parallelized code using the OSCAR API 2.1. The OSCAR API 2.1 is compatible with OpenMP for SMP multicores and adds directives for power control and for supporting heterogeneous multicores. This allows a C or Fortran compiler with OpenMP support to generate parallel machine code for the target multicore. Additionally, the OSCAR API analyzer allows a sequential-only compiler without OpenMP support to generate machine code for each core separately, which is then linked into one parallel application. Overall, only minor configuration changes to the OSCAR compiler are needed to run and optimize OSCAR compiler-generated code on a specific platform. This paper evaluates the performance of OSCAR compiler-generated code on different modern SMP multicore processors, including Intel and AMD x86 processors, an Arm processor, and a RISC-V processor, using scientific and multimedia benchmarks in C and Fortran.
The results show promising speedups on all platforms, such as a speedup of 7.16 for the swim program of the SPEC2000 benchmarks on an 8-core Intel x86 processor, a speedup of 9.50 for the CG program of the NAS parallel benchmarks on 8 cores of an AMD x86 processor, a speedup of 3.70 for the BT program of the NAS parallel benchmarks on a 4-core RISC-V processor, and a speedup of 2.64 for the equake program of the SPEC2000 benchmarks on 4 cores of an Arm processor.
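
The speedups quoted in the abstract are easier to compare across platforms when divided by the number of cores used. The short Python sketch below does this. The speedup and core-count values are taken from the abstract; the metric (parallel efficiency, defined here as speedup divided by core count) and the names `RESULTS` and `parallel_efficiency` are illustrative assumptions and do not come from the paper.

```python
# Speedups and core counts as quoted in the abstract.
# The labels are shortened for display; they are not the paper's wording.
RESULTS = [
    ("swim (SPEC2000), Intel x86", 7.16, 8),
    ("CG (NAS), AMD x86", 9.50, 8),
    ("BT (NAS), RISC-V", 3.70, 4),
    ("equake (SPEC2000), Arm", 2.64, 4),
]


def parallel_efficiency(speedup: float, cores: int) -> float:
    """Return speedup divided by core count (1.0 means ideal linear scaling)."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return speedup / cores


if __name__ == "__main__":
    for name, speedup, cores in RESULTS:
        eff = parallel_efficiency(speedup, cores)
        # An efficiency above 1.0 means the speedup exceeds the core count.
        note = " (superlinear)" if eff > 1.0 else ""
        print(f"{name}: {speedup:.2f}x on {cores} cores -> {eff:.1%}{note}")
```

Under this definition, the CG result on the AMD processor exceeds the core count (9.50 on 8 cores). The abstract does not state the cause; superlinear speedups of this kind are commonly associated with cache effects, so this reading should be treated as a hypothesis rather than a finding of the paper.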



Author information

Authors and Affiliations

Department of Computer Science and Engineering, Waseda University, Green Computing Center, 27 Waseda-machi, Shinjuku-ku, Tokyo, 162-0042, Japan
Birk Martin Magnussen, Tohma Kawasumi, Hiroki Mikami, Keiji Kimura & Hironori Kasahara


Corresponding author

Correspondence to Birk Martin Magnussen.

Editor information

Editors and Affiliations

Department of Electrical and Computer Engineering, University of Delaware, Newark, DE, USA
Xiaoming Li
Department of Computer Science, University of Delaware, Newark, DE, USA
Sunita Chandrasekaran


About this paper

Cite this paper

Magnussen, B.M., Kawasumi, T., Mikami, H., Kimura, K., Kasahara, H. (2022). Performance Evaluation of OSCAR Multi-target Automatic Parallelizing Compiler on Intel, AMD, Arm and RISC-V Multicores. In: Li, X., Chandrasekaran, S. (eds) Languages and Compilers for Parallel Computing. LCPC 2021. Lecture Notes in Computer Science, vol 13181. Springer, Cham. https://doi.org/10.1007/978-3-030-99372-6_4


DOI: https://doi.org/10.1007/978-3-030-99372-6_4
Published: 24 March 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-99371-9
Online ISBN: 978-3-030-99372-6
eBook Packages: Computer Science, Computer Science (R0)
