Performance Evaluation of OSCAR Multi-target Automatic Parallelizing Compiler on Intel, AMD, Arm and RISC-V Multicores

  • Conference paper
Languages and Compilers for Parallel Computing (LCPC 2021)

Abstract

As the number of shared-memory multicore processor architectures grows, automatic parallelizing compilers need to support multiple architectures. The OSCAR (Optimally Scheduled Advanced Multiprocessor) automatic parallelizing compiler can parallelize many kinds of sequential programs, such as scientific applications, embedded real-time applications, multimedia applications, and more. The OSCAR compiler's features include coarse-grain task parallelization with earliest execution condition analysis, which analyzes both data and control dependencies; data locality optimizations across different loop nests with data dependencies; and the ability to generate parallelized code using the OSCAR API 2.1. The OSCAR API 2.1 is compatible with OpenMP for SMP multicores and adds directives for power control and for supporting heterogeneous multicores. This allows a C or Fortran compiler with OpenMP support to generate parallel machine code for the target multicore. Additionally, the OSCAR API analyzer allows a sequential-only compiler without OpenMP support to generate machine code for each core separately, which is then linked into one parallel application. Overall, only minor configuration changes to the OSCAR compiler are needed to run and optimize OSCAR compiler-generated code on a specific platform. This paper evaluates the performance of OSCAR compiler-generated code on different modern SMP multicore processors, including Intel and AMD x86 processors, an Arm processor, and a RISC-V processor, using scientific and multimedia benchmarks in C and Fortran. The results show promising speedups on all platforms, such as a speedup of 7.16 for the swim program of the SPEC2000 benchmarks on an 8-core Intel x86 processor, a speedup of 9.50 for the CG program of the NAS parallel benchmarks on 8 cores of an AMD x86 processor, a speedup of 3.70 for the BT program of the NAS parallel benchmarks on a 4-core RISC-V processor, and a speedup of 2.64 for the equake program of the SPEC2000 benchmarks on 4 cores of an Arm processor.
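To put the reported speedups on a common scale, they can be converted to parallel efficiency (speedup divided by the number of cores used). The short Python sketch below does this for the four figures quoted in the abstract. It assumes each speedup is measured against sequential execution on the same processor, which the abstract does not state explicitly; the names `REPORTED`, `parallel_efficiency`, and `summarize` are illustrative and not taken from the paper.

```python
# Speedups and core counts as quoted in the abstract.
# Assumption: each speedup is relative to sequential execution
# on the same processor (the abstract does not state the baseline).
REPORTED = [
    ("swim (SPEC2000), Intel x86", 7.16, 8),
    ("CG (NAS), AMD x86", 9.50, 8),
    ("BT (NAS), RISC-V", 3.70, 4),
    ("equake (SPEC2000), Arm", 2.64, 4),
]


def parallel_efficiency(speedup, cores):
    """Return speedup per core; 1.0 corresponds to ideal linear scaling."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return speedup / cores


def summarize(results):
    """Map (name, speedup, cores) tuples to (name, efficiency) tuples."""
    return [(name, round(parallel_efficiency(s, c), 4)) for name, s, c in results]


if __name__ == "__main__":
    for name, eff in summarize(REPORTED):
        print(f"{name}: efficiency {eff:.2f}")
```

Under this assumption, the CG result corresponds to an efficiency above 1.0, that is, a superlinear speedup; the abstract does not give the cause.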

Corresponding author

Correspondence to Birk Martin Magnussen.

Copyright information

© 2022 Springer Nature Switzerland AG

About this paper

Cite this paper

Magnussen, B.M., Kawasumi, T., Mikami, H., Kimura, K., Kasahara, H. (2022). Performance Evaluation of OSCAR Multi-target Automatic Parallelizing Compiler on Intel, AMD, Arm and RISC-V Multicores. In: Li, X., Chandrasekaran, S. (eds) Languages and Compilers for Parallel Computing. LCPC 2021. Lecture Notes in Computer Science, vol 13181. Springer, Cham. https://doi.org/10.1007/978-3-030-99372-6_4

  • DOI: https://doi.org/10.1007/978-3-030-99372-6_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-99371-9

  • Online ISBN: 978-3-030-99372-6

  • eBook Packages: Computer Science, Computer Science (R0)
