  1. Article

    Open Access

    Design and performance evaluation of UCX for the Tofu Interconnect D on Fugaku towards efficient multithreaded communication

    The increasing trend of manycore processors makes multithreaded communication more important to avoid costly global synchronization among cores. One of the representative approaches that require multithreaded ...

    Yutaka Watanabe, Miwako Tsuji, Hitoshi Murai, Taisuke Boku in The Journal of Supercomputing (2024)

  2. No Access

    Chapter and Conference Paper

    OpenACC Unified Programming Environment for Multi-hybrid Acceleration with GPU and FPGA

    Accelerated computing in HPC such as with GPU, plays a central role in HPC nowadays. However, in some complicated applications with partially different performance behavior is hard to solve with a single type ...

    Taisuke Boku, Ryuta Tsunashima, Ryohei Kobayashi in High Performance Computing (2023)

  3. Chapter

    Multi-SPMD Programming Model with YML and XcalableMP

    This chapter describes a multi-SPMD (mSPMD) programming model and a set of software and libraries to support the mSPMD programming model. The mSPMD programming model has been proposed to realize scalable appli...

    Miwako Tsuji, Hitoshi Murai, Taisuke Boku in XcalableMP PGAS Programming Language (2021)

  4. Chapter

    XcalableMP 2.0 and Future Directions

    This chapter presents the XcalableMP on the Fugaku supercomputer, the Japanese flagship supercomputer developed by FLAGSHIP2020 project in RIKEN R-CCS. The porting and the performance evaluation were done as a...

    Mitsuhisa Sato, Hitoshi Murai, Masahiro Nakao in XcalableMP PGAS Programming Language (2021)

  5. Chapter

    Implementation and Performance Evaluation of Omni Compiler

    This chapter describes the implementation and performance evaluation of Omni compiler, which is a reference implementation of the compiler for XcalableMP. For performance evaluation, this chapter also presents...

    Masahiro Nakao, Hitoshi Murai in XcalableMP PGAS Programming Language (2021)

  6. Chapter

    XcalableACC: An Integration of XcalableMP and OpenACC

    XcalableACC (XACC) is an extension of XcalableMP for accelerated clusters. It is defined as a diagonal integration of XcalableMP and OpenACC, which is another directive-based language designed to program heter...

    Akihiro Tabuchi, Hitoshi Murai, Masahiro Nakao in XcalableMP PGAS Programming Language (2021)

  7. Chapter

    XcalableMP Programming Model and Language

    XcalableMP (XMP) is a directive-based language extension of Fortran and C for distributed-memory parallel computers, and can be classified as a partitioned global address space (PGAS) language. One of the rema...

    Hitoshi Murai, Masahiro Nakao, Mitsuhisa Sato in XcalableMP PGAS Programming Language (2021)

  8. Chapter

    Hybrid-View Programming of Nuclear Fusion Simulation Code in XcalableMP

    XcalableMP(XMP) supports a global-view model that allows programmers to define global data and to map them to a set of processors, which execute the distributed global data as a single thread. In XMP, the conc...

    Keisuke Tsugane, Taisuke Boku, Hitoshi Murai in XcalableMP PGAS Programming Language (2021)

  9. No Access

    Article

    InKS: a programming model to decouple algorithm from optimization in HPC codes

    Existing programming models tend to tightly interleave algorithm and optimization in HPC simulation codes. This requires scientists to become experts in both the simulated domain and the optimization process a...

    Ksander Ejjaaouani, Olivier Aumage, Julien Bigot in The Journal of Supercomputing (2020)

  10. Chapter and Conference Paper

    MYX: Runtime Correctness Analysis for Multi-Level Parallel Programming Paradigms

    In recent years the increasing compute power is mainly provided by rapidly increasing concurrency. Therefore, the HPC community is looking for new parallel programming paradigms to make the best use of current...

    Joachim Protze, Miwako Tsuji in Software for Exascale Computing - SPPEXA 2… (2020)

  11. No Access

    Chapter

    GPU-Accelerated Language and Communication Support by FPGA

    Although the GPU is one of the most successfully used accelerating devices for HPC, there are several issues when it is used for large-scale parallel systems. To describe real applications on GPU-ready paralle...

    Taisuke Boku, Toshihiro Hanawa in Advanced Software Technologies for Post-Pe… (2019)

  12. Chapter and Conference Paper

    InKS, a Programming Model to Decouple Performance from Algorithm in HPC Codes

    Existing programming models tend to tightly interleave algorithm and optimization in HPC simulation codes. This requires scientists to become experts in both the simulated domain and the optimization process a...

    Ksander Ejjaaouani, Olivier Aumage in Euro-Par 2018: Parallel Processing Worksho… (2019)

  13. No Access

    Chapter and Conference Paper

    OpenMP Extension for Explicit Task Allocation on NUMA Architecture

    Most modern HPC systems consist of a number of cores grouped into multiple NUMA nodes. The latest Intel processors have multiple NUMA nodes inside a chip. Task parallelism using OpenMP dependent tasks is a pro...

    Jinpil Lee, Keisuke Tsugane, Hitoshi Murai in OpenMP: Memory, Devices, and Tasks (2016)

  14. No Access

    Chapter and Conference Paper

    Towards Unification of Accelerated Computing and Interconnection For Extreme-Scale Computing

    Heterogeneous clusters using accelerators are widely used for high-performance computing system. In such systems, the inter-node communication among accelerators becomes bottleneck due to the data transfer bet...

    Toshihiro Hanawa, Yuetsu Kodama, Taisuke Boku in Applied Reconfigurable Computing (2015)

  15. No Access

    Chapter and Conference Paper

    Pipelined Parallelization in HPF Programs on the Earth Simulator

    There is no explicit way for parallelization of DOACROSS loops in the HPF specifications. Although recent advanced HPF compilers such as HPF/ES have been as powerful as MPI in many situations of parallel progr...

    Hitoshi Murai, Yasuo Okabe in High-Performance Computing (2008)

  16. No Access

    Chapter and Conference Paper

    How Can the Earth Simulator Impact on Human Activities

    The Earth Simulator (ES) is a vector-parallel supercomputer, consisting of 5120 vector processors. The peak performance of each vector processor is 8Gflops. Eight processors make one node with 16GB shared-memo...

    Tetsuya Sato, Hitoshi Murai in Advances in Computer Systems Architecture (2003)

  17. No Access

    Chapter and Conference Paper

    Optimization of HPF Programs with Dynamic Recompilation Technique

    Optimizing compilers perform various optimizations in order to exploit the best performance from computer systems. However, some kinds of optimizations cannot be applied if values of variables or system parame...

    Takuya Araki, Hitoshi Murai, Tsunehiko Kamachi, Yoshiki Seo in High Performance Computing (2002)