-
Chapter and Conference Paper
Optimization of HPF Programs with Dynamic Recompilation Technique
Optimizing compilers apply various optimizations to extract the best performance from computer systems. However, some kinds of optimizations cannot be applied if values of variables or system parame...
-
Chapter and Conference Paper
How Can the Earth Simulator Impact on Human Activities
The Earth Simulator (ES) is a vector-parallel supercomputer, consisting of 5120 vector processors. The peak performance of each vector processor is 8 Gflops. Eight processors make one node with 16 GB shared-memo...
-
Chapter and Conference Paper
Pipelined Parallelization in HPF Programs on the Earth Simulator
The HPF specifications provide no explicit way to parallelize DOACROSS loops. Although recent advanced HPF compilers such as HPF/ES have been as powerful as MPI in many situations of parallel progr...
-
Chapter and Conference Paper
Towards Unification of Accelerated Computing and Interconnection For Extreme-Scale Computing
Heterogeneous clusters using accelerators are widely used for high-performance computing systems. In such systems, the inter-node communication among accelerators becomes a bottleneck due to the data transfer bet...
-
Chapter and Conference Paper
OpenMP Extension for Explicit Task Allocation on NUMA Architecture
Most modern HPC systems consist of a number of cores grouped into multiple NUMA nodes. The latest Intel processors have multiple NUMA nodes inside a chip. Task parallelism using OpenMP dependent tasks is a pro...
-
Chapter
GPU-Accelerated Language and Communication Support by FPGA
Although the GPU is one of the most successfully used accelerating devices for HPC, there are several issues when it is used for large-scale parallel systems. To describe real applications on GPU-ready paralle...
-
Chapter and Conference Paper
InKS, a Programming Model to Decouple Performance from Algorithm in HPC Codes
Existing programming models tend to tightly interleave algorithm and optimization in HPC simulation codes. This requires scientists to become experts in both the simulated domain and the optimization process a...
-
Chapter and Conference Paper
MYX: Runtime Correctness Analysis for Multi-Level Parallel Programming Paradigms
In recent years, increases in compute power have come mainly from rapidly increasing concurrency. Therefore, the HPC community is looking for new parallel programming paradigms to make the best use of current...
-
Article
InKS: a programming model to decouple algorithm from optimization in HPC codes
Existing programming models tend to tightly interleave algorithm and optimization in HPC simulation codes. This requires scientists to become experts in both the simulated domain and the optimization process a...
-
Chapter
Multi-SPMD Programming Model with YML and XcalableMP
This chapter describes a multi-SPMD (mSPMD) programming model and a set of software and libraries to support the mSPMD programming model. The mSPMD programming model has been proposed to realize scalable appli...
-
Chapter
XcalableMP 2.0 and Future Directions
This chapter presents XcalableMP on the Fugaku supercomputer, the Japanese flagship supercomputer developed by the FLAGSHIP2020 project at RIKEN R-CCS. The porting and the performance evaluation were done as a...
-
Chapter
Implementation and Performance Evaluation of Omni Compiler
This chapter describes the implementation and performance evaluation of Omni compiler, which is a reference implementation of the compiler for XcalableMP. For performance evaluation, this chapter also presents...
-
Chapter
XcalableACC: An Integration of XcalableMP and OpenACC
XcalableACC (XACC) is an extension of XcalableMP for accelerated clusters. It is defined as a diagonal integration of XcalableMP and OpenACC, which is another directive-based language designed to program heter...
-
Chapter
XcalableMP Programming Model and Language
XcalableMP (XMP) is a directive-based language extension of Fortran and C for distributed-memory parallel computers, and can be classified as a partitioned global address space (PGAS) language. One of the rema...
-
Chapter
Hybrid-View Programming of Nuclear Fusion Simulation Code in XcalableMP
XcalableMP (XMP) supports a global-view model that allows programmers to define global data and to map them to a set of processors, which execute the distributed global data as a single thread. In XMP, the conc...
-
Chapter and Conference Paper
OpenACC Unified Programming Environment for Multi-hybrid Acceleration with GPU and FPGA
Accelerated computing, such as with GPUs, plays a central role in HPC nowadays. However, some complicated applications with partially different performance behavior are hard to solve with a single type ...
-
Article (Open Access)
Design and performance evaluation of UCX for the Tofu Interconnect D on Fugaku towards efficient multithreaded communication
The increasing trend of manycore processors makes multithreaded communication more important to avoid costly global synchronization among cores. One of the representative approaches that require multithreaded ...