Search Page | SpringerLink

Performance improvement of the triangular matrix product in commodity clusters

There are many works devoted to improving the matrix product computation, as it is used in a wide variety of scientific applications arising from...

Inmaculada Santamaria-Valenzuela, Rocío Carratalá-Sáez, ... Arturo Gonzalez-Escribano in The Journal of Supercomputing

Article Open access 15 April 2024

A parallel structured banded DC algorithm for symmetric eigenvalue problems

In this paper, a novel parallel structured divide-and-conquer (DC) algorithm is proposed for symmetric banded eigenvalue problems, denoted by PBSDC,...

Shengguo Li, Xia Liao, ... Xiaoqiang Yue in CCF Transactions on High Performance Computing

Article 11 August 2022

Revisiting the performance optimization of QR factorization on Intel KNL and SKL multiprocessors

This study focused on the optimization of double-precision general matrix–matrix multiplication (DGEMM) routine to improve the QR factorization...

Muhammad Rizwan, Enoch Jung, ... Jaeyoung Choi in The Journal of Supercomputing

Article 13 March 2024

Optimizing Distributed Tensor Contractions Using Node-Aware Processor Grids

We propose an algorithm that aims at minimizing the inter-node communication volume for distributed and memory-efficient tensor contraction schemes...

Andreas Irmler, Raghavendra Kanakagiri, ... Andreas Grüneis in Euro-Par 2023: Parallel Processing

Conference paper 2023

High-Efficiency Specialized Support for Dense Linear Algebra Arithmetic in LuNA System

Automatic synthesis of efficient scientific parallel programs for supercomputers is in general a complex problem of system parallel programming....

Nikolay Belyaev, Vladislav Perepelkin in Parallel Computing Technologies

Conference paper 2021

Automatic Code Selection for the Dense Symmetric Generalized Eigenvalue Problem Using ATMathCoreLib

Solution of the symmetric definite generalized eigenvalue problem (GEP)...

Masato Kobayashi, Shuhei Kudo, ... Yusaku Yamamoto in Parallel Processing and Applied Mathematics

Conference paper 2023

A Task-Parallel Runtime for Heterogeneous Multi-node Vector Systems

In recent years, high-performance computing systems are equipped with not only host processors but also accelerators, and becoming more heterogeneous...

Kazuki Ide, Keichi Takahashi, ... Hiroyuki Takizawa in Parallel and Distributed Computing, Applications and Technologies

Conference paper 2023

Direct Solver Aiming at Elimination of Systematic Errors in 3D Stellar Positions

The determination of three-dimensional positions and velocities of stars based on the observations collected by a space telescope suffers from the...

Konstantin Ryabinin, Gerasimos Sarras, ... Michael Biermann in Computational Science – ICCS 2024

Conference paper 2024

COSTA: Communication-Optimal Shuffle and Transpose Algorithm with Process Relabeling

Communication-avoiding algorithms for Linear Algebra have become increasingly popular, in particular for distributed memory architectures. In...

Marko Kabić, Simon Pintarelli, ... Joost VandeVondele in High Performance Computing

Conference paper 2021

Energy efficiency and performance analysis of a legacy atomic scale materials modeling simulator (VASP)

This work tackles the performance and energy consumption analysis of a legacy scientific application, the VASP (Vienna Ab-initio Simulation Package),...

Isidoro Nieves-Pérez, Alfonso Muñoz, ... Vicente Blanco in The Journal of Supercomputing

Article Open access 16 April 2024

Introduction to StarNEig—A Task-Based Library for Solving Nonsymmetric Eigenvalue Problems

In this paper, we present the StarNEig library for solving dense nonsymmetric (generalized) eigenvalue problems. The library is built on top of the...

Mirko Myllykoski, Carl Christian Kjelgaard Mikkelsen in Parallel Processing and Applied Mathematics

Conference paper 2020

Parallel Spectral Clustering with FEAST Library

Spectral clustering is one of the most relevant unsupervised learning methods capable of classifying data without any a priori information. At the...

Saad Mdaa, Anass Ouali Alami, ... Sandrine Mouysset in Advanced Research in Technologies, Information, Innovation and Sustainability

Conference paper 2022

Improving blocked matrix-matrix multiplication routine by utilizing AVX-512 instructions on Intel Knights Landing and Xeon Scalable processors

In high-performance computing, the general matrix-matrix multiplication (xGEMM) routine is the core of the Level 3 BLAS kernel for effective...

Yoosang Park, Raehyun Kim, ... Jaeyoung Choi in Cluster Computing

Article 12 April 2021

A Current Task-Based Programming Paradigms Analysis

Task-based paradigm models can be an alternative to MPI. The user defines atomic tasks with a defined input and output with the dependencies between...

Jérôme Gurhem, Serge G. Petiton in Computational Science – ICCS 2020

Conference paper 2020

Preconditioned Jacobi SVD Algorithm Outperforms PDGESVD

Recently, we have introduced a new preconditioner for the one-sided block-Jacobi SVD algorithm. In the serial case it outperformed the simple driver...

Martin Bečka, Gabriel Okša in Parallel Processing and Applied Mathematics

Conference paper 2020

High performance computing for first-principles Kohn-Sham density functional theory towards exascale supercomputers

High performance computing (HPC) plays an essential role in enabling first-principles calculations based on the Kohn–Sham density functional theory...

Xinming Qin, Junshi Chen, ... Jinlong Yang in CCF Transactions on High Performance Computing

Article 24 August 2022

xMath2.0: a high-performance extended math library for SW26010-Pro many-core processor

High performance extended math library is used by many scientific engineering and artificial intelligence applications, which usually involves many...

Fangfang Liu, Wenjing Ma, ... Chao Yang in CCF Transactions on High Performance Computing

Article 19 October 2022

swSpAMM: optimizing large-scale sparse approximate matrix multiplication on Sunway TaihuLight

Although matrix multiplication plays an essential role in a wide range of applications, previous works only focus on optimizing dense or sparse...

Xiaoyan Liu, Yi Liu, ... Depei Qian in Frontiers of Computer Science

Article 07 November 2022

10-Million Atoms Simulation of First-Principle Package LS3DF

The growing demand for semiconductor devices simulation poses a big challenge for large-scale electronic structure calculations. Among various...

Yu-Jin Yan, Hai-Bo Li, ... Ning-Hui Sun in Journal of Computer Science and Technology

Article 30 January 2024

Efficient Matrix Computation for SGD-Based Algorithms on Apache Spark

With the increasing of matrix size in large-scale data analysis, a series of Spark-based distributed matrix computation systems have emerged....

Baokun Han, Zihao Chen, ... Aoying Zhou in Database Systems for Advanced Applications

Conference paper 2022
