Search Results

Showing 1-20 of 603 results
  1. Performance improvement of the triangular matrix product in commodity clusters

    There are many works devoted to improving the matrix product computation, as it is used in a wide variety of scientific applications arising from...

    Inmaculada Santamaria-Valenzuela, Rocío Carratalá-Sáez, ... Arturo Gonzalez-Escribano in The Journal of Supercomputing
    Article Open access 15 April 2024
  2. A parallel structured banded DC algorithm for symmetric eigenvalue problems

    In this paper, a novel parallel structured divide-and-conquer (DC) algorithm is proposed for symmetric banded eigenvalue problems, denoted by PBSDC,...

    Shengguo Li, Xia Liao, ... Xiaoqiang Yue in CCF Transactions on High Performance Computing
    Article 11 August 2022
  3. Revisiting the performance optimization of QR factorization on Intel KNL and SKL multiprocessors

    This study focused on the optimization of double-precision general matrix–matrix multiplication (DGEMM) routine to improve the QR factorization...

    Muhammad Rizwan, Enoch Jung, ... Jaeyoung Choi in The Journal of Supercomputing
    Article 13 March 2024
  4. Optimizing Distributed Tensor Contractions Using Node-Aware Processor Grids

    We propose an algorithm that aims at minimizing the inter-node communication volume for distributed and memory-efficient tensor contraction schemes...
    Andreas Irmler, Raghavendra Kanakagiri, ... Andreas Grüneis in Euro-Par 2023: Parallel Processing
    Conference paper 2023
  5. High-Efficiency Specialized Support for Dense Linear Algebra Arithmetic in LuNA System

    Automatic synthesis of efficient scientific parallel programs for supercomputers is in general a complex problem of system parallel programming....
    Nikolay Belyaev, Vladislav Perepelkin in Parallel Computing Technologies
    Conference paper 2021
  6. Automatic Code Selection for the Dense Symmetric Generalized Eigenvalue Problem Using ATMathCoreLib

    Solution of the symmetric definite generalized eigenvalue problem (GEP)...
    Masato Kobayashi, Shuhei Kudo, ... Yusaku Yamamoto in Parallel Processing and Applied Mathematics
    Conference paper 2023
  7. A Task-Parallel Runtime for Heterogeneous Multi-node Vector Systems

    In recent years, high-performance computing systems are equipped with not only host processors but also accelerators, and becoming more heterogeneous...
    Kazuki Ide, Keichi Takahashi, ... Hiroyuki Takizawa in Parallel and Distributed Computing, Applications and Technologies
    Conference paper 2023
  8. Direct Solver Aiming at Elimination of Systematic Errors in 3D Stellar Positions

    The determination of three-dimensional positions and velocities of stars based on the observations collected by a space telescope suffers from the...
    Konstantin Ryabinin, Gerasimos Sarras, ... Michael Biermann in Computational Science – ICCS 2024
    Conference paper 2024
  9. COSTA: Communication-Optimal Shuffle and Transpose Algorithm with Process Relabeling

    Communication-avoiding algorithms for Linear Algebra have become increasingly popular, in particular for distributed memory architectures. In...
    Marko Kabić, Simon Pintarelli, ... Joost VandeVondele in High Performance Computing
    Conference paper 2021
  10. Energy efficiency and performance analysis of a legacy atomic scale materials modeling simulator (VASP)

    This work tackles the performance and energy consumption analysis of a legacy scientific application, the VASP (Vienna Ab-initio Simulation Package),...

    Isidoro Nieves-Pérez, Alfonso Muñoz, ... Vicente Blanco in The Journal of Supercomputing
    Article Open access 16 April 2024
  11. Introduction to StarNEig—A Task-Based Library for Solving Nonsymmetric Eigenvalue Problems

    In this paper, we present the StarNEig library for solving dense nonsymmetric (generalized) eigenvalue problems. The library is built on top of the...
    Mirko Myllykoski, Carl Christian Kjelgaard Mikkelsen in Parallel Processing and Applied Mathematics
    Conference paper 2020
  12. Parallel Spectral Clustering with FEAST Library

    Spectral clustering is one of the most relevant unsupervised learning methods capable of classifying data without any a priori information. At the...
    Saad Mdaa, Anass Ouali Alami, ... Sandrine Mouysset in Advanced Research in Technologies, Information, Innovation and Sustainability
    Conference paper 2022
  13. Improving blocked matrix-matrix multiplication routine by utilizing AVX-512 instructions on intel knights landing and xeon scalable processors

    In high-performance computing, the general matrix-matrix multiplication (xGEMM) routine is the core of the Level 3 BLAS kernel for effective...

    Yoosang Park, Raehyun Kim, ... Jaeyoung Choi in Cluster Computing
    Article 12 April 2021
  14. A Current Task-Based Programming Paradigms Analysis

    Task-based paradigm models can be an alternative to MPI. The user defines atomic tasks with a defined input and output with the dependencies between...
    Jérôme Gurhem, Serge G. Petiton in Computational Science – ICCS 2020
    Conference paper 2020
  15. Preconditioned Jacobi SVD Algorithm Outperforms PDGESVD

    Recently, we have introduced a new preconditioner for the one-sided block-Jacobi SVD algorithm. In the serial case it outperformed the simple driver...
    Martin Bečka, Gabriel Okša in Parallel Processing and Applied Mathematics
    Conference paper 2020
  16. High performance computing for first-principles Kohn-Sham density functional theory towards exascale supercomputers

    High performance computing (HPC) plays an essential role in enabling first-principles calculations based on the Kohn–Sham density functional theory...

    Xinming Qin, Junshi Chen, ... Jinlong Yang in CCF Transactions on High Performance Computing
    Article 24 August 2022
  17. xMath2.0: a high-performance extended math library for SW26010-Pro many-core processor

    High performance extended math library is used by many scientific engineering and artificial intelligence applications, which usually involves many...

    Fangfang Liu, Wenjing Ma, ... Chao Yang in CCF Transactions on High Performance Computing
    Article 19 October 2022
  18. swSpAMM: optimizing large-scale sparse approximate matrix multiplication on Sunway Taihulight

    Although matrix multiplication plays an essential role in a wide range of applications, previous works only focus on optimizing dense or sparse...

    Xiaoyan Liu, Yi Liu, ... Depei Qian in Frontiers of Computer Science
    Article 07 November 2022
  19. 10-Million Atoms Simulation of First-Principle Package LS3DF

    The growing demand for semiconductor devices simulation poses a big challenge for large-scale electronic structure calculations. Among various...

    Yu-Jin Yan, Hai-Bo Li, ... Ning-Hui Sun in Journal of Computer Science and Technology
    Article 30 January 2024
  20. Efficient Matrix Computation for SGD-Based Algorithms on Apache Spark

    With the increasing of matrix size in large-scale data analysis, a series of Spark-based distributed matrix computation systems have emerged....
    Baokun Han, Zihao Chen, ... Aoying Zhou in Database Systems for Advanced Applications
    Conference paper 2022