Search Results - Springer

Chapter and Conference Paper

DeletePop: A DLT Execution Time Predictor Based on Comprehensive Modeling

The modeling and simulation of Deep Learning Training (DLT) are challenging problems. Due to the intricate parallel patterns, existing modelings and simulations do not consider enough factors that influence t...

Yongzhe He, Yueyuan Zhou, En Shao… in Algorithms and Architectures for Parallel … (2024)
Article

ArkGPU: enabling applications’ high-goodput co-location execution on multitasking GPUs

With the development of deep learning, hardware accelerators represented by GPUs have been used to accelerate the execution of deep learning applications. A key problem in GPU cluster is how to schedule variou...

Jie Lou, Yiming Sun, Jie Zhang, Huawei Cao… in CCF Transactions on High Performance Compu… (2023)
Article

Scalable and efficient graph traversal on high-throughput cluster

Graph is one of the most important data structures in modern big data applications and is widely used in various fields. Among many graph algorithms, the Breadth-First Search (BFS) algorithm is a classic algor...

Dongrui Fan, Huawei Cao, Guobo Wang, Na Nie… in CCF Transactions on High Performance Compu… (2021)
Article

Wormhole optical network: a new architecture to solve long diameter problem in exascale computer

The exascale computer will be built in the near future thanks to rapid innovations in semiconductor logic, memory, architectures, interconnections and other essential technologies. It is difficult to design an...

En Shao, Zhan Wang, Guojun Yuan… in CCF Transactions on High Performance Compu… (2019)
Chapter and Conference Paper

DearDRAM: Discard Weak Rows for Reducing DRAM’s Refresh Overhead

Due to leakage current, DRAM devices need periodic refresh operations to maintain the validity of data in each DRAM cell. The shorter refresh period is, the more refresh overhead DRAM devices have to amortize....

Xusheng Zhan, Yungang Bao, Ninghui Sun in Advanced Computer Architecture (2018)
Article

HyperFatTree: A Large-Scale Tree-Based Network with Low-Radix Switches

To lower the large-scale network cost and energy consumption, we proposed a hierarchical topology with low-radix switches. The hierarchical topology HyperFatTree is designed by combining Fat Tree topology and ...

Yong Su, Zhan Wang, Zhiguo Fan, Zheng Cao… in International Journal of Parallel Programm… (2017)
Chapter and Conference Paper

Regional Congestion Mitigation in Lossless Datacenter Networks

To stop harmful congestion spreading, lossless network needs much faster congestion detection and reaction than the end-to-end approach. In this paper, we propose a switch-level regional congestion mitigation...

Xiaoli Liu, Fan Yang, Yanan **, Zhan Wang, Zheng Cao… in Network and Parallel Computing (2017)

Article

Exploiting fine-grained parallelism in graph traversal algorithms via lock virtualization on multi-core architecture

Traversal is a fundamental procedure in most parallel graph algorithms. To explore the massive fine-grained parallelism in graph traversal, the fine-grained data synchronization is critical. On commodity multi...

Jie Yan, Guangming Tan, Ninghui Sun in The Journal of Supercomputing (2014)
Article

Understanding parallelism in graph traversal on multi-core clusters

There is an ever-increasing need for exploring large-scale graph data sets in computational sciences, social networks, and business analytics. However, due to irregular and memory-intensive nature, graph appli...

Huiwei Lv, Guangming Tan, Mingyu Chen… in Computer Science - Research and Development (2013)
Article

CloudRank-D: benchmarking and ranking cloud computing systems for data processing applications

With the explosive growth of information, more and more organizations are deploying private cloud systems or renting public cloud systems to process big data. However, there is no existing benchmark suite for ...

Chunjie Luo, Jianfeng Zhan, Zhen Jia, Lei Wang, Gang Lu… in Frontiers of Computer Science (2012)
Chapter and Conference Paper

CRAW/P: A Workload Partition Method for the Efficient Parallel Simulation of Manycores

This paper addresses the workload partition strategies in the simulation of manycore architectures. The key observation behind this paper is that, compared to traditional multicores, manycores feature more non...

Shuai Jiao, Paolo Ienne, Xiaochun Ye, Da Wang… in Euro-Par 2012 Parallel Processing (2012)

Chapter and Conference Paper

Preliminary Investigation of Accelerating Molecular Dynamics Simulation on Godson-T Many-Core Processor

Molecular dynamics (MD) simulation is widely used in computational science, however, its irregular memory-access pattern imposes great difficulty on performance optimization. This paper presents a joint applic...

Liu Peng, Guangming Tan, Rajiv K. Kalia… in Euro-Par 2010 Parallel Processing Workshops (2011)

Article

Design and implementation of communication system of the Dawning 6000 supercomputer

An increasing number of supercomputers adopt a heterogeneous architecture, consisting of both general purpose CPUs and specialized accelerators. Such design is beneficial for scalability and power, but on the ...

Qiang Li, Bo Li, Zhigang Huo, Ninghui Sun in Frontiers of Computer Science in China (2010)
Article

Dawning4000A high performance computer

Dawning4000A is an AMD Opteron-based Linux Cluster with 11.2Tflops peak performance and 8.06Tflops Linpack performance. It was developed for the Shanghai Supercomputer Center (SSC) as one of the computing powe...

Ninghui Sun, Dan Meng in Frontiers of Computer Science in China (2007)
Article

Cache oblivious algorithms for nonserial polyadic programming

The nonserial polyadic dynamic programming algorithm is one of the most fundamental algorithms for solving discrete optimization problems. Although the loops in the nonserial polyadic dynamic programming algo...

Guangming Tan, Shengzhong Feng, Ninghui Sun in The Journal of Supercomputing (2007)
Chapter and Conference Paper

Load Balancing and Parallel Multiple Sequence Alignment with Tree Accumulation

Multiple sequence alignment program, ClustalW, is time consuming, however, commonly used to compare the protein sequences. ClustalW includes two main time consuming parts: pairwise alignment and progressive al...

Guangming Tan, Liu Peng, Shengzhong Feng, Ninghui Sun in Euro-Par 2006 Parallel Processing (2006)

Article

The architecture of a specific chip for RNA secondary structure prediction

The architecture of a BioAccel (internal code) chip for RNA secondary structure prediction is described in the letter. The system is based on a BioBus (internal code), whose distinguishing features are: Two se...

Xinchun Liu, Peiheng Zhang, Ninghui Sun in Journal of Electronics (China) (2005)
Chapter and Conference Paper

Exploiting Parallelization for RNA Secondary Structure Prediction in Cluster

RNA structure prediction remains one of the most compelling, yet elusive areas of computational biology. Many computational methods have been proposed in an attempt to predict RNA secondary structures. A popul...

Guangming Tan, Shengzhong Feng, Ninghui Sun in Computational Science – ICCS 2005 (2005)

Chapter and Conference Paper

Parallel Optimization Technology for Backbone Network Intrusion Detection System

Network intrusion detection system (NIDS) is an active field of research. With the rapidly increasing network speed, the capability of the NIDS sensors limits the ability of the system. The problem is more ser...

Xiaojuan Sun, Xinliang Zhou, Ninghui Sun… in Computational Intelligence and Security (2005)
Chapter and Conference Paper

Design of System Area Network Interface Card Based on Intel IOP310

A design of system area network interface card (NIC) based on the Intel IOP310 I/O processor chipset is proposed in this paper. The chipset makes it powerful for the NIC to offload the processing of communicat...

Xiaojun Yang, Lili Guo, Peiheng Zhang, Ninghui Sun in Embedded Software and Systems (2005)

23 Result(s)
