-
Chapter and Conference Paper
DeletePop: A DLT Execution Time Predictor Based on Comprehensive Modeling
The modeling and simulation of Deep Learning Training (DLT) are challenging problems. Due to the intricate parallel patterns, existing models and simulations do not consider enough factors that influence t...
-
Article
ArkGPU: enabling applications’ high-goodput co-location execution on multitasking GPUs
With the development of deep learning, hardware accelerators represented by GPUs have been used to accelerate the execution of deep learning applications. A key problem in GPU clusters is how to schedule variou...
-
Article
Scalable and efficient graph traversal on high-throughput cluster
The graph is one of the most important data structures in modern big data applications and is widely used in various fields. Among many graph algorithms, the Breadth-First Search (BFS) algorithm is a classic algor...
-
Article
Wormhole optical network: a new architecture to solve long diameter problem in exascale computer
The exascale computer will be built in the near future thanks to rapid innovations in semiconductor logic, memory, architectures, interconnections and other essential technologies. It is difficult to design an...
-
Chapter and Conference Paper
DearDRAM: Discard Weak Rows for Reducing DRAM’s Refresh Overhead
Due to leakage current, DRAM devices need periodic refresh operations to maintain the validity of data in each DRAM cell. The shorter the refresh period, the more refresh overhead DRAM devices have to amortize....
-
Article
HyperFatTree: A Large-Scale Tree-Based Network with Low-Radix Switches
To lower the cost and energy consumption of large-scale networks, we proposed a hierarchical topology with low-radix switches. The hierarchical topology HyperFatTree is designed by combining Fat Tree topology and ...
-
Chapter and Conference Paper
Regional Congestion Mitigation in Lossless Datacenter Networks
To stop harmful congestion spreading, lossless networks need much faster congestion detection and reaction than the end-to-end approach provides. In this paper, we propose a switch-level regional congestion mitigation...
-
Article
Exploiting fine-grained parallelism in graph traversal algorithms via lock virtualization on multi-core architecture
Traversal is a fundamental procedure in most parallel graph algorithms. To exploit the massive fine-grained parallelism in graph traversal, fine-grained data synchronization is critical. On commodity multi...
-
Article
Understanding parallelism in graph traversal on multi-core clusters
There is an ever-increasing need for exploring large-scale graph data sets in computational sciences, social networks, and business analytics. However, due to their irregular and memory-intensive nature, graph appli...
-
Article
CloudRank-D: benchmarking and ranking cloud computing systems for data processing applications
With the explosive growth of information, more and more organizations are deploying private cloud systems or renting public cloud systems to process big data. However, there is no existing benchmark suite for ...
-
Chapter and Conference Paper
CRAW/P: A Workload Partition Method for the Efficient Parallel Simulation of Manycores
This paper addresses the workload partition strategies in the simulation of manycore architectures. The key observation behind this paper is that, compared to traditional multicores, manycores feature more non...
-
Chapter and Conference Paper
Preliminary Investigation of Accelerating Molecular Dynamics Simulation on Godson-T Many-Core Processor
Molecular dynamics (MD) simulation is widely used in computational science; however, its irregular memory-access pattern makes performance optimization very difficult. This paper presents a joint applic...
-
Article
Design and implementation of communication system of the Dawning 6000 supercomputer
An increasing number of supercomputers adopt a heterogeneous architecture, consisting of both general purpose CPUs and specialized accelerators. Such design is beneficial for scalability and power, but on the ...
-
Article
Dawning4000A high performance computer
Dawning4000A is an AMD Opteron-based Linux cluster with 11.2 Tflops peak performance and 8.06 Tflops Linpack performance. It was developed for the Shanghai Supercomputer Center (SSC) as one of the computing powe...
-
Article
Cache oblivious algorithms for nonserial polyadic programming
The nonserial polyadic dynamic programming algorithm is one of the most fundamental algorithms for solving discrete optimization problems. Although the loops in the nonserial polyadic dynamic programming algo...
-
Chapter and Conference Paper
Load Balancing and Parallel Multiple Sequence Alignment with Tree Accumulation
The multiple sequence alignment program ClustalW is commonly used to compare protein sequences, but it is time-consuming. ClustalW includes two main time-consuming parts: pairwise alignment and progressive al...
-
Article
The architecture of a specific chip for RNA secondary structure prediction
The architecture of a BioAccel (internal code) chip for RNA secondary structure prediction is described in the letter. The system is based on a BioBus (internal code), whose distinguishing features are: Two se...
-
Chapter and Conference Paper
Exploiting Parallelization for RNA Secondary Structure Prediction in Cluster
RNA structure prediction remains one of the most compelling, yet elusive areas of computational biology. Many computational methods have been proposed in an attempt to predict RNA secondary structures. A popul...
-
Chapter and Conference Paper
Parallel Optimization Technology for Backbone Network Intrusion Detection System
Network intrusion detection system (NIDS) is an active field of research. With the rapidly increasing network speed, the capability of the NIDS sensors limits the ability of the system. The problem is more ser...
-
Chapter and Conference Paper
Design of System Area Network Interface Card Based on Intel IOP310
A design of system area network interface card (NIC) based on the Intel IOP310 I/O processor chipset is proposed in this paper. The chipset enables the NIC to offload the processing of communicat...