  1. Chapter and Conference Paper

    DeletePop: A DLT Execution Time Predictor Based on Comprehensive Modeling

    The modeling and simulation of Deep Learning Training (DLT) are challenging problems. Due to the intricate parallel patterns, existing modelings and simulations do not consider enough factors that influence t...

    Yongzhe He, Yueyuan Zhou, En Shao in Algorithms and Architectures for Parallel … (2024)

  2. Article

    ArkGPU: enabling applications’ high-goodput co-location execution on multitasking GPUs

    With the development of deep learning, hardware accelerators represented by GPUs have been used to accelerate the execution of deep learning applications. A key problem in GPU cluster is how to schedule variou...

    Jie Lou, Yiming Sun, Jie Zhang, Huawei Cao in CCF Transactions on High Performance Compu… (2023)

  3. Article

    Scalable and efficient graph traversal on high-throughput cluster

    Graph is one of the most important data structures in modern big data applications and is widely used in various fields. Among many graph algorithms, the Breadth-First Search (BFS) algorithm is a classic algor...

    Dongrui Fan, Huawei Cao, Guobo Wang, Na Nie in CCF Transactions on High Performance Compu… (2021)

  4. Article

    Wormhole optical network: a new architecture to solve long diameter problem in exascale computer

    The exascale computer will be built in the near future thanks to rapid innovations in semiconductor logic, memory, architectures, interconnections and other essential technologies. It is difficult to design an...

    En Shao, Zhan Wang, Guojun Yuan in CCF Transactions on High Performance Compu… (2019)

  5. Chapter and Conference Paper

    DearDRAM: Discard Weak Rows for Reducing DRAM’s Refresh Overhead

    Due to leakage current, DRAM devices need periodic refresh operations to maintain the validity of data in each DRAM cell. The shorter refresh period is, the more refresh overhead DRAM devices have to amortize....

    Xusheng Zhan, Yungang Bao, Ninghui Sun in Advanced Computer Architecture (2018)

  6. Article

    HyperFatTree: A Large-Scale Tree-Based Network with Low-Radix Switches

    To lower the large-scale network cost and energy consumption, we proposed a hierarchical topology with low-radix switches. The hierarchical topology HyperFatTree is designed by combining Fat Tree topology and ...

    Yong Su, Zhan Wang, Zhiguo Fan, Zheng Cao in International Journal of Parallel Programm… (2017)

  7. Chapter and Conference Paper

    Regional Congestion Mitigation in Lossless Datacenter Networks

    To stop harmful congestion spreading, lossless network needs much faster congestion detection and reaction than the end-to-end approach. In this paper, we propose a switch-level regional congestion mitigation...

    Xiaoli Liu, Fan Yang, Yanan **, Zhan Wang, Zheng Cao in Network and Parallel Computing (2017)

  8. Article

    Exploiting fine-grained parallelism in graph traversal algorithms via lock virtualization on multi-core architecture

    Traversal is a fundamental procedure in most parallel graph algorithms. To explore the massive fine-grained parallelism in graph traversal, the fine-grained data synchronization is critical. On commodity multi...

    Jie Yan, Guangming Tan, Ninghui Sun in The Journal of Supercomputing (2014)

  9. Article

    Understanding parallelism in graph traversal on multi-core clusters

    There is an ever-increasing need for exploring large-scale graph data sets in computational sciences, social networks, and business analytics. However, due to irregular and memory-intensive nature, graph appli...

    Huiwei Lv, Guangming Tan, Mingyu Chen in Computer Science - Research and Development (2013)

  10. Article

    CloudRank-D: benchmarking and ranking cloud computing systems for data processing applications

    With the explosive growth of information, more and more organizations are deploying private cloud systems or renting public cloud systems to process big data. However, there is no existing benchmark suite for ...

    Chunjie Luo, Jianfeng Zhan, Zhen Jia, Lei Wang, Gang Lu in Frontiers of Computer Science (2012)

  11. Chapter and Conference Paper

    CRAW/P: A Workload Partition Method for the Efficient Parallel Simulation of Manycores

    This paper addresses the workload partition strategies in the simulation of manycore architectures. The key observation behind this paper is that, compared to traditional multicores, manycores feature more non...

    Shuai Jiao, Paolo Ienne, Xiaochun Ye, Da Wang in Euro-Par 2012 Parallel Processing (2012)

  12. Chapter and Conference Paper

    Preliminary Investigation of Accelerating Molecular Dynamics Simulation on Godson-T Many-Core Processor

    Molecular dynamics (MD) simulation is widely used in computational science, however, its irregular memory-access pattern imposes great difficulty on performance optimization. This paper presents a joint applic...

    Liu Peng, Guangming Tan, Rajiv K. Kalia in Euro-Par 2010 Parallel Processing Workshops (2011)

  13. Article

    Design and implementation of communication system of the Dawning 6000 supercomputer

    An increasing number of supercomputers adopt a heterogeneous architecture, consisting of both general purpose CPUs and specialized accelerators. Such design is beneficial for scalability and power, but on the ...

    Qiang Li, Bo Li, Zhigang Huo, Ninghui Sun in Frontiers of Computer Science in China (2010)

  14. Article

    Dawning4000A high performance computer

    Dawning4000A is an AMD Opteron-based Linux Cluster with 11.2Tflops peak performance and 8.06Tflops Linpack performance. It was developed for the Shanghai Supercomputer Center (SSC) as one of the computing powe...

    Ninghui Sun, Dan Meng in Frontiers of Computer Science in China (2007)

  15. Article

    Cache oblivious algorithms for nonserial polyadic programming

    The nonserial polyadic dynamic programming algorithm is one of the most fundamental algorithms for solving discrete optimization problems. Although the loops in the nonserial polyadic dynamic programming algo...

    Guangming Tan, Shengzhong Feng, Ninghui Sun in The Journal of Supercomputing (2007)

  16. Chapter and Conference Paper

    Load Balancing and Parallel Multiple Sequence Alignment with Tree Accumulation

    Multiple sequence alignment program, ClustalW, is time consuming, however, commonly used to compare the protein sequences. ClustalW includes two main time consuming parts: pairwise alignment and progressive al...

    Guangming Tan, Liu Peng, Shengzhong Feng, Ninghui Sun in Euro-Par 2006 Parallel Processing (2006)

  17. Article

    The architecture of a specific chip for RNA secondary structure prediction

    The architecture of a BioAccel (internal code) chip for RNA secondary structure prediction is described in the letter. The system is based on a BioBus (internal code), whose distinguishing features are: Two se...

    Xinchun Liu, Peiheng Zhang, Ninghui Sun in Journal of Electronics (China) (2005)

  18. Chapter and Conference Paper

    Exploiting Parallelization for RNA Secondary Structure Prediction in Cluster

    RNA structure prediction remains one of the most compelling, yet elusive areas of computational biology. Many computational methods have been proposed in an attempt to predict RNA secondary structures. A popul...

    Guangming Tan, Shengzhong Feng, Ninghui Sun in Computational Science – ICCS 2005 (2005)

  19. Chapter and Conference Paper

    Parallel Optimization Technology for Backbone Network Intrusion Detection System

    Network intrusion detection system (NIDS) is an active field of research. With the rapidly increasing network speed, the capability of the NIDS sensors limits the ability of the system. The problem is more ser...

    Xiaojuan Sun, Xinliang Zhou, Ninghui Sun in Computational Intelligence and Security (2005)

  20. Chapter and Conference Paper

    Design of System Area Network Interface Card Based on Intel IOP310

    A design of system area network interface card (NIC) based on the Intel IOP310 I/O processor chipset is proposed in this paper. The chipset makes it powerful for the NIC to offload the processing of communicat...

    Xiaojun Yang, Lili Guo, Peiheng Zhang, Ninghui Sun in Embedded Software and Systems (2005)