Search Results

Showing 81-100 of 4,997 results
  1. oclCUB: an OpenCL parallel computing library for deep learning operators

    Deep learning (DL) mainly uses various parallel computing libraries to optimize the speed of model training. The underlying computations of the DL...

    Changqing Shi, Yufei Sun, ... Yuzhi Zhang in CCF Transactions on High Performance Computing
    Article 16 February 2024
  2. Dynamic scheduler implementation used for load distribution between hardware accelerators (RTL) and software tasks (CPU) in heterogeneous systems

    This article describes the implementation of a dynamic scheduler for loading distribution between a hardware accelerator RTL and a CPU software task....

    Cristian Andy Tănase in The Journal of Supercomputing
    Article 13 March 2020
  3. Photonic Computing and Communication for Neural Network Accelerators

    Conventional electronic Artificial Neural Networks (ANNs) accelerators focus on architecture design and numerical computation optimization to improve...
    Chengpeng Xia, Yawen Chen, ... Jigang Wu in Parallel and Distributed Computing, Applications and Technologies
    Conference paper 2022
  4. PalQuant: Accelerating High-Precision Networks on Low-Precision Accelerators

    Recently low-precision deep learning accelerators (DLAs) have become popular due to their advantages in chip area and energy consumption, yet the...
    Qinghao Hu, Gang Li, ... Jian Cheng in Computer Vision – ECCV 2022
    Conference paper 2022
  5. TrainBF: High-Performance DNN Training Engine Using BFloat16 on AI Accelerators

    Training deep neural networks (DNNs) with half-precision floating-point formats is widely supported on recent hardware and frameworks. However,...
    Zhen Xie, Siddhisanket Raskar, ... Venkatram Vishwanath in Euro-Par 2023: Parallel Processing
    Conference paper 2023
  6. A Custom Hardware Architecture for the Link Assessment Problem

    Heterogeneous accelerator enhanced computing architectures are a common solution in embedded computing, mainly due to the constraints in energy and...
    André Chinazzo, Christian De Schryver, ... Norbert Wehn in Algorithms for Big Data
    Chapter Open access 2022
  7. A Hardware Co-design Workflow for Scientific Instruments at the Edge

    As spatial and temporal resolutions of scientific instruments improve, the explosion in the volume of data produced is becoming a key challenge. It...
    Conference paper 2022
  8. EFFLUX-F2: A High Performance Hardware Security Evaluation Board

    Side-channel analysis has become a cornerstone of modern hardware security evaluation for cryptographic accelerators. Recently, these techniques are...
    Arpan Jati, Naina Gupta, ... Somitra Kumar Sanadhya in Constructive Side-Channel Analysis and Secure Design
    Conference paper 2024
  9. Evolutionary-Based Co-optimization of DNN and Hardware Configurations on Edge GPU

    The ever-increasing complexity of both Deep Neural Networks (DNN) and hardware accelerators has made the co-optimization of these domains extremely...
    Halima Bouzidi, Hamza Ouarnoughi, ... Smail Niar in Optimization and Learning
    Conference paper 2022
  10. Hardware Implementation for Spiking Neural Networks on Edge Devices

    In the modern data-intensive Internet of Things (IoT) applications, intelligence is enabled by collecting data at the edge devices for processing....
    Thao N. N. Nguyen, Bharadwaj Veeravalli, Xuanyao Fong in Predictive Analytics in Cloud, Fog, and Edge Computing
    Chapter 2023
  11. Deep Quantization of Graph Neural Networks with Run-Time Hardware-Aware Training

    In this paper, we investigate the benefits of hardware-aware quantization in the gFADES hardware accelerator targeting Graph Convolutional Networks...
    Olle Hansson, Mahdieh Grailoo, ... Jose Nunez-Yanez in Applied Reconfigurable Computing. Architectures, Tools, and Applications
    Conference paper 2024
  12. Deep learning accelerators: a case study with MAESTRO

    In recent years, deep learning has become one of the most important topics in computer sciences. Deep learning is a growing trend in the edge of...

    Hamidreza Bolhasani, Somayyeh Jafarali Jassbi in Journal of Big Data
    Article Open access 12 November 2020
  13. FPGA-based accelerator for object detection: a comprehensive survey

    Object detection is one of the most challenging tasks in computer vision. With the advances in semiconductor devices and chip technology, hardware...

    Kai Zeng, Qian Ma, ... Chenggang Yan in The Journal of Supercomputing
    Article 29 March 2022
  14. Conclusions

    This book aimed to introduce to the reader how heterogeneous hardware acceleration is changing the programming landscape, while posing a number of...
    Juan Fumero, Athanasios Stratikopoulos, Christos Kotselidis in Programming Heterogeneous Hardware via Managed Runtime Systems
    Chapter 2024
  15. Implementing LU and Cholesky factorizations on artificial intelligence accelerators

    LU and Cholesky factorizations for dense matrices are one of the most fundamental building blocks in a number of numerical applications. Because of...

    Yuechen Lu, Yuchen Luo, ... Weifeng Liu in CCF Transactions on High Performance Computing
    Article 24 August 2021
  16. Hybrid SORN Hardware Accelerator for Support Vector Machines

    This paper presents a new approach for support vector filtering to accelerate the training process of support vector machines (SVMs). It is based on...
    Nils Hülsmeier, Moritz Bärthel, ... Steffen Paul in Next Generation Arithmetic
    Conference paper 2023
  17. Compute Accelerators and Other GPUs

    As GPUs got more powerful and took on more tasks, they transcended the roles they were originally designed to do and were sometimes called...
    Chapter 2022
  18. Hardware and Software Co-optimization of Convolutional and Self-attention Combined Model Based on FPGA

    Since Transformer was proposed, the self-attention mechanism has been widely used. Some studies have tried to apply the self-attention mechanism to...
    Wei Hu, Heyuan Li, ... Zhiyv Zhong in Web and Big Data
    Conference paper 2024
  19. Fast Offloading of Accelerator Task over Network with Hardware Assistance

    Today, applications such as image recognition in vehicles and drones requiring high computational performance and low latency offload some tasks to...
    Shogo Saito, Kei Fujimoto, ... Akinori Shiraga in Edge Computing – EDGE 2022
    Conference paper 2022