Search Results
-
oclCUB: an OpenCL parallel computing library for deep learning operators
Deep learning (DL) mainly uses various parallel computing libraries to optimize the speed of model training. The underlying computations of the DL...
-
Dynamic scheduler implementation used for load distribution between hardware accelerators (RTL) and software tasks (CPU) in heterogeneous systems
This article describes the implementation of a dynamic scheduler for load distribution between a hardware accelerator RTL and a CPU software task....
-
Photonic Computing and Communication for Neural Network Accelerators
Conventional electronic Artificial Neural Networks (ANNs) accelerators focus on architecture design and numerical computation optimization to improve...
-
PalQuant: Accelerating High-Precision Networks on Low-Precision Accelerators
Recently low-precision deep learning accelerators (DLAs) have become popular due to their advantages in chip area and energy consumption, yet the...
-
TrainBF: High-Performance DNN Training Engine Using BFloat16 on AI Accelerators
Training deep neural networks (DNNs) with half-precision floating-point formats is widely supported on recent hardware and frameworks. However,...
-
A Custom Hardware Architecture for the Link Assessment Problem
Heterogeneous accelerator enhanced computing architectures are a common solution in embedded computing, mainly due to the constraints in energy and...
-
A Hardware Co-design Workflow for Scientific Instruments at the Edge
As spatial and temporal resolutions of scientific instruments improve, the explosion in the volume of data produced is becoming a key challenge. It...
-
EFFLUX-F2: A High Performance Hardware Security Evaluation Board
Side-channel analysis has become a cornerstone of modern hardware security evaluation for cryptographic accelerators. Recently, these techniques are...
-
Evolutionary-Based Co-optimization of DNN and Hardware Configurations on Edge GPU
The ever-increasing complexity of both Deep Neural Networks (DNN) and hardware accelerators has made the co-optimization of these domains extremely...
-
Hardware Implementation for Spiking Neural Networks on Edge Devices
In the modern data-intensive Internet of Things (IoT) applications, intelligence is enabled by collecting data at the edge devices for processing....
-
Deep Quantization of Graph Neural Networks with Run-Time Hardware-Aware Training
In this paper, we investigate the benefits of hardware-aware quantization in the gFADES hardware accelerator targeting Graph Convolutional Networks...
-
Deep learning accelerators: a case study with MAESTRO
In recent years, deep learning has become one of the most important topics in computer sciences. Deep learning is a growing trend in the edge of...
-
FPGA-based accelerator for object detection: a comprehensive survey
Object detection is one of the most challenging tasks in computer vision. With the advances in semiconductor devices and chip technology, hardware...
-
Conclusions
This book aimed to introduce to the reader how heterogeneous hardware acceleration is changing the programming landscape, while posing a number of...
-
Implementing LU and Cholesky factorizations on artificial intelligence accelerators
LU and Cholesky factorizations for dense matrices are one of the most fundamental building blocks in a number of numerical applications. Because of...
-
Hybrid SORN Hardware Accelerator for Support Vector Machines
This paper presents a new approach for support vector filtering to accelerate the training process of support vector machines (SVMs). It is based on...
-
Compute Accelerators and Other GPUs
As GPUs got more powerful and took on more tasks, they transcended the roles they were originally designed to do and were sometimes called...
-
Hardware and Software Co-optimization of Convolutional and Self-attention Combined Model Based on FPGA
Since Transformer was proposed, the self-attention mechanism has been widely used. Some studies have tried to apply the self-attention mechanism to...
-
Fast Offloading of Accelerator Task over Network with Hardware Assistance
Today, applications such as image recognition in vehicles and drones requiring high computational performance and low latency offload some tasks to...