Search Page | SpringerLink

oclCUB: an OpenCL parallel computing library for deep learning operators

Deep learning (DL) mainly uses various parallel computing libraries to optimize the speed of model training. The underlying computations of the DL...

Changqing Shi, Yufei Sun, ... Yuzhi Zhang in CCF Transactions on High Performance Computing

Article 16 February 2024

Dynamic scheduler implementation used for load distribution between hardware accelerators (RTL) and software tasks (CPU) in heterogeneous systems

This article describes the implementation of a dynamic scheduler for load distribution between a hardware accelerator (RTL) and a CPU software task....

Cristian Andy Tănase in The Journal of Supercomputing

Article 13 March 2020

Introduction to the special issue on self‑managing and hardware‑optimized database systems 2022

Constantinos Costa, Ilia Petrov in Distributed and Parallel Databases

Article 03 June 2023

Photonic Computing and Communication for Neural Network Accelerators

Conventional electronic Artificial Neural Networks (ANNs) accelerators focus on architecture design and numerical computation optimization to improve...

Chengpeng Xia, Yawen Chen, ... Jigang Wu in Parallel and Distributed Computing, Applications and Technologies

Conference paper 2022

PalQuant: Accelerating High-Precision Networks on Low-Precision Accelerators

Recently low-precision deep learning accelerators (DLAs) have become popular due to their advantages in chip area and energy consumption, yet the...

Qinghao Hu, Gang Li, ... Jian Cheng in Computer Vision – ECCV 2022

Conference paper 2022

TrainBF: High-Performance DNN Training Engine Using BFloat16 on AI Accelerators

Training deep neural networks (DNNs) with half-precision floating-point formats is widely supported on recent hardware and frameworks. However,...

Zhen Xie, Siddhisanket Raskar, ... Venkatram Vishwanath in Euro-Par 2023: Parallel Processing

Conference paper 2023

A Custom Hardware Architecture for the Link Assessment Problem

Heterogeneous accelerator enhanced computing architectures are a common solution in embedded computing, mainly due to the constraints in energy and...

André Chinazzo, Christian De Schryver, ... Norbert Wehn in Algorithms for Big Data

Chapter Open access 2022

A Hardware Co-design Workflow for Scientific Instruments at the Edge

As spatial and temporal resolutions of scientific instruments improve, the explosion in the volume of data produced is becoming a key challenge. It...

Kazutomo Yoshii, Rajesh Sankaran, ... Antonino Miceli in Driving Scientific and Engineering Discoveries Through the Integration of Experiment, Big Data, and Modeling and Simulation

Conference paper 2022

EFFLUX-F2: A High Performance Hardware Security Evaluation Board

Side-channel analysis has become a cornerstone of modern hardware security evaluation for cryptographic accelerators. Recently, these techniques are...

Arpan Jati, Naina Gupta, ... Somitra Kumar Sanadhya in Constructive Side-Channel Analysis and Secure Design

Conference paper 2024

Evolutionary-Based Co-optimization of DNN and Hardware Configurations on Edge GPU

The ever-increasing complexity of both Deep Neural Networks (DNN) and hardware accelerators has made the co-optimization of these domains extremely...

Halima Bouzidi, Hamza Ouarnoughi, ... Smail Niar in Optimization and Learning

Conference paper 2022

Hardware Implementation for Spiking Neural Networks on Edge Devices

In the modern data-intensive Internet of Things (IoT) applications, intelligence is enabled by collecting data at the edge devices for processing....

Thao N. N. Nguyen, Bharadwaj Veeravalli, Xuanyao Fong in Predictive Analytics in Cloud, Fog, and Edge Computing

Chapter 2023

Deep Quantization of Graph Neural Networks with Run-Time Hardware-Aware Training

In this paper, we investigate the benefits of hardware-aware quantization in the gFADES hardware accelerator targeting Graph Convolutional Networks...

Olle Hansson, Mahdieh Grailoo, ... Jose Nunez-Yanez in Applied Reconfigurable Computing. Architectures, Tools, and Applications

Conference paper 2024

Deep learning accelerators: a case study with MAESTRO

In recent years, deep learning has become one of the most important topics in computer sciences. Deep learning is a growing trend in the edge of...

Hamidreza Bolhasani, Somayyeh Jafarali Jassbi in Journal of Big Data

Article Open access 12 November 2020

FPGA-based accelerator for object detection: a comprehensive survey

Object detection is one of the most challenging tasks in computer vision. With the advances in semiconductor devices and chip technology, hardware...

Kai Zeng, Qian Ma, ... Chenggang Yan in The Journal of Supercomputing

Article 29 March 2022

Conclusions

This book aimed to introduce to the reader how heterogeneous hardware acceleration is changing the programming landscape, while posing a number of...

Juan Fumero, Athanasios Stratikopoulos, Christos Kotselidis in Programming Heterogeneous Hardware via Managed Runtime Systems

Chapter 2024

Implementing LU and Cholesky factorizations on artificial intelligence accelerators

LU and Cholesky factorizations for dense matrices are one of the most fundamental building blocks in a number of numerical applications. Because of...

Yuechen Lu, Yuchen Luo, ... Weifeng Liu in CCF Transactions on High Performance Computing

Article 24 August 2021

Hybrid SORN Hardware Accelerator for Support Vector Machines

This paper presents a new approach for support vector filtering to accelerate the training process of support vector machines (SVMs). It is based on...

Nils Hülsmeier, Moritz Bärthel, ... Steffen Paul in Next Generation Arithmetic

Conference paper 2023

Compute Accelerators and Other GPUs

As GPUs got more powerful and took on more tasks, they transcended the roles they were originally designed to do and were sometimes called...

Jon Peddie in The History of the GPU - New Developments

Chapter 2022

Hardware and Software Co-optimization of Convolutional and Self-attention Combined Model Based on FPGA

Since Transformer was proposed, the self-attention mechanism has been widely used. Some studies have tried to apply the self-attention mechanism to...

Wei Hu, Heyuan Li, ... Zhiyv Zhong in Web and Big Data

Conference paper 2024

Fast Offloading of Accelerator Task over Network with Hardware Assistance

Today, applications such as image recognition in vehicles and drones requiring high computational performance and low latency offload some tasks to...

Shogo Saito, Kei Fujimoto, ... Akinori Shiraga in Edge Computing – EDGE 2022

Conference paper 2022
