Search Results
-
SNN vs. CNN Implementations on FPGAs: An Empirical Evaluation
Convolutional Neural Networks (CNNs) are widely employed to solve various problems, e.g., image classification. Due to their compute- and...
-
A Survey of Algorithmic and Hardware Optimization Techniques for Vision Convolutional Neural Networks on FPGAs
In today’s world, the applications of convolutional neural networks (CNN) are limitless and are employed in numerous fields. The CNNs get wider and...
-
FASS-pruner: customizing a fine-grained CNN accelerator-aware pruning framework via intra-filter splitting and inter-filter shuffling
Nowadays, with the increasing depth of CNNs, the number of computation and storage requirements with weights expands significantly, preventing their...
-
SNCL: a supernode OpenCL implementation for hybrid computing arrays
Heterogeneous computing has been developing continuously in the field of high-performance computing because of its high performance and energy...
-
DyPipe: A Holistic Approach to Accelerating Dynamic Neural Networks with Dynamic Pipelining
Dynamic neural network (NN) techniques are increasingly important because they facilitate deep learning techniques with more complex network...
-
Evaluation of HPC Workloads Running on Open-Source RISC-V Hardware
The emerging RISC-V ecosystem has the potential to improve the speed, fidelity, and quality of hardware/software co-design R&D activities. However,...
-
A Flexible Mixed-Mesh FPGA Cluster Architecture for High Speed Computing
This paper focuses on integrating multiple FPGAs for High-Performance Computing (HPC) applications with a priority on computational capability and...
-
STANN – Synthesis Templates for Artificial Neural Network Inference and Training
While Deep Learning accelerators have been a research area of high interest, the focus was usually on monolithic accelerators for the inference of...
-
QPU integration in OpenCL for heterogeneous programming
The integration of quantum processing units (QPUs) in a heterogeneous high-performance computing environment requires solutions that facilitate...
-
FPGA-Based Hardware/Software Codesign for Video Encoder on IoT Edge Platforms
Recently, image/video-based applications have been widely used for many domains, such as traffic, medical, or robotics. In this context, IoT-based...
-
Survey on storage-accelerator data movement
The processor and the main memory in the traditional computing system cannot satisfy the requirements of the emerging large-scale applications in...
-
An Optimization Technique for PMF Estimation in Approximate Circuits
As an emerging computing technology, approximate computing enables computing systems to utilize hardware resources efficiently. Recently, approximate...
-
DOE: database offloading engine for accelerating SQL processing
The CPU-Accelerator heterogeneous systems have demonstrated performance and efficiency benefits on DBMSs. However, the CPU-Cache-DRAM architecture...
-
Compiler-Assisted Operator Template Library for DNN Accelerators
Despite many dedicated accelerators are gaining popularity for their performance and energy efficiency in the deep neural network (DNN) domain,...
-
HFPQ: deep neural network compression by hardware-friendly pruning-quantization
This paper presents a hardware-friendly compression method for deep neural networks. This method effectively combines layered channel pruning with...
-
SWG: an architecture for sparse weight gradient computation
On-device training for deep neural networks (DNN) has become a trend due to various user preferences and scenarios. The DNN training process consists...
-
In-Depth Analysis of OLAP Query Performance on Heterogeneous Hardware
Classical database systems are now facing the challenge of processing high-volume data feeds at unprecedented rates as efficiently as possible while...
-
Accelerating OCaml Programs on FPGA
This paper aims to exploit the massive parallelism of Field-Programmable Gate Arrays (FPGAs) by programming them in OCaml, a multiparadigm and...
-
Hetero-Vis: A Framework for Latency Optimized Heterogeneous Deployment of Convolutional Neural Networks
Convolutional Neural Network (CNN) models often comprise multiple layers varying in compute requirements. For deployment, a number of hardware...
-
Distributed Calculations with Algorithmic Skeletons for Heterogeneous Computing Environments
Contemporary HPC hardware typically provides several levels of parallelism, e.g. multiple nodes, each having multiple cores (possibly with...