Search Results
-
Dual-IS: Instruction Set Modality for Efficient Instruction Level Parallelism
Exploiting instruction level parallelism (ILP) is a widely used method for increasing performance of processors. While traditional very long...
-
POAS: a framework for exploiting accelerator level parallelism in heterogeneous environments
In the era of heterogeneous computing, a new paradigm called accelerator level parallelism (ALP) has emerged. In ALP, accelerators are used...
-
Expressing Parallelism
Chapter 4 marks the transition from simple teaching examples toward real-world parallel code and expands upon details of the code samples we have...
-
Generalizing Hierarchical Parallelism
Since the days of OpenMP 1.0 computer hardware has become more complex, typically by specializing compute units for coarse- and fine-grained...
-
A neural network-based approach for the performance evaluation of branch prediction in instruction-level parallelism processors
Branch prediction is essential for improving the performance of pipeline processors. As the number of pipeline stages in modern processors increases,...
-
An efficient branch predictor for improved accuracy of instruction level parallelism
The need for modern processors is based on fast and precise branch predictors to improve the execution of instructions in the pipeline. In a parallel...
-
Evaluating RISC-V Vector Instruction Set Architecture Extension with Computer Vision Workloads
Computer vision (CV) algorithms have been extensively used for a myriad of applications nowadays. As the multimedia data are generally well-formatted...
-
Research on Instruction Pipeline Optimization Oriented to RISC-V Vector Instruction Set
Traditional general-purpose processors are scalar processors, and only one data result is obtained when an instruction is executed. But nowadays,...
-
The C++ Standard Library for Parallelism and Concurrency (HPX)
We describe the C++ standard library for concurrency and parallelism (HPX). In contrast to bulk sequential programs written in the past, HPX relies...
-
An Adaptive Instruction Set Encoding Automatic Generation Method for VLIW
The tight integration of hardware and software enables very long instruction word (VLIW) architectures to vastly outperform superscalar architectures...
-
A Multi-level Parallel Integer/Floating-Point Arithmetic Architecture for Deep Learning Instructions
The extensive instruction-set for deep learning (DL) significantly enhances the performance of general-purpose architectures by exploiting data-level...
-
Parallel fractal image compression using quadtree partition with task and dynamic parallelism
Fractal image compression is a lossy compression technique based on the iterative function system, which can be used to reduce the storage space and...
-
Multilevel parallelism optimization of stencil computations on SIMDlized NUMA architectures
Stencil computations within a single core or multicores of an SMP node have been over-investigated. However, the demands on HPC’s higher performance...
-
Using FPGA-based content-addressable memory for mnemonics instruction searching in assembler design
Memories play an essential role in computer systems as they store and retrieve data that may include instructions required for system operation. In...
-
High-Level Synthesis
High-level synthesis (HLS) is the process of compiling a software program into a digital circuit. This chapter provides a view into the HLS design...
-
High-Level Decision Diagrams
In this chapter, we generalize the decision-based concept of logic-level SSBDDs to apply it for modelling digital systems at higher abstraction...
-
Efficient High-Level Programming in Plain Java
This paper introduces the support for developing efficient parallel programs in plain Java in the Gaspar framework. The framework supports a complete...
-
Concurrency and Parallelism
A concurrent program handles more than one task at a time. A familiar example is a web server that handles multiple client requests at the same time....
-
Rapid Prototyping of Complex Micro-architectures Through High-Level Synthesis
Register-Transfer Level (RTL) design has been a traditional approach in hardware design for several decades. However, with the growing complexity of...
-
IDaTPA: importance degree based thread partitioning approach in thread level speculation
As an auto-parallelization technique with the level of thread on multi-core, Thread-Level Speculation (TLS) which is also called Speculative...