Search Results
-
First Impressions of the Sapphire Rapids Processor with HBM for Scientific Workloads
The landscape of high performance computing (HPC) has witnessed exponential growth in processor diversity, architectural complexity, and performance...
-
Processor power forecasting through model sample analysis and clustering
High-accuracy processor power modeling and forecasting are critical for power management and optimization. Though there are many works about...
-
Development an efficient AXI-interconnect unit between set of customized peripheral devices and an implemented dual-core RISC-V processor
RISC-V set architecture is playing an increasingly important role in processor technology due to its open instructions which allow researchers to...
-
Adapting combined tiling to stencil optimizations on sunway processor
Stencil is one of the indispensable computation patterns in scientific applications, which is a long-standing optimization target in the field of...
-
Towards optimized tensor code generation for deep learning on sunway many-core processor
The flourish of deep learning frameworks and hardware platforms has been demanding an efficient compiler that can shield the diversity in both...
-
Dataflow-based automatic parallelization of MATLAB/Simulink models for fitting modern multicore architectures
In many fields including aerospace, automotive, and telecommunications, MathWorks’ MATLAB/Simulink is contemporary standard for model-based design....
-
Low-power hardware-efficient memory-based DCT processor
This paper proposes a new discrete cosine transform (DCT) processor. The micro-rotation section of the architecture is based on a shared-resource...
-
RV16: An Ultra-Low-Cost Embedded RISC-V Processor Core
Embedded and Internet of Things (IoT) devices have extremely strict requirements on the area and power consumption of the processor because of the...
-
wrBench: Comparing Cache Architectures and Coherency Protocols on ARMv8 Many-Core Systems
Cache performance is a critical design constraint for modern many-core systems. Since the cache often works in a “black-box” manner, it is difficult...
-
Novel low-power pipelined DCT processor for real-time IoT applications
This research proposes a novel scalable Discrete Cosine transform (DCT) processor. It is based on a shared-resource enhanced Coordinate Rotation...
-
Antimalware applied to IoT malware detection based on softcore processor endowed with authorial sandbox
Presently, the Internet of Things (IoT) plays a crucial role in modern life, connecting hundreds of billions of devices to the internet. With the...
-
Design tactics for tailoring transformer architectures to cybersecurity challenges
In the rapidly evolving landscape of cyber threats, effective defense strategies are crucial for safeguarding sensitive information and critical...
-
A hybrid crossbar-ring on chip network topology for performance improvement of multicore architectures
Multicore architectures have achieved a popularity to deliver improved performance for different application domains. Performance of a system is...
-
SABER post-quantum key encapsulation mechanism (KEM): evaluating performance in ARM and x64 architectures
SABER is one of the four finalists in the third round of the ongoing NIST post-quantum cryptography standardization process. It is one of the three...
-
Time-sensitive autonomous architectures
Autonomous and software-defined vehicles (ASDVs) feature highly complex systems, coupling safety-critical and non-critical components such as...
-
Uncovering the performance bottleneck of modern HPC processor with static code analyzer: a case study on Kunpeng 920
The performance of high-performance computing (HPC) and other real-world applications is becoming unpredictable as the micro-architecture of the...
-
Event-Driven Architectures
In the previous chapters, you learned about serverless and distributed systems using AWS. Most of these systems had events in one form or the other....
-
Functional Verification for Agile Processor Development: A Case for Workflow Integration
Agile hardware development methodology has been widely adopted over the past decade. Despite the research progress, the industry still doubts its...
-
Optimizing GNN Inference Processing on Very Long Vector Processor
Graph Neural Network (GNN) has shown great success in graph learning. However, within the complexity of the real-world tasks and the big graph...
-
Fast slope algorithm with the use of vectorization and parallelization for multicore architectures
The slope calculation algorithm is one of the most widely used geospatial algorithms employing the 3x3 moving window technique (along with...