-
Chapter and Conference Paper
New Opportunities for Compilers in Computer Security
Compiler techniques have been deployed to prevent various security attacks. Examples include mitigation of memory access corruption, control flow integrity checks, race detection, and software diversity.
-
Chapter and Conference Paper
Towards an Achievable Performance for the Loop Nests
Numerous code optimization techniques, including loop nest optimizations, have been developed over the last four decades. Loop optimization techniques transform loop nests to improve the performance of the cod...
-
Chapter and Conference Paper
Using Hardware Counters to Predict Vectorization
Vectorization is the process of transforming the scalar implementation of an algorithm into vector form. This transformation aims to benefit from parallelism through the generation of microprocessor vector ins...
-
Chapter and Conference Paper
Polygonal Iteration Space Partitioning
This work presents a new set of loop transformations to expose and maximize data locality in loop-nests with non-uniform reuse patterns. The proposed set of transformations uses the norms of the Polyhedral Mode...
-
Book
-
Chapter
Introduction
This introductory chapter discusses the role of instruction level parallelism (ILP) in optimizing compilers and in machine architectures that automatically reorder or parallelize programs. A brief overview of ...
-
Chapter
Percolation Scheduling
Trace scheduling suffers from a number of problems related to its focus on a single trace at a time. Percolation scheduling overcomes these problems, to the extent possible at compile time, by providing a smal...
-
Chapter
Epilogue
This book focuses on compiler-managed instruction level parallelism. While a great deal of the work on this topic has been only touched upon or mentioned only in references, the book does cover all of the majo...
-
Chapter
Scheduling Basic Blocks
A basic block in a program is a sequence of consecutive operations, such that control flow enters at the beginning and leaves at the end without internal branches. While basic block scheduling is the simplest ...
-
Chapter
Software Pipelining by Kernel Recognition
Kernel recognition techniques avoid the search for an appropriate initiation interval by dealing directly with a representation of the unrolled loop and its compaction. Intuitively, kernel recognition tries to...
-
Chapter
Overview of ILP Architectures
In this chapter we trace the history of computer architecture, focusing on the evolution of techniques for instruction-level parallelism. After briefly summarizing the early years of machine design, we focus o...
-
Chapter
Trace Scheduling
Since its introduction by Joseph A. Fisher in 1979, trace scheduling has influenced much of the work on compile-time ILP. Initially developed for use in microcode compaction, trace scheduling quickly became th...
-
Chapter
Modulo Scheduling
Loop parallelization, and particularly the parallelization of innermost loops, is the most critical aspect of any parallelizing compiler. Trace scheduling can be applied to loops, but has the disadvantage that...
-
Chapter and Conference Paper
A Compilation and Run-Time Framework for Maximizing Performance of Self-scheduling Algorithms
Ordinary programs contain many parallel loops which account for a significant portion of these programs’ completion time. The parallel executions of such loops can significantly speed up performance of modern m...
-
Chapter and Conference Paper
Just in Time Load Balancing
Leveraging Loop Level Parallelism (LLP) is one of the most attractive techniques for improving program performance on emerging multi-cores. Ordinary programs contain a large amount of parallel and DOALL loops,...
-
Chapter and Conference Paper
Optimizing Program Performance via Similarity, Using a Feature-Agnostic Approach
This work proposes a new technique for performance evaluation to predict performance of parallel programs across diverse and complex systems. In this work the term system is comprehensive of the hardware organ...
-
Chapter and Conference Paper
On the Determination of Inlining Vectors for Program Optimization
In this paper we propose a new technique and a framework to select inlining heuristic constraints, referred to as an inlining vector, for program optimization. The proposed technique uses machine learning to mod...
-
Chapter and Conference Paper
How Many Threads to Spawn during Program Multithreading?
Thread-level program parallelization is key for exploiting the hardware parallelism of the emerging multi-core systems. Several techniques have been proposed for program multithreading. However, the existing t...
-
Chapter and Conference Paper
Performance Characterization of Itanium® 2-Based Montecito Processor
This paper presents the performance characteristics of the Intel® Itanium® 2-based Montecito processor and compares its performance to the previous generation Madison processor. Measurements on both are done using ...
-
Chapter and Conference Paper
Using Recursion to Boost ATLAS’s Performance
We investigate the performance benefits of a novel recursive formulation of Strassen’s algorithm over highly tuned matrix-multiply (MM) routines, such as the widely used ATLAS for high-performance systems.