-
Chapter and Conference Paper
SWIRL++: Evaluating Performance Models to Guide Code Transformation in Convolutional Neural Networks
Convolutional Neural Networks (CNNs) are ubiquitous in applications ranging from self-driving cars to various branches of health care. CPUs with large core counts and wide SIMD support are used in HPC clusters...
-
Chapter and Conference Paper
Mozart: Efficient Composition of Library Functions for Heterogeneous Execution
The current processor trend is to couple a commodity processor with a GPU, a co-processor, or an accelerator. Unleashing the full computational power of such heterogeneous systems is a daunting task: programmers o...
-
Article
Accelerating Data Analytics on Integrated GPU Platforms via Runtime Specialization
Integrated GPU systems are a cost-effective and energy-efficient option for accelerating data-intensive applications. While these platforms have reduced the overhead of offloading computation to the GPU and potent...
-
Chapter and Conference Paper
Using Dynamic Compilation to Achieve Ninja Performance for CNN Training on Many-Core Processors
Convolutional Neural Networks (CNNs) represent a class of Deep Neural Networks that is growing in importance due to their state-of-the-art performance in pattern recognition tasks in various domains, including...
-
Chapter and Conference Paper
Compiler-Driven Data Layout Transformation for Heterogeneous Platforms
Modern heterogeneous systems comprise CPU cores, GPU cores, and, in some cases, accelerator cores. These computational cores have very different memory hierarchies, making it challenging to efficient...
-
Chapter and Conference Paper
Inter-iteration Scalar Replacement Using Array SSA Form
In this paper, we introduce novel, simple, and efficient analysis algorithms for scalar replacement and dead store elimination that are built on Array SSA form, a uniform representation for capturing control and...
-
Chapter and Conference Paper
Static Detection of Place Locality and Elimination of Runtime Checks
Harnessing parallelism, particularly for high-performance computing, is a demanding topic of research. Limitations and complexities of automatic parallelization have led to programming language notations wherein...
-
Chapter and Conference Paper
Extended Linear Scan: An Alternate Foundation for Global Register Allocation
In this paper, we extend past work on Linear Scan register allocation, and propose two Extended Linear Scan (ELS) algorithms that retain the compile-time efficiency of past Linear Scan algorithms while delivering...
-
Chapter and Conference Paper
Optimal Bitwise Register Allocation Using Integer Linear Programming
This paper addresses the problem of optimal global register allocation. The register allocation problem is expressed as an integer linear programming problem and solved optimally. The model is more flexible th...
-
Chapter and Conference Paper
Efficient Computation of May-Happen-in-Parallel Information for Concurrent Java Programs
Modeling of runtime threads in static analysis of concurrent programs plays an important role in both reducing the complexity and improving the precision of the analysis. Modeling based on type-based technique...
-
Chapter and Conference Paper
Enhanced Bitwidth-Aware Register Allocation
Embedded processors depend on register files for performance, just like general-purpose processors in desktop and server systems. However, unlike general-purpose processors, the power consumption of register f...
-
Chapter and Conference Paper
An Efficient Algorithm to Compute Delay Set in SPMD Programs
We present a compiler analysis for single-program multiple-data (SPMD) programs that communicate through a shared address space. The choice of memory consistency model is sequential consistency as defined by Lampo...