  1. Chapter and Conference Paper

    SWIRL++: Evaluating Performance Models to Guide Code Transformation in Convolutional Neural Networks

    Convolutional Neural Networks (CNNs) are ubiquitous in applications ranging from self-driving cars to various branches of health care. CPUs with large core counts and wide SIMD support are used in HPC clusters...

    Tharindu R. Patabandi, Anand Venkat in Languages and Compilers for Parallel Compu… (2021)

  2. Chapter and Conference Paper

    Mozart: Efficient Composition of Library Functions for Heterogeneous Execution

    Current processor trend is to couple a commodity processor with a GPU, a co-processor, or an accelerator. To unleash the full computational power of such heterogeneous systems is a daunting task: programmers o...

    Rajkishore Barik, Tatiana Shpeisman in Languages and Compilers for Parallel Compu… (2019)

  3. Article

    Accelerating Data Analytics on Integrated GPU Platforms via Runtime Specialization

    Integrated GPU systems are a cost-effective and energy-efficient option for accelerating data-intensive applications. While these platforms have reduced overhead of offloading computation to the GPU and potent...

    Naila Farooqui, Indrajit Roy, Yuan Chen in International Journal of Parallel Programm… (2018)

  4. Chapter and Conference Paper

    Using Dynamic Compilation to Achieve Ninja Performance for CNN Training on Many-Core Processors

    Convolutional Neural Networks (CNNs) represent a class of Deep Neural Networks that is growing in importance due to their state-of-the-art performance in pattern recognition tasks in various domains, including...

    Ankush Mandal, Rajkishore Barik, Vivek Sarkar in Euro-Par 2018: Parallel Processing (2018)

  5. Chapter and Conference Paper

    Compiler-Driven Data Layout Transformation for Heterogeneous Platforms

    Modern heterogeneous systems comprise of CPU cores, GPU cores, and in some cases, accelerator cores. Each of these computational cores have very different memory hierarchies, making it challenging to efficient...

    Deepak Majeti, Rajkishore Barik in Euro-Par 2013: Parallel Processing Worksho… (2014)

  6. Chapter and Conference Paper

    Inter-iteration Scalar Replacement Using Array SSA Form

    In this paper, we introduce novel simple and efficient analysis algorithms for scalar replacement and dead store elimination that are built on Array SSA form, a uniform representation for capturing control and...

    Rishi Surendran, Rajkishore Barik, Jisheng Zhao, Vivek Sarkar in Compiler Construction (2014)

  7. Chapter and Conference Paper

    Static Detection of Place Locality and Elimination of Runtime Checks

    Harnessing parallelism particularly for high performance computing is a demanding topic of research. Limitations and complexities of automatic parallelization have led to programming language notations wherein...

    Shivali Agarwal, Rajkishore Barik in Programming Languages and Systems (2008)

  8. Chapter and Conference Paper

    Extended Linear Scan: An Alternate Foundation for Global Register Allocation

    In this paper, we extend past work on Linear Scan register allocation, and propose two Extended Linear Scan (ELS) algorithms that retain the compile-time efficiency of past Linear Scan algorithms while delivering...

    Vivek Sarkar, Rajkishore Barik in Compiler Construction (2007)

  9. Chapter and Conference Paper

    Optimal Bitwise Register Allocation Using Integer Linear Programming

    This paper addresses the problem of optimal global register allocation. The register allocation problem is expressed as an integer linear programming problem and solved optimally. The model is more flexible th...

    Rajkishore Barik, Christian Grothoff in Languages and Compilers for Parallel Compu… (2007)

  10. Chapter and Conference Paper

    Efficient Computation of May-Happen-in-Parallel Information for Concurrent Java Programs

    Modeling of runtime threads in static analysis of concurrent programs plays an important role in both reducing the complexity and improving the precision of the analysis. Modeling based on type based technique...

    Rajkishore Barik in Languages and Compilers for Parallel Computing (2006)

  11. Chapter and Conference Paper

    Enhanced Bitwidth-Aware Register Allocation

    Embedded processors depend on register files for performance, just like general-purpose processors in desktop and server systems. However, unlike general-purpose processors, the power consumption of register f...

    Rajkishore Barik, Vivek Sarkar in Compiler Construction (2006)

  12. Chapter and Conference Paper

    An Efficient Algorithm to Compute Delay Set in SPMD Programs

    We present compiler analysis for single program multiple data (SPMD) programs that communicate through shared address space. The choice of memory consistency model is sequential consistency as defined by Lampo...

    Manish P. Kurhekar, Rajkishore Barik, Umesh Kumar in High Performance Computing - HiPC 2003 (2003)