A Methodology to Characterize Critical Section Bottlenecks in DSM Multiprocessors

Sahelices, Benjamín; Ibáñez, Pablo; Viñals, Víctor; Llabería, J. M.

doi:10.1007/978-3-642-03869-3_17

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 5704)

Included in the following conference series:

European Conference on Parallel Processing

Abstract

Understanding and optimizing the synchronization operations of parallel programs in distributed shared memory (DSM) multiprocessors is one of the most important factors leading to significant reductions in execution time.

This paper introduces a new methodology for tuning the performance of parallel programs. We focus on the critical sections used to ensure exclusive access to critical resources and data structures, and propose a specific dynamic characterization of every critical section in order to a) measure lock contention, b) measure the degree of data sharing in consecutive executions, and c) break down the execution time to reflect the different overheads that can appear. All the required measurements are taken using a multiprocessor simulator with a detailed timing model of the processor and memory system.

We also propose a static classification of critical sections that takes into account how locks are associated with their protected data. The dynamic characterization and the static classification are correlated to identify key critical sections and infer code optimization opportunities (e.g. data layout), which, when applied, can lead to significant reductions in execution time (up to 33% in the SPLASH-2 scientific benchmark suite). Using the simulator, we can also evaluate whether the performance of the applied code optimizations is sensitive to common hardware optimizations.
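As a rough illustration of the dynamic characterization described in the abstract, the sketch below computes three per-lock figures from a small event trace: a contention ratio, a data-sharing ratio between consecutive executions, and a split of cycles into waiting and holding time. The trace record `CSExecution`, the function `characterize`, and the exact metric definitions are assumptions made for this example; the abstract does not specify the paper's own definitions or its simulator output format.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CSExecution:
    """One execution of a critical section (hypothetical trace record)."""
    lock_id: str
    cpu: int
    t_request: int      # cycle at which the acquire was issued
    t_acquire: int      # cycle at which the lock was obtained
    t_release: int      # cycle at which the lock was released
    lines: frozenset    # cache lines touched inside the section


def characterize(trace):
    """Return per-lock contention, sharing, and time-breakdown figures."""
    by_lock = defaultdict(list)
    for ex in trace:
        by_lock[ex.lock_id].append(ex)

    report = {}
    for lock_id, execs in by_lock.items():
        execs.sort(key=lambda e: e.t_acquire)
        contended = 0
        shared_fractions = []
        for prev, cur in zip(execs, execs[1:]):
            # Assumed definition of contention: the request arrived while
            # the previous holder still owned the lock.
            if cur.t_request < prev.t_release:
                contended += 1
            # Assumed definition of sharing: fraction of this execution's
            # lines also touched by the previous execution on another CPU.
            if cur.cpu != prev.cpu and cur.lines:
                shared_fractions.append(
                    len(cur.lines & prev.lines) / len(cur.lines)
                )
        report[lock_id] = {
            "executions": len(execs),
            "contention": contended / len(execs),
            "sharing": (sum(shared_fractions) / len(shared_fractions)
                        if shared_fractions else 0.0),
            "wait_cycles": sum(e.t_acquire - e.t_request for e in execs),
            "hold_cycles": sum(e.t_release - e.t_acquire for e in execs),
        }
    return report


# Made-up trace: three executions of one critical section on three CPUs.
trace = [
    CSExecution("L0", 0, 0, 10, 50, frozenset({0x100, 0x140})),
    CSExecution("L0", 1, 20, 60, 100, frozenset({0x100, 0x180})),
    CSExecution("L0", 2, 200, 210, 250, frozenset({0x1C0})),
]
report = characterize(trace)
print(report["L0"])
```

Under these assumed definitions, a lock with both a high contention ratio and a high sharing ratio would be a natural candidate for the kind of data layout optimization the abstract mentions, since the protected lines migrate between processors on nearly every execution.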

This work was supported in part by Diputación General de Aragón grant “gaZ: Grupo Consolidado de Investigación”, Spanish Ministry of Education and Science grants TIN2007-66423, TIN2007-60625, Consolider CSD2007-00050, and the European HiPEAC-2 NoE.

Author information

Authors and Affiliations

Depto. de Informática, Univ. de Valladolid, Spain
Benjamín Sahelices
Depto. de Informática e Ing. de Sistemas, I3A and HiPEAC, Univ. de Zaragoza, Spain
Pablo Ibáñez & Víctor Viñals
Depto. de Arquitectura de Computadores, Univ. Polit. de Cataluña, Spain
J. M. Llabería

Editor information

Editors and Affiliations

Department of Software Technology, Delft University of Technology, Mekelweg 4, 2628 CD Delft, The Netherlands
Henk Sips, Dick Epema & Hai-Xiang Lin

About this paper

Cite this paper

Sahelices, B., Ibáñez, P., Viñals, V., Llabería, J.M. (2009). A Methodology to Characterize Critical Section Bottlenecks in DSM Multiprocessors. In: Sips, H., Epema, D., Lin, HX. (eds) Euro-Par 2009 Parallel Processing. Euro-Par 2009. Lecture Notes in Computer Science, vol 5704. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-03869-3_17

DOI: https://doi.org/10.1007/978-3-642-03869-3_17
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-03868-6
Online ISBN: 978-3-642-03869-3
eBook Packages: Computer Science, Computer Science (R0)
