Widening the Memory Bottleneck by Automatically-Compiled Application-Specific Speculation Mechanisms

Thielmann, Benjamin; Huthmann, Jens; Wink, Thorsten; Koch, Andreas

doi:10.1007/978-1-4614-1362-2_1

Abstract

The rate of improvement in the single-thread performance of conventional central processing units (CPUs) has decreased significantly over the last decade. This is mainly due to the difficulties in obtaining higher clock frequencies. As a consequence, the focus of development has shifted to multi-threaded execution models and multi-core CPU designs instead. Unfortunately, there are still many important algorithms and applications that cannot easily be rewritten to take advantage of this new computing paradigm. Thus, the performance gap between parallelizable algorithms and those depending on single-thread performance has widened significantly. Application-specific hardware accelerators with optimized pipelines are able to provide improved single-thread performance but have only limited flexibility and require high development effort compared to programming software-programmable processors (SPPs).

Acknowledgements

This work was supported by the German national research foundation DFG and by Xilinx Inc.

Author information

Authors and Affiliations

Embedded Systems and Applications Group, Technische Universität Darmstadt, FB20 (Informatik), FG ESA, Hochschulstr. 10, 64289, Darmstadt, Germany
Benjamin Thielmann, Jens Huthmann, Thorsten Wink & Andreas Koch

Corresponding author

Correspondence to Benjamin Thielmann.

Editor information

Editors and Affiliations

Bradley Dept. of Electrical & Computer Engineering, Virginia Tech, Whittemore Hall 302, Blacksburg, VA 24061, USA
Peter Athanas
Technical University of Crete, Crete, Greece
Dionisios Pnevmatikatos
Technological Educational Institute of Patras, Patras, Greece
Nicolas Sklavos

About this chapter

Cite this chapter

Thielmann, B., Huthmann, J., Wink, T., Koch, A. (2013). Widening the Memory Bottleneck by Automatically-Compiled Application-Specific Speculation Mechanisms. In: Athanas, P., Pnevmatikatos, D., Sklavos, N. (eds) Embedded Systems Design with FPGAs. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-1362-2_1

DOI: https://doi.org/10.1007/978-1-4614-1362-2_1
Published: 01 November 2012
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-1361-5
Online ISBN: 978-1-4614-1362-2
eBook Packages: Engineering, Engineering (R0)
