Widening the Memory Bottleneck by Automatically-Compiled Application-Specific Speculation Mechanisms

Embedded Systems Design with FPGAs

Abstract

The rate of improvement in the single-thread performance of conventional central processing units (CPUs) has decreased significantly over the last decade, mainly because higher clock frequencies have become difficult to obtain. As a consequence, the focus of development has shifted to multi-threaded execution models and multi-core CPU designs. Unfortunately, many important algorithms and applications still cannot easily be rewritten to take advantage of this computing paradigm. Thus, the performance gap between parallelizable algorithms and those that depend on single-thread performance has widened significantly. Application-specific hardware accelerators with optimized pipelines can provide improved single-thread performance, but they offer only limited flexibility and require a high development effort compared to programming software-programmable processors (SPPs).

Acknowledgements

This work was supported by the German national research foundation DFG and by Xilinx Inc.

Author information

Corresponding author

Correspondence to Benjamin Thielmann.

Copyright information

© 2013 Springer Science+Business Media, LLC

About this chapter

Cite this chapter

Thielmann, B., Huthmann, J., Wink, T., Koch, A. (2013). Widening the Memory Bottleneck by Automatically-Compiled Application-Specific Speculation Mechanisms. In: Athanas, P., Pnevmatikatos, D., Sklavos, N. (eds) Embedded Systems Design with FPGAs. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-1362-2_1

  • DOI: https://doi.org/10.1007/978-1-4614-1362-2_1

  • Publisher Name: Springer, New York, NY

  • Print ISBN: 978-1-4614-1361-5

  • Online ISBN: 978-1-4614-1362-2

  • eBook Packages: Engineering, Engineering (R0)
