Abstract
The rate of improvement in the single-thread performance of conventional central processing units (CPUs) has decreased significantly over the last decade. This is mainly due to the difficulty of achieving higher clock frequencies. As a consequence, development has shifted instead toward multi-threaded execution models and multi-core CPU designs. Unfortunately, many important algorithms and applications still cannot easily be rewritten to take advantage of this new computing paradigm. The performance gap between parallelizable algorithms and those that depend on single-thread performance has therefore widened significantly. Application-specific hardware accelerators with optimized pipelines can provide improved single-thread performance. However, they offer only limited flexibility and require much higher development effort than programming software-programmable processors (SPPs).
Acknowledgements
This work was supported by the German national research foundation (DFG) and by Xilinx Inc.
Copyright information
© 2013 Springer Science+Business Media, LLC
About this chapter
Cite this chapter
Thielmann, B., Huthmann, J., Wink, T., Koch, A. (2013). Widening the Memory Bottleneck by Automatically-Compiled Application-Specific Speculation Mechanisms. In: Athanas, P., Pnevmatikatos, D., Sklavos, N. (eds) Embedded Systems Design with FPGAs. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-1362-2_1
DOI: https://doi.org/10.1007/978-1-4614-1362-2_1
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-1361-5
Online ISBN: 978-1-4614-1362-2
eBook Packages: Engineering, Engineering (R0)