Abstract
As semiconductor process technology scales, the performance of modern VLSI chips improves significantly. Aggressive technology scaling, however, poses serious challenges to lifetime reliability. Two of the paramount challenges are soft errors and aging-induced degradation. Although many studies have tackled these two challenges, most treat them separately and therefore fail to reach better performance-cost trade-offs. To achieve an optimum performance-cost trade-off, we propose a unified fault detection scheme, stability violation-based fault detection (SVFD). In addition, as chip performance has improved, on-chip path delay measurement techniques have attracted considerable attention from researchers in recent years, because they provide a cost-effective alternative for delay defect detection and silicon debug in modern VLSI chips. To reduce the hardware overhead and measurement time of on-chip path delay measurement, we propose a novel on-chip path delay measurement architecture, OCDM, for path delay testing and silicon debug. A further challenge comes from a variety of aging mechanisms that cause gradual performance degradation of circuits. Prior work shows that such progressive degradation can be reliably detected by dedicated aging sensors, which provides a good foundation for a new scheme to improve lifetime reliability. Building on our previous research, we further propose ReviveNet, a hardware-implemented aging-aware and self-adaptive architecture. Aging awareness is realized by deploying dedicated aging sensors, and self-adaptation is achieved by employing a group of synergistic agents. Each agent implements a localized timing adaptation mechanism to tolerate aging-induced delay on critical paths.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this chapter
Cite this chapter
Li, X., Yan, G., Liu, C. (2023). Fault-Tolerant Circuits. In: Built-in Fault-Tolerant Computing Paradigm for Resilient Large-Scale Chip Design. Springer, Singapore. https://doi.org/10.1007/978-981-19-8551-5_2
DOI: https://doi.org/10.1007/978-981-19-8551-5_2
Publisher Name: Springer, Singapore
Print ISBN: 978-981-19-8550-8
Online ISBN: 978-981-19-8551-5
eBook Packages: Computer Science (R0)