# A Comparative Analysis of High-Speed Digital Test Techniques

Manoj Sachdev and Mansour Shashaani

Department of Electrical and Computer Engineering, University of Waterloo, Waterloo, Ont., N2L 3G1

msachdev@ece.uwaterloo.ca, mshashaa@vlsi.uwaterloo.ca

## Abstract

Testing high-performance integrated circuits is becoming an increasingly challenging task owing to higher clock frequencies and the limited availability and poor economics of high-speed VLSI testers. In this article, we outline a design-for-testability (DFT) strategy that allows high-performance devices to be tested on relatively low-performance testers. Various implementation aspects of this technique are also addressed.

## 1 Introduction

The clock speeds of advanced CMOS VLSI devices have surpassed the 1 GHz barrier, and the 1997 Semiconductor Industry Association (SIA) roadmap for semiconductors expects even more aggressive increases in clock frequency for future CMOS VLSI generations. Although high-speed processors are enabling applications in many diverse fields, their testing and reliability have been identified as among the most important challenges for VLSI testing [1]. This is primarily due to the inability of VLSI testers to keep pace with VLSI clock frequencies.

Historically, testers had a 5X timing-accuracy margin over state-of-the-art ICs, so performance testing was a non-issue. Since then, IC clock frequencies have improved by 30% per year on average, while tester accuracy has improved by only 12% per year on average. If this trend continues, within a few years tester timing inaccuracies will approach the cycle time of state-of-the-art ICs. Long before that happens, yield loss due to insufficient tester accuracy will become unacceptably high [1]. Furthermore, a high-frequency tester already costs \$4-5 million, and its cost is expected to increase further; in less than a decade, a state-of-the-art tester may cost more than \$20 million.
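The growth rates quoted above can be turned into a rough estimate of how long a 5X margin lasts. The sketch below is a back-of-the-envelope model, not an analysis from this article: it assumes both rates compound annually and that the margin is exactly 5X at year zero (the text does not say in which year that held). The function names are illustrative.

```python
import math


def margin_after(years: float, initial_margin: float, ic_growth: float, tester_growth: float) -> float:
    """Tester timing-accuracy margin over IC speed after `years` years.

    Assumes both rates compound annually, so the margin shrinks by the factor
    (1 + tester_growth) / (1 + ic_growth) every year.
    """
    return initial_margin * ((1 + tester_growth) / (1 + ic_growth)) ** years


def years_until_parity(initial_margin: float, ic_growth: float, tester_growth: float) -> float:
    """Years until the margin falls to 1, i.e. tester accuracy merely equals IC speed.

    Solves initial_margin * ((1 + tester) / (1 + ic)) ** n == 1 for n.
    """
    return math.log(initial_margin) / math.log((1 + ic_growth) / (1 + tester_growth))


n = years_until_parity(initial_margin=5.0, ic_growth=0.30, tester_growth=0.12)
print(f"{n:.1f}")  # 10.8
```

Under these assumptions the margin is gone after roughly 11 years; as noted above, yield loss becomes a problem well before the margin reaches parity.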
Furthermore, test cost is a nonlinear function of the device-under-test (DUT) frequency and increases significantly with it. The SIA roadmap predicts that the cost of testing a die will surpass its manufacturing cost in the near future. A recently concluded Sematech study reported a significantly large number of timing-only failures. These failures did not affect the logic functionality of the circuit and hence were not detected by slow-speed stuck-at (SA) based or functional tests. The authors of that study identified such failures as a significant concern for future technologies [2].

## 2 Review and Motivation

Manufacturing defects are classified into catastrophic and non-catastrophic categories. Catastrophic defects alter the IC topology so significantly that their influence is noticeable even at low clock frequencies. The impact of non-catastrophic, or parametric, defects is subtle, and means such as $I_{DDQ}$ testing, burn-in, and performance testing are often employed to uncover them. There is general consensus among experts that a large number of such defects lead to reliability failures in the field. Although $I_{DDQ}$ testing and burn-in are very effective, their limitations are becoming prominent as we move into the deep sub-micron regime. We therefore expect high-performance testing to play a significant role not only in ensuring device specifications but also in ensuring device reliability.

There are several approaches to high-performance testing. Agrawal and Chakraborty [3] divided them into indirect and direct methods. Indirect methods use correlation techniques to avoid the need for high-performance testing. Ring oscillators are often used for this purpose: a ring oscillator is placed on the DUT, and its free-running frequency provides a correlation to the high-performance behavior of the DUT. Bruls used an 11-stage on-chip ring oscillator as a performance indicator. The output of the ring oscillator was fed to a 10-stage counter to reduce the oscillation frequency from 200 MHz to about 200 kHz.
The free-running frequency of the counter provided the correlation for DUT performance [4]. Keshavarzi et al. [5] reported a strong correlation between $I_{DDQ}$ and the maximum operating frequency of a 32-bit microprocessor. They argued that the two parameters are fundamentally related, as both are functions of channel length. This information can be used as an alternative to high-performance testing. However, all correlation methods are probabilistic in nature, and in many applications their use may not be acceptable. Therefore, direct methods are generally preferred.

Direct methods include multiplexing tester clock pins to extend the clock frequency range of a tester: two or more tester pins supply high-frequency clock signals that are ORed to produce a higher-frequency clock. Although this is an attractive idea, practical issues prevent more than doubling the original clock frequency.

Figure 1: Pulse triggered flip-flop and its clock waveforms in normal and test modes [3].

Agrawal and Chakraborty proposed adding a quantifiable, externally controlled delay to circuits to enable high-performance testing on relatively slow-speed testers [3]. The basic idea of their scheme is illustrated in Fig. 1 and Fig. 2. They proposed a pulse triggered flip-flop with two operational modes. In simple terms, a dynamic latch (the highlighted pass transistor in Fig. 1) was introduced within the traditional master-slave flip-flop. The resulting three-latch arrangement allowed the flip-flop delay to be modulated by the clock pulse width. In a digital circuit, the critical path (the maximum delay between two flip-flops) must satisfy the temporal relation shown in Eq. 1.

$$T \ge PD_{FF} + PD_{CL} + T_{Setup} \tag{1}$$

where $T$ is the clock period, $PD_{FF}$ is the propagation delay through the flip-flop, $PD_{CL}$ is the delay through the combinational logic, and $T_{Setup}$ is the setup time of the flip-flops.
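Eq. 1 can be evaluated directly, and doing so shows why the pulse-width-controlled delay of [3] preserves test quality: the extra flip-flop delay in test mode is absorbed by the longer tester period, so the slack of a path, and hence whether it passes, is unchanged. The sketch below uses hypothetical delay values (in ns) chosen only for illustration; the function names are also illustrative.

```python
def slack(t_period: float, pd_ff: float, pd_cl: float, t_setup: float) -> float:
    """Timing slack of a flip-flop-to-flip-flop path under Eq. 1.

    The path passes (Eq. 1 holds) when the slack is >= 0.
    """
    return t_period - (pd_ff + pd_cl + t_setup)


def min_period(pd_ff: float, pd_cl: float, t_setup: float) -> float:
    """Smallest clock period satisfying Eq. 1 (slack exactly zero)."""
    return pd_ff + pd_cl + t_setup


# Hypothetical delays in ns (illustrative only, not from the paper).
T_SETUP = 0.25
PD_FF_NORMAL = 0.25  # narrow clock pulse in normal mode
PD_FF_TEST = 5.25    # assumed: a wider test-mode pulse adds 5 ns of flip-flop delay

t_normal = min_period(PD_FF_NORMAL, 3.0, T_SETUP)  # 3.5 ns, about 286 MHz
t_test = min_period(PD_FF_TEST, 3.0, T_SETUP)      # 8.5 ns, about 118 MHz

# A defective path whose logic delay is 0.25 ns too slow fails by the same
# margin in both modes, so the slower test clock still exposes it.
slow_pd_cl = 3.25
print(slack(t_normal, PD_FF_NORMAL, slow_pd_cl, T_SETUP))  # -0.25
print(slack(t_test, PD_FF_TEST, slow_pd_cl, T_SETUP))      # -0.25
```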
In the normal mode, a narrow clock pulse gives a small propagation delay through the flip-flop. In the test mode, however, a wider clock pulse increases the propagation delay of the flip-flop. Assuming $PD_{CL}$ and $T_{Setup}$ remain unchanged, the clock period $T$ must become larger for Eq. 1 to hold. In other words, a slower clock can test critical or other paths against the same timing specifications.

Although the concept of adding delay in the test mode is elegant, it has some important consequences for the normal operation of the IC. Some of them are listed below:

1. The flip-flop is converted into what is known as a pulse triggered flip-flop, whose delay is controlled by the clock pulse width. Realizing and propagating a small, precise pulse width across a complex VLSI circuit for normal-mode operation is a difficult task.
2. The pass transistor inside the pulse triggered flip-flop is used as a dynamic latch, which makes flip-flop operation sensitive because the latched value is held only as charge on an undriven node. As the pulse width increases, the output of the dynamic latch remains in a high-impedance state for an increasingly long time. Alternatively, a completely static latch can be used; however, its cost in terms of hardware and delay is prohibitive.
3. The dynamic latch adds delay to the flip-flop propagation delay in the functional mode.
4. A tester without high-frequency capability is also limited in its ability to generate small pulse widths.

Figure 2: CMOS implementation of pulse triggered flip-flop.

In this article, we revisit the issue of high-speed testing by incorporating controllable delays in flip-flops. We consider several flip-flop configurations and evaluate their normal-mode and test-mode behaviors.

## 3 Flip-flop as a Controlled Delay Element

As mentioned before, the most significant implementation issue for the pulse triggered flip-flop is the realization and propagation of a precise pulse width at the chip level.
A narrow pulse needed for high-speed normal operation may be significantly distorted by interconnect impedance. We propose to alleviate this problem by adding an input to the flip-flop. We call these flip-flops controlled delay flip-flops (CDFFs). The CDFF differs significantly, both in concept and in implementation detail, from the pulse triggered flip-flop. These differences are crucial and will become apparent later in the article.

Figure 3: First controlled delay flip-flop to facilitate high-speed testing; concept (a), and its gate-level implementation (b).

Fig. 3 illustrates the block diagram and gate-level implementation of a CDFF. The CDFF has an additional input, the Test clock. The slave latch receives a clock that is the logical AND of the normal clock and the Test clock. This arrangement makes master-to-slave data transfer depend on the rising edge of the Test clock. In the test mode, the propagation delay through the flip-flop is therefore controlled by the Test clock.

The significance of the Test clock is further illustrated in Fig. 4, which depicts the normal-mode and test-mode timing diagrams of an arbitrary digital circuit with CDFFs. In the normal mode, the Test clock has no function and is held high, ensuring normal flip-flop operation (Fig. 4(a)). During testing of an IC, however, it operates as a clock with a tester-controlled offset with respect to the normal clock. The Test clock goes to all, or a predetermined subset, of the flip-flops in an IC. When active, this clock controls the data transfer from the master to the slave latch in the flip-flops. In other words, depending on the timing relationship between the clock and the Test clock, a delay is introduced between the master and slave latches of the flip-flop. The net effect is that the flip-flop output, Q, appears after an additional delay equal to the offset between the clock and the Test clock.
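The master/slave gating just described can be checked with a small behavioral model. The sketch below is a zero-delay logic model under stated assumptions, not a circuit simulation: time is counted in integer steps, the clock period and edge position are hypothetical, and all function names are illustrative. It shows that holding the Test clock high gives ordinary flip-flop behavior, while delaying its rising edge by an offset delays Q by the same amount. The helper `offset_for_tester` restates Eq. 2, given just below, assuming $PD_{FF}$, $PD_{CL}$ and $T_{Setup}$ are the same in both modes: choosing $T_{Offset} = T_{TM} - T$ reduces Eq. 2 to Eq. 1 with period $T$.

```python
from typing import Callable, List

Wave = Callable[[int], int]  # maps an integer time step to a logic level (0 or 1)

PERIOD = 100  # hypothetical clock period, in arbitrary time steps
RISE = 50     # clock is low for steps 0-49 and high for steps 50-99 of each period


def clk(t: int) -> int:
    return 1 if t % PERIOD >= RISE else 0


def simulate_cdff(d: Wave, clock: Wave, test_clock: Wave, t_end: int) -> List[int]:
    """Zero-delay behavioral model of the first CDFF (Fig. 3).

    Master latch: transparent while the clock is low, holding while it is high.
    Slave latch: transparent only while (clock AND Test clock) is high.
    Returns Q at every time step.
    """
    master = slave = 0
    q: List[int] = []
    for t in range(t_end):
        c = clock(t)
        if not c:
            master = d(t)
        if c and test_clock(t):
            slave = master
        q.append(slave)
    return q


def clk_to_q(offset: int, test_mode: bool) -> int:
    """Steps from the first rising clock edge until Q goes high, with D held at 1.

    In test mode the Test clock rises `offset` steps after the clock; the offset
    must be smaller than the clock-high time (PERIOD - RISE) for Q to update.
    """
    def test_clock(t: int) -> int:
        if not test_mode:
            return 1  # normal mode: Test clock held high
        return 1 if t % PERIOD >= RISE + offset else 0

    q = simulate_cdff(lambda t: 1, clk, test_clock, 2 * PERIOD)
    return q.index(1) - RISE


def offset_for_tester(t_tester: float, t_target: float) -> float:
    """Offset that turns Eq. 2 at tester period t_tester into Eq. 1 at t_target."""
    if t_tester < t_target:
        raise ValueError("tester period must not be shorter than the target period")
    return t_tester - t_target


print(clk_to_q(0, test_mode=False))  # 0
print(clk_to_q(20, test_mode=True))  # 20
```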
Fig. 4(b) illustrates the scenario in which the Test clock is active.

Figure 4: Timing diagram of normal operation (a), and the test mode operation (b).

In this condition, Eq. 1 is modified as follows:

$$T_{TM} \ge PD_{FF} + PD_{CL} + T_{Setup} + T_{Offset} \tag{2}$$

where $T_{Offset}$ is the offset between the clock and the Test clock. The test-mode clock period, $T_{TM}$, must be large enough to accommodate all the delay terms in Eq. 2. It is clear from this equation that as the offset is increased, the clock period must also increase; that is, the clock frequency is reduced. In other words, the clock frequency can be reduced while the combinational circuit delays are tested with the same delay margins. Although this implementation of the CDFF requires additional transistors and an additional input, it is more practical to realize than the pulse triggered flip-flop and has better timing performance.

Figure 5: Second CDFF implementation.

### 3.1 Second CDFF Implementation

Fig. 5 depicts the second CDFF implementation. In this CDFF, two transmission gates (TGs) are added. The first TG is added in the master-to-slave path, while the second is added in the feedback path of the slave. Both TGs are controlled by the Test clock. The purpose of the first TG is obvious: it controls the master-to-slave data transfer. The need for the second TG can be explained as follows. When the clock is high and the Test clock is low, the output of the CDFF would otherwise not be driven: the high clock places the TG in the slave feedback path in its high-impedance state, and at the same time the low Test clock places the first added TG in its high-impedance state. Adding another TG, controlled by the Test clock, in the feedback path ensures that Q is always driven.

### 3.2 Performance Evaluation

The timing performance of a flip-flop is characterized by the data setup time ($t_{su}$), the hold time ($t_h$), and the propagation delay ($t_{pd}$).
The setup time is the minimum time before the active clock edge for which the data must be stable. Similarly, the hold time is the minimum time after the active clock edge for which the data must remain stable. The propagation delay ($t_{pd}$) of a flip-flop is the elapsed time between the clock and the output Q; it is measured from the instant the active clock edge reaches VDD/2 to the instant Q reaches VDD/2.

To determine the setup time, a given flip-flop is first simulated with a relaxed setup time. The data transition is then moved successively closer to the active clock edge while the flip-flop output is observed. The setup time is taken as the time difference between the data and clock edges at the point where the output fails to register the change in input data. This time difference is measured between the midpoints (VDD/2) of the two signals. The hold time is determined similarly. The flip-flop is first simulated with a relaxed hold time; then the data transition after the active clock edge is moved successively closer to that edge while the output is observed. The hold time is taken as the time difference between the clock edge and the data edge at the point where the flip-flop fails, that is, where the late data change corrupts the captured value.

To compare the performance of the CDFFs, the pulse triggered flip-flops, and a conventional flip-flop, each was characterized for setup time, hold time, and propagation delay. For this comparative study, all flip-flops were given the same transistor dimensions, and only transistors of similar dimensions were added in the CDFFs and pulse triggered flip-flops. Transistor sizes were selected to be representative of the technology and the design style. For this analysis, we selected a standard 0.5 $\mu$m single-poly, double-metal technology.
However, no extra effort was made to optimize the flip-flops for power, performance, area, etc. All flip-flop configurations were circuit simulated in the Cadence environment. We stress that, for this comparative study, the choice of absolute transistor parameters or flip-flop optimization for performance is of minor consequence. What matters is the relative performance of the flip-flop configurations, which quantifies the impact of the proposed configurations on timing performance.

In this study, five flip-flops were simulated: the pulse triggered flip-flop (Fig. 1), the CMOS pulse triggered flip-flop (Fig. 2), CDFF1 (Fig. 3), CDFF2 (Fig. 5), and a conventional flip-flop from a 0.5 $\mu$m commercial CMOS library. Fig. 6 compares the no-load propagation delays as a function of clock and data rise time. The propagation delays of the conventional flip-flop, CDFF1, and CDFF2 are small and comparable to each other. Furthermore, their propagation delays are not a strong function of the data and clock rise times. Pulse triggered flip-flops, on the other hand, exhibit relatively large propagation delays that increase linearly with the rise and fall times of the clock and data. This matters because high-speed on-chip clocks do not have very sharp rise and fall times, owing to high fanout and the large, distributed impedance of the clock interconnect. Furthermore, interconnect impedance is expected to increase significantly with the scaling of CMOS technology [6]. Pulse triggered flip-flops show greater variation in propagation delay because, as the clock rise and fall times increase, the clock pulse width also increases, and the wider pulse results in a larger propagation delay.
Furthermore, unlike a normal flip-flop or a CDFF, pulse triggered flip-flops use both the rising and falling edges of the clock; therefore, they show greater variation in propagation delay as the rise and fall times vary. Consequently, pulse triggered flip-flops have limitations for high-performance IC applications.

Figure 6: Comparison of propagation delays of pulse triggered, CDFF and conventional flip-flops.

Fig. 7 compares flip-flop propagation delay as a function of output load. As is apparent from the graph, the propagation delays of pulse triggered flip-flops are quite large compared to those of the conventional flip-flop or the CDFFs. As mentioned before, the delay through a pulse triggered flip-flop is a function of the clock pulse width. A minimum pulse width of approximately 200 ps is required for pulse triggered flip-flops to function; they do not respond to a narrower pulse. The pulse triggered flip-flops were therefore simulated with a 200 ps pulse width to obtain the lowest propagation delay. Except where they were swept (Fig. 6, Fig. 8, and Fig. 9), the clock and data rise and fall times were kept at 500 ps.

Figure 7: Comparison of flip-flop propagation delays as a function of output load.

Fig. 8 and Fig. 9 show the setup and hold times, respectively, as a function of clock and data rise and fall times. The CDFFs and the CMOS pulse triggered flip-flop have relatively small setup times, while the library flip-flop exhibits the largest setup time of all the flip-flops. The hold times of the flip-flops track their respective setup times; as a consequence, the hold times of the CDFFs and the CMOS pulse triggered flip-flop are less negative. Although the graphs were plotted for data 0, the flip-flops do not show any significant data dependency. Simulations were also carried out with data 1, but owing to space limitations they are not included here.
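The characterization procedure described in this section (VDD/2 measurement points, and a data edge moved successively closer to the clock edge) can be sketched as follows. This is an illustrative outline, not the authors' simulation setup: `passes` is a hypothetical stand-in for one circuit-simulation run, no simulator API is implied, and margins are kept in integer picoseconds so the sweep is exact. For setup, `passes(m)` reports whether Q registers data that changes m ps before the clock edge; for hold, whether Q stays uncorrupted when data changes m ps after the edge (a negative m means before the edge).

```python
from typing import Callable, Sequence


def crossing_time(times: Sequence[float], volts: Sequence[float], vdd: float) -> float:
    """First time a sampled waveform crosses VDD/2, using linear interpolation."""
    half = vdd / 2
    for i in range(1, len(times)):
        v0, v1 = volts[i - 1], volts[i]
        if v0 != v1 and (v0 - half) * (v1 - half) <= 0:
            return times[i - 1] + (half - v0) * (times[i] - times[i - 1]) / (v1 - v0)
    raise ValueError("waveform never crosses VDD/2")


def propagation_delay(times: Sequence[float], clk: Sequence[float],
                      q: Sequence[float], vdd: float) -> float:
    """t_pd: from the clock edge's VDD/2 point to Q's VDD/2 point."""
    return crossing_time(times, q, vdd) - crossing_time(times, clk, vdd)


def smallest_passing_margin(passes: Callable[[int], bool], start: int,
                            step: int, floor: int) -> int:
    """Step the data edge toward the clock edge until the flip-flop fails.

    Starts from a relaxed margin and returns the last margin (in ps) that
    still passes; `floor` bounds the sweep so it always terminates.
    """
    if not passes(start):
        raise ValueError("start margin is not relaxed enough")
    m = start
    while m - step >= floor and passes(m - step):
        m -= step
    return m


# Hypothetical pass/fail models standing in for circuit simulations.
setup_ps = smallest_passing_margin(lambda m: m >= 120, start=1000, step=10, floor=-1000)
hold_ps = smallest_passing_margin(lambda m: m >= -40, start=1000, step=10, floor=-1000)
print(setup_ps, hold_ps)  # 120 -40

t = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]  # ns
clk_v = [0.0, 0.0, 2.0, 2.0, 2.0, 2.0]
q_v = [0.0, 0.0, 0.0, 0.0, 2.0, 2.0]
print(propagation_delay(t, clk_v, q_v, vdd=2.0))  # 2.0
```

A bisection search would need fewer simulation runs than this linear sweep; the linear form is kept here because it mirrors the procedure described in the text.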
### 3.3 Implementation Issues

The implementation of the CDFF requires an additional input and 6 additional transistors. This is a relatively large overhead for a flip-flop implemented with 18 or 20 transistors (roughly 30-33%). However, not all flip-flops in an IC lie on critical paths where such measures are needed. Moreover, the cost of a transistor is decreasing dramatically with scaling, and transistors can be scaled much more aggressively than interconnects; therefore, it is often the interconnects, pads, etc. that determine the chip size. In many applications, the cost of a few extra transistors per flip-flop may be acceptable as long as it reduces the cost of testing and manufacturing.

Arguably, replacing normal flip-flops with CDFFs may cause some performance loss. However, our simulations do not show a significant difference between a normal flip-flop from the cell library and the CDFFs. Contrary to expectations, CDFF1 shows a lower propagation delay than the conventional flip-flop. Such a comparison has its shortcomings, however: the internal details of the conventional flip-flop are not available owing to the proprietary nature of the cell library. In spite of this lack of detail, these simulations do show that CDFFs can be realized with comparable performance.

The CDFF implementation also requires an additional signal, the Test clock, which may add overhead. Alternatively, two-phase clocking schemes such as Level Sensitive Scan Design (LSSD) [7] can be used for high-performance testing. However, LSSD is not widely used owing to its higher overhead compared with the scan path technique and the additional normal-mode power consumption associated with its two-phase clocks. Power consumption is a serious concern for high-performance ICs, where clock-related power is a significant contributor to the overall power consumption.

## 4 Future Directions

A silicon implementation of the CDFF in an existing circuit will resolve some of the implementation issues discussed above.
A building block, such as an ISCAS sequential benchmark circuit, will be implemented with and without CDFFs in a 0.35 $\mu$m technology. Furthermore, we expect CDFFs to play a major role in the performance binning of ICs. The effectiveness of the CDFF will also be evaluated with the silicon implementation. However, the planned silicon implementation is beyond the scope of this article and will be reported subsequently.

Figure 8: Comparison of setup times.

The use of CDFFs in a Built-In Self-Test (BIST) environment is a natural extension of the work proposed in this article. The Test clock can be controlled through the BIST controller so as to automate the performance binning of ICs.

## 5 Conclusion

High-performance testing is fast becoming one of the major concerns of the VLSI testing community. The traditional performance edge of VLSI testers over the DUT is fast disappearing. As a result, testers may not be able to distinguish between good and faulty devices, and this inability will result in severe yield losses for high-performance ICs. At the same time, owing to the limitations of $I_{DDQ}$ testing in the deep sub-micron regime, the need for performance testing will increase, not only for ensuring DUT specifications but also for ensuring DUT reliability.

In this article, we have demonstrated a DFT technique for high-performance testing that allows testing to be performed at a significantly lower clock frequency while ensuring the timing specifications of the DUT. The key element of this technique is the addition of user-controllable delays in flip-flops. Different types of flip-flops have been evaluated and their performance impact compared. The proposed CDFFs show potential for high-performance testing.

Figure 9: Comparison of hold times.

## Acknowledgment

The authors gratefully acknowledge the support of NSERC operating grant (205034-98) for this work.

## References

- [1] Semiconductor Industry Association (SIA), Roadmap for Semiconductors, 1997.
- [2] P. Nigh et al., "So What is an Optimal Test Mix?
A Discussion of the Sematech Methods Experiment," Proceedings of International Test Conference, 1997, pp. 1037-1038.
- [3] V.D. Agrawal and T.J. Chakraborty, "High-Performance Circuit Testing with Slow-Speed Testers," Proceedings of International Test Conference, 1995, pp. 302-310.
- [4] E. Bruls, "Variable Supply Voltage Testing for Analogue CMOS and Bipolar Circuits," Proceedings of International Test Conference, 1994, pp. 562-571.
- [5] A. Keshavarzi, K. Roy and C.F. Hawkins, "Intrinsic Leakage in Low Power Deep Submicron CMOS ICs," Proceedings of International Test Conference, 1997, pp. 146-155.
- [6] T. Lee and J. Cong, "The New Line in IC Design," IEEE Spectrum, March 1997, pp. 52-58.
- [7] E.B. Eichelberger and T.W. Williams, "A Logic Design Structure for LSI Testability," Journal of Design Automation and Fault-Tolerant Computing, vol. 2, no. 2, May 1978, pp. 165-178.