## A 90nm 6.5GHz 256x64b Dual Supply Register File with Split Decoder Scheme

Steven Hsu, Bhaskar Chatterjee\*, Manoj Sachdev\*, Atila Alvandpour, Ram K. Krishnamurthy, Shekhar Borkar

Circuits Research, Intel Labs, Intel Corporation, Hillsboro, OR 97124, USA

## Abstract

This paper describes a 256x64b 2-read, 1-write ported static register file for 6.5GHz operation in 1.2V, 90nm CMOS. The read/write select drivers and decoder use a lower 0.9V supply to reduce total energy by 23%. Local/global bitlines use a leakage-tolerant split-decoder scheme with conditional precharge to achieve 65% (90%) higher DC robustness compared to conventional static (dynamic) bitline schemes.

## Introduction

Wide bit-width register files are performance-critical components of the processor integer/FPU execution cluster and demand single-cycle read/write latency and dense organization. Aggressive V<sub>t</sub> scaling has resulted in poor bitline robustness scaling, requiring alternate leakage-tolerant bitline schemes. Static pass-transistor bitline techniques have been proposed to improve active leakage and noise tolerance [1]. However, deselected static bitlines (when all read-select signals are "low") still suffer from degraded robustness because the bitlines are held only weakly via static keepers. A 256x64b static register file in 1.2V, 90nm CMOS [2] is described for 6.5GHz operation. A lower supply (V<sub>cc</sub>=0.9V) is used on the address decoder and read/write select drivers to reduce active leakage energy without an explicit level converter stage or sacrificing bitcell stability. A split-decoder with conditional precharge scheme is used to enable 8 bitcells/bitline with improved leakage tolerance and full-swing noise recovery. Single-ended read/write-select and bitline signaling is used throughout, enabling a dense layout occupying 1000µm x 800µm (Fig. 8).

## RF Organization

Fig. 1 shows the organization of the 2-read, 1-write ported 256-word x 64-bits/word register file. The 8-bit read/write address per port is decoded via a two-level split-decoder in the previous cycle and fed as read/write-select (RS/WS) signals into two segmented 256x32b arrays. Fig. 2 shows the register file bitcell, with symmetric loading of 1 read port on each side of the storage cell for optimal cell stability. Matched pass transistors on each side of the storage cell enable single-ended write [1]. Fig. 3(a) shows the split-decoder, which uses a first-level 3:8 decoder to generate 8 unique bank-enable signals BE<7:0> for 8 second-level 5:32 address decoders [3]. Each 256x32-bit array is partitioned into 8 banks of 32 words/bank, triggered by a one-hot 5:32 decoder. Fig. 3(b)-(c) shows the local and global bitline (LBL/GBL) scheme. Each LBL (1 per port) supports single-ended read on 8 bitcells with 4-way merge via a C<sup>2</sup>MOS column-mux. Data from the storage cell is read by single-ended pass transistors followed by a full-swing restoring PMOS keeper gain-stage. GBL merges the 8 column-mux outputs via two static mux stages to deliver a 64b word per read port.
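The two-level decode described above can be illustrated with a small behavioral model. This is only a sketch under stated assumptions: the text does not say which address bits feed the 3:8 and 5:32 decoders, so the split used here (upper 3 bits for the bank, lower 5 bits for the word) and the function names `split_decode` and `select_lines` are hypothetical.

```python
def split_decode(addr):
    """Behavioral model of a two-level split decode of an 8-bit address.

    Assumption (not stated in the paper): the upper 3 address bits drive the
    first-level 3:8 decoder and the lower 5 bits drive the 5:32 decoders.
    """
    if not 0 <= addr < 256:
        raise ValueError("address must fit in 8 bits")
    bank = addr >> 5      # input to the first-level 3:8 decoder
    word = addr & 0x1F    # input to the second-level 5:32 decoder
    be = [1 if i == bank else 0 for i in range(8)]    # one-hot BE<7:0>
    sel = [1 if i == word else 0 for i in range(32)]  # one-hot 5:32 output
    return be, sel


def select_lines(addr):
    """Combine both levels: a select line fires only in the enabled bank."""
    be, sel = split_decode(addr)
    # 8 banks x 32 words/bank = 256 select lines, ordered bank-major
    return [b & s for b in be for s in sel]


if __name__ == "__main__":
    lines = select_lines(0xA7)
    print(sum(lines), lines.index(1))  # exactly one line is active
```

Under this assumed bit split, exactly one of the 256 select lines is active for any address, and the bank-enable vector is available on its own, which is what lets it be reused as a separate signal.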

## Dual Supply Operation

Fig. 4(a) shows the total energy breakup at 1.2V, 110°C for the register file during read operation, which dominates power consumption. Active leakage energy contribution is 83% of total energy due to most of the entries being inactive during normal operation, and address decoder and RS/WS drivers energy contribution is 43%. To reduce active leakage, the address decoder and RS/WS drivers are operated at a lower V<sub>cc</sub>=0.9V. The rest of the register file operates at nominal 1.2V so that bitcell stability is unaffected. Low- and high-V<sub>cc</sub> layout domains are well partitioned (Fig. 8), avoiding low-swing coupling noise issues. DC power at the low/high-V<sub>cc</sub> interface is avoided since the low-V<sub>cc</sub> RS/WS signals drive only gates of NMOS pass-transistors. No explicit level converter is required since LBL restores full-swing outputs. Fig. 4(b) shows device-level 100nm, 80°C measurements of total leakage current (subthreshold + gate) with lowering V<sub>cc</sub>, showing an exponential trend: 46% lower leakage for 1.2V→0.9V.

\*Dept. of Electrical & Computer Engineering, Univ. of Waterloo, Ontario N2L3G1, Canada
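To make the reported exponential leakage trend concrete, the sketch below fits a single-exponential model to the one data point given in the text (46% lower leakage for 1.2V→0.9V). The model form, the resulting slope, and the helper name `relative_leakage` are assumptions for illustration; the paper reports only the measured endpoint, so values at other voltages are extrapolations, not measurements.

```python
import math

V_HIGH = 1.2       # nominal supply (V), from the text
V_LOW = 0.9        # lowered supply (V), from the text
REDUCTION = 0.46   # reported leakage reduction for 1.2V -> 0.9V

# Assumed model (not from the paper): I(V) = I(V_HIGH) * exp(k * (V - V_HIGH)).
# Solving exp(k * (V_LOW - V_HIGH)) = 1 - REDUCTION for the slope k:
k = -math.log(1.0 - REDUCTION) / (V_HIGH - V_LOW)


def relative_leakage(vcc):
    """Leakage at vcc relative to leakage at V_HIGH under the assumed model."""
    return math.exp(k * (vcc - V_HIGH))


if __name__ == "__main__":
    print(f"fitted slope k = {k:.3f} per volt")
    for v in (1.2, 1.1, 1.0, 0.9):
        print(f"Vcc = {v:.1f} V -> relative leakage {relative_leakage(v):.2f}")
```

By construction the model reproduces the reported point (relative leakage of 0.54 at 0.9V); the intermediate voltages only indicate what a purely exponential dependence would imply.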

## DC Noise Robustness and Energy-Delay Results

LBL and GBL stages are susceptible to noise due to high active leakage of pass-transistors when they are deselected (RS=0) during normal read operation. Upsizing the swing-restoring keeper offers limited robustness benefit, which diminishes with scaling [4]. To improve leakage tolerance, full-swing conditional precharge signals (LCP<31:0> for LBL and BE<7:0> for GBL) are generated from the split-decoder, which trigger PMOS sustainers that statically anchor deselected LBLs/GBLs strongly to V<sub>cc</sub> (Fig. 5). BE<7:0> and the column-select signals are locally level-converted before feeding the conditional precharge circuit to avoid DC power. The register file's timing plan ensures early arrival of the split-decoder signals before RS/WS signals to minimize contention short-circuit power. The performance penalty of the additional diffusion capacitance of the PMOS sustainers is <1%, since bitline capacitance is dominated by pass-transistor diffusion and interconnects. The PMOS sustainer layout is folded into the swing-restoring keeper and GBL static stage layout templates to minimize area penalty. LCP and BE signal wires are routed on upper metal layers in-plane with RS/WS signals to avoid array area growth and reduce coupling.

Table 1 shows DC robustness comparisons of the proposed design vs. conventional static pass-transistor and dynamic bitline schemes optimized for high performance. Worst-case leakage and input DC noise conditions are set up for each, and DC robustness is evaluated as the unity-gain noise margin at LBL/GBL outputs. For the deselected bitline case, the proposed scheme achieves 65% (90%) higher robustness than conventional static (dynamic) bitline schemes. With one RS entry enabled and AC noise on the remaining RS signals, the proposed scheme fully recovers (T<sub>50%</sub>=23ps) when the noise is damped, because the bitline is actively driven by bitcell drivers via pass-transistors.

Fig. 6 shows 90nm, 110°C worst-case read cycle-time and total energy behavior of the proposed design. The complete register file operates at 7.3GHz at nominal 1.2V. At V<sub>cc</sub>=0.9V on the decoder and RS/WS drivers, the register file achieves 6.5GHz operation (14% read delay increase) with 550mW total power. Decoder and RS/WS drivers active leakage energy is 67% lower, resulting in a total register file energy reduction of 23%. Simulations include layout-extracted parasitics, with maximum LBL and GBL lengths of 500µm and 280µm.
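As a rough aid to reading these figures, the sketch below converts the quoted frequencies into cycle times and derives an energy per cycle from the quoted power. It is a back-of-envelope calculation, not a result from the paper: the energy figure assumes the 550mW is dissipated at the 6.5GHz operating point with one access per cycle, and the cycle-time increase derived from the two frequencies is not necessarily the same quantity as the reported 14% read delay increase.

```python
def cycle_time_ps(freq_ghz):
    """Cycle time in picoseconds for a clock frequency given in GHz."""
    return 1000.0 / freq_ghz


# Figures quoted in the text
F_NOMINAL_GHZ = 7.3   # all blocks at 1.2V
F_DUAL_GHZ = 6.5      # decoder and RS/WS drivers at 0.9V
POWER_W = 0.550       # total power at the dual-supply operating point

t_nominal = cycle_time_ps(F_NOMINAL_GHZ)    # about 137.0 ps
t_dual = cycle_time_ps(F_DUAL_GHZ)          # about 153.8 ps
cycle_increase = t_dual / t_nominal - 1.0   # about 0.123 (12.3%)

# Assumption: power is drawn at 6.5GHz with one access per cycle
energy_per_cycle_pj = POWER_W / (F_DUAL_GHZ * 1e9) * 1e12   # about 84.6 pJ

if __name__ == "__main__":
    print(f"cycle time: {t_nominal:.1f} ps -> {t_dual:.1f} ps")
    print(f"cycle-time increase: {cycle_increase * 100:.1f}%")
    print(f"energy per cycle: {energy_per_cycle_pj:.1f} pJ")
```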

Fig. 7 shows performance-robustness scaling of the proposed vs. conventional static and dynamic schemes to 65nm and 45nm, projected from [2] using scaling models described in [4]. The proposed scheme sustains both performance and robustness benefits, showing a good scaling trend to sub-90nm generations.

## Conclusion

A 90nm 6.5GHz 256x64b dual supply register file with a leakage-tolerant split-decoder scheme offers 23% total energy and 65% robustness benefit, and good sub-90nm scaling trend.

## Acknowledgement

The authors thank C. Webb, N. Saxena, and S. Vangal for discussions; B. Bloechel for measurements; and M. Haycock and J. Rattner for encouragement and support.

## References

- [1] S. Vangal et al., 2002 ISSCC Digest, pp. 412-413.
- [2] S. Thompson et al., 2002 IEDM Tech. Digest, pp. 61-64.
- [3] N. Tzartzanis et al., 2002 ISSCC Digest, pp. 416-417.
- [4] M. Anders et al., 2001 VLSI Circuits Symp. Digest, pp. 23-24.

