Advances in fabrication technology certainly improve
performance and reduce size and power consumption, but they come with intertwined disadvantages as well. These finer technologies turn noise and reliability into increasingly thorny issues. Increased clock rates and reduced line spacings make individual edge slew rates a larger percentage of the total clock cycle, and cross-coupling capacitance from potential aggressor signals becomes a larger percentage of total capacitance. These and other effects make it essential to develop and deploy a noise analysis and
budgeting strategy to ensure that the chip functions over the full range of process spread.
We recently ported the AMD K6-III microprocessor design, with its on-chip 256-KByte secondary (L2) cache, from 0.25-µm to 0.18-µm technology, shrinking the die size, reducing costs, and enabling significant improvements in clock speed and power consumption. But at the same time, the migration to the finer process also meant that guaranteeing continued functionality and noise immunity would become a key design
challenge.
The K6-III has well-positioned speed and performance. The 21.3-million transistor chip is manufactured on AMD's five-layer-metal process technology with local interconnect and shallow trench isolation. The processor comes in a Super7-platform-compatible, 321-pin ceramic pin grid array package using C4 flip-chip interconnect technology.
To enable its performance, the K6-III holds a large maximum combined system cache, which we call the Trilevel cache. The Trilevel cache design includes a
full-speed 64-KByte Level 1 cache, an internal full-speed backside 256-KByte Level 2 cache, and a 100-MHz front side bus to an optional external Level 3 cache on the Super7 motherboard. With a total of 320 KBytes of combined L1 and L2 cache, the K6-III processor offers more internal cache memory than any other x86 CPU available today.
In addition, the K6-III contains 3DNow!, a 3D-enhancement technology that significantly parallelizes and enhances floating-point-intensive 3D graphics and multimedia
applications. 3DNow! relies on single-instruction, multiple-data (SIMD) operations and other performance boosts to enhance visual computing.
Migrating the K6-III from a 0.25-µm to 0.18-µm technology did yield finer geometries, but also resulted in more transistor leakage in an already complex design. It was thus clear from the beginning that the design strategy needed to include detailed noise analysis to achieve working first silicon and predictable design schedules.
Noise no longer immune
Our new process
technology has undergone the usual evolution of effects that make today's digital chip designs trickier, as they demand increasing attention to mitigate noise sensitivities and augment circuit robustness. Taller conductor aspect ratios enlarge the proportion of coupling (both capacitive and inductive) to aggressor signals, which can slow signal transitions even for ordinary static drivers and, if not properly accounted for, can trigger outright failure in precharged circuits. Reducing the nominal power supply
voltage (in this case by 25 percent) increased the relative disturbance from those noise sources that don't scale with the supply voltage, such as transistor threshold variations, coupling from off-chip and I/O signals that stay at a fixed 3.3-V supply, alpha-particle events, and simulation inaccuracies. In addition, increased transistor currents aggravated IR-drop degradations in the power and ground supply grids.
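The supply-scaling effect is simple arithmetic. The sketch below is only an illustration in normalized units: the 25 percent supply reduction comes from the text, while the fixed noise amplitude of 0.10 is an arbitrary assumption and the function names are ours.

```python
def noise_fraction(fixed_noise: float, supply: float) -> float:
    """Share of the supply swing taken up by a noise source that does not scale."""
    if supply <= 0:
        raise ValueError("supply must be positive")
    return fixed_noise / supply


def growth_after_supply_cut(cut_fraction: float) -> float:
    """Factor by which a non-scaling noise source grows relative to the supply."""
    if not 0 <= cut_fraction < 1:
        raise ValueError("cut_fraction must be in [0, 1)")
    return 1.0 / (1.0 - cut_fraction)


FIXED_NOISE = 0.10   # assumed amplitude, as a fraction of the old supply
OLD_SUPPLY = 1.00    # normalized old supply
NEW_SUPPLY = 0.75    # 25 percent lower, as stated in the text

before = noise_fraction(FIXED_NOISE, OLD_SUPPLY)
after = noise_fraction(FIXED_NOISE, NEW_SUPPLY)
growth = growth_after_supply_cut(0.25)

print(f"relative noise before: {before:.3f}")
print(f"relative noise after:  {after:.3f}")
print(f"growth factor:         {growth:.3f}")
```

Whatever the actual amplitude, a source that stays fixed while the supply drops by a quarter looks one-third larger relative to the available swing.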
By far the most significant change in the new technology was a decrease, of almost two orders of magnitude, in the ratio of transistor on-current to off-current, caused by the exponential increase in subthreshold transistor leakage that accompanies the decreased threshold voltage. This change required increases in precharged keeper sizes and enforced a narrower range of acceptable P:N beta ratios, even for static gates. Both of these trends further blur the distinction between precharged and static circuit styles, thus reducing the possibility that any fully complementary static gate is
necessarily safe or noise-immune.
All of these trends worsen the potential magnitude of noise injected onto "digital" signals (which, at such high speeds, behave much like analog signals) and make rigorous analysis the only feasible way to ensure complete compliance with design goals and guidelines.
Figure 1 - Noise Analysis: Noise analysis occurs within a design flow concurrently with static timing analysis.
If design violations slipped past, performance degradations or incorrect logic values might result. Both cases can be considered failures, as additional mask spins to correct any problems would be a tremendous setback in time to market.
Approaching noise
Dealing with noise is a relatively new task for digital designers, who until
recently enjoyed protective levels of noise immunity. As we plumb the deep-submicron world of 0.25 µm and below, that security no longer exists. We have had to broaden our noise analysis to include all of the new dominant effects caused by technology shrinkage, while retaining our mastery over the more traditional noise issues of charge sharing and power-supply fluctuations.
But even experienced circuit designers, aware of the new issues, couldn't cover every analysis for every circuit by exhaustive
circuit simulation, at least not within the required schedules and manpower limits.
To address this difficulty, we needed new static noise analysis approaches, such as those Cadmos implemented in its Pacific tool. Just as static timing analysis tools helped designers through the complexity of considering all timing paths and obviated the need to worry about sufficient vector coverage, the new static noise analysis approach accounts for "all" possible combinations of induced and propagated noise sources.
The tool is a static analyzer that accounts for the combined effects of the relevant digital noise sources at every net in the design. It uses a built-in noise immunity metric that allowed us to focus on truly sensitive parts of the circuit and not just areas where the peak noise exceeds an arbitrarily chosen design rule. Examining every occurrence where noise exceeds a given peak typically would mean looking at hundreds or thousands of potential failures for a given macro block. The tool's built-in
sensitivity filtering enabled us to focus on a handful of noise issues, saving valuable design time and avoiding unnecessary design changes. Furthermore, Pacific sidesteps additional modeling approximation steps because it uses a built-in transient simulator that works directly from the original Spice parameters. In our accuracy comparisons, its results lay within a few percent of Spice.
We applied Pacific as a point tool after post-layout extraction (see Figure 1). To prepare, we extracted the circuit netlist (including interconnect
resistance and coupling capacitance parasitics) from the layout, provided Spice transistor models, and created a run control script. If a custom macro contained analog signals, such as bit lines in a memory array, we had to "black box" the circuitry containing them with a user-defined noise (UDN) model, since Pacific is a tool for digital circuits only.
The tool accepted both flat and hierarchical industry-standard netlist formats, such as Spice and DSPF, available as output from various commercial
physical layout extraction tools. It also used the same transistor model, BSIM3, that our circuit simulations used. We needed to write a Tcl control script specifying the key parameters, such as voltage, temperature, I/Os, and clocks.
Figure 2 - Web page browsing: The tool outputs several different views of its analytical results in HTML form.
In practice, we had already conducted some specific noise analysis of key circuits in the K6-2. However, we expected further noise issues because of the finer line widths, higher transistor leakages, and lower voltages. Luckily, the introduction of the tool coincided with our need to do the process shrink.
Of course, while our main goal was to ensure robustness,
we also gained greater confidence in pushing the process to improve performance, while still maintaining product reliability. Reliability goes hand in hand with thoroughness. If nothing else, we realized that a thorough static approach was a good step to quantify, improve, and validate product reliability. Whereas most designers perform noise analysis only on certain manually chosen critical paths of a circuit, the tool checked every circuit and every net. That made it a safety net for our designers, who
otherwise could have let noise problems escape their attention.
Even though Pacific falls under the category of "static" analysis tool (it doesn't require input stimulus), it's really a hybrid because it performs transient simulation on each local circuit. The tool automatically generates the correct local stimulus to create the worst-case noise scenario. It "adds" the effects of all noise sources meant to be analyzed, on a node-by-node basis. For example, crosstalk noise caused by coupling is combined with
noise caused by charge sharing to create the worst-case noise event.
Figure 3 - Starting Point: The original design for a read-column multiplexer shows how charge sharing, a form of noise, can creep into a circuit. The problem originates with a large capacitance sitting on a node in the merged pulldown network (net38).
In addition to combining noise sources (leakage, charge sharing, coupling, and so on), the tool accounts for the noise coming from preceding gates by propagating noise along all possible paths. This method presents a realistic operating environment for each individual gate in question, since assuming that the input signals are quiet may be too optimistic, while applying a fixed, arbitrary noise may be too pessimistic. The tool reports the
noise on any net that exceeds user-defined noise sensitivity or noise peak thresholds.
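The idea of summing local noise sources at each net and then propagating the result through downstream gates can be sketched in a few lines. This is a deliberately simplified linear model and not Pacific's algorithm, which relies on transient simulation; the net names, noise values, and gains below are hypothetical, and taking the worst single input at each gate is our own simplifying assumption.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def propagate_noise(local_noise, drivers):
    """Accumulate worst-case noise per net in a simplified linear model.

    local_noise: {net: {source_name: volts}} noise injected directly on a net
    drivers:     {net: [(input_net, gain), ...]} where gain is the magnitude
                 of input noise that the driving gate passes to its output
    """
    # TopologicalSorter expects {node: set_of_predecessors}.
    graph = {net: {src for src, _ in fanin} for net, fanin in drivers.items()}
    for net in local_noise:
        graph.setdefault(net, set())

    total = {}
    for net in TopologicalSorter(graph).static_order():
        injected = sum(local_noise.get(net, {}).values())
        propagated = max(
            (gain * total[src] for src, gain in drivers.get(net, [])),
            default=0.0,
        )
        total[net] = injected + propagated
    return total


def flagged(total, threshold):
    """Nets whose accumulated noise exceeds a user-defined peak threshold."""
    return sorted(net for net, volts in total.items() if volts > threshold)


# Hypothetical three-stage path: a -> b -> c
local_noise = {
    "a": {"coupling": 0.10},
    "b": {"coupling": 0.05, "charge_sharing": 0.20},
    "c": {"leakage": 0.02},
}
drivers = {"b": [("a", 0.5)], "c": [("b", 2.0)]}

total = propagate_noise(local_noise, drivers)
print(total)
print(flagged(total, threshold=0.5))
```

In this toy case the gate driving net c has a gain above one, so moderate noise on b becomes the dominant problem at c even though c's own injected noise is small.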
The tool presents its information in the form of hierarchical HTML pages (see Figure 2), which allowed us to use a standard Web browser to view the results. The pages facilitated the results analysis by splitting the information into three main portions detailing the amount of noise and sensitivity, the stimulus applied to create such a noise, and the contributing noise sources. With the data organized this way, far
better than in large text files, we found it easy to determine the origins of noise.
False failures
Just as static timing analysis can generate "false" paths, static noise analysis can generate overly pessimistic combinations. For example, when analyzing the K6-III, we were concerned that a false failure might occur when two noise sources were combined, even though they came from logically opposite signals we knew were hazard-free. In this case the tool, assuming that both signals
would switch together, reported a significant noise issue. We removed this false error by specifying the logical conditions for these signals in the Tcl run control script.
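A rough way to picture the effect of such a constraint: without it, a static analysis adds up every aggressor; with it, only one member of a mutually exclusive group may contribute at a time. The sketch below is illustrative only. The signal names and amplitudes are hypothetical, the groups are assumed to be disjoint, and it does not represent the tool's Tcl interface or its internal handling of logical conditions.

```python
def worst_case_coupling(aggressors, exclusive_groups=()):
    """Sum aggressor noise, letting only the largest member of each exclusive group count.

    aggressors:       {signal_name: induced_noise_volts}
    exclusive_groups: iterable of disjoint sets of signal names that can
                      never disturb the victim at the same time
    """
    remaining = dict(aggressors)
    total = 0.0
    for group in exclusive_groups:
        members = [remaining.pop(name) for name in group if name in remaining]
        if members:
            total += max(members)
    return total + sum(remaining.values())


# Hypothetical aggressors coupled to one victim net
aggressors = {"sel": 0.30, "sel_n": 0.25, "clk": 0.10}

pessimistic = worst_case_coupling(aggressors)
constrained = worst_case_coupling(aggressors, exclusive_groups=[{"sel", "sel_n"}])

print(f"no constraints:        {pessimistic:.2f} V")
print(f"sel/sel_n exclusive:   {constrained:.2f} V")
```

The constrained result is still a worst case, just one drawn from combinations that can actually occur.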
Figure 4 - Spice simulations: Spice waveform simulations show how charge sharing can cause node voltages to drop precipitously (rcol[2]), resulting in unwanted logic triggering, in this case of nand_gate[2].
In our experience, the tool made good, conservative choices. The tool flagged anything doubtful, making worst-case assumptions (until told otherwise), as in the case of the two exclusive signals. That process forced us to take explicit action to remove false failures.
Conducting its own simulations on Spice transistor
circuit models, the tool determined how each channel-connected group of transistors responded to small-signal noise changes. The tool calculated the transient sensitivity to noise for every receiver of every net, individually. This thoroughness meant that we could use the tool on existing circuit designs without introducing unnecessary pessimism that would otherwise have resulted from any "fixed" rule choice.
Mux is the word
To show how the process works, we've chosen a read column mux, a good
example of a circuit with a potential charge-sharing problem, and fodder for a noise-analysis tool. The original circuit, depicted in Figure 3 ("rcolmx_nopmos"), shows an array of eight precharged gates connected to a merged pull-down network (N230). The merged pull-down network allows the series evaluation device controlled by the bypass select signal ("bypselp") to be shared, thereby reducing the loading on its driver.
Unfortunately, the merged-pull-down network ends up containing an internal node
(labeled "net38") that has a significant capacitance. If this capacitance happens to be sitting at a low value because it had discharged during a prior cycle, it potentially can charge-share with one of the precharged outputs. A significant glitch can result, amplified by downstream gates, and force an erroneous logic state.
Such an occurrence appears in the first set of Hspice waveforms in Figure 4, in which the falling transition on rcolseln[2] causes a rising transition on net68[2]; that, in turn,
causes net38 to charge-share with rcol[2], resulting in a dip down to 1.48 V out of a 2.0-V supply (a 26 percent degradation). The dip triggers the nand_out[2] output to glitch, as well.
The worst case occurs when only one of the select signals goes active. If more than one goes active and all charge-share together, any one of the outputs will experience a smaller degradation because the "bad" charge would be distributed rather than concentrated onto a single output victim. The degree to which the
degradation propagates to the nand_out[2] output depends on that gate's input threshold (determined from transistor characteristics and its N:P beta-ratio). Consequently, it's not obvious that simple hand calculations will yield enough information to determine whether a potential failure will cause problems downstream.
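The first-order part of this behavior follows from charge conservation, and a short calculation shows why a single active select is the worst case. The sketch below is an idealized model under stated assumptions: the outputs start at the supply voltage, the internal node starts fully discharged, and keepers, leakage, and coupling are ignored. The capacitance ratio it derives is only what this model would need to reproduce the quoted 1.48-V dip; it is not an extracted value from the design.

```python
def charge_shared_voltage(vdd, c_out, c_int, n_outputs=1):
    """Voltage after n precharged outputs share charge with a discharged internal node.

    Idealized charge conservation: each output (capacitance c_out) starts
    at vdd and the internal node (capacitance c_int) starts at 0 V.
    """
    if n_outputs < 1:
        raise ValueError("at least one output must share charge")
    total_out = n_outputs * c_out
    return vdd * total_out / (total_out + c_int)


def implied_cap_ratio(vdd, v_final):
    """C_int / C_out that this idealized model needs to reproduce a given dip."""
    return (vdd - v_final) / v_final


VDD = 2.0      # supply voltage quoted in the article
V_DIP = 1.48   # lowest rcol[2] voltage quoted in the article

ratio = implied_cap_ratio(VDD, V_DIP)  # about 0.35 in this model
one_select = charge_shared_voltage(VDD, 1.0, ratio, n_outputs=1)
two_selects = charge_shared_voltage(VDD, 1.0, ratio, n_outputs=2)

print(f"implied C_int/C_out: {ratio:.3f}")
print(f"one select active:   {one_select:.2f} V")
print(f"two selects active:  {two_selects:.2f} V")
```

With two outputs sharing the same internal node, each output dips less, which matches the observation that the "bad" charge is distributed rather than concentrated. What the model cannot say is whether the receiving gate amplifies the dip, which is why the receiver's characteristics still have to be analyzed.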
Figure 5 - The way out: Adding a PMOS device precharges the intermediate node, thus preventing the degradation that charge sharing can cause.
The concept of static noise analysis assumes a propagation of noise degradation along all possible paths, just as static timing analysis propagates delays along all possible paths. In this example, to determine the noise actually propagated to the nand_out[2] output, static noise analysis compares the
magnitude of the charge-sharing-induced noise on net rcol[2] with the transistor characteristics of the nand_gate[2] receiver.
If the rcol[2] degradation is severe enough that nand_gate[2] amplifies instead of attenuating it, then the user can take action to correct the problem. Figure 5 ("rcolmx_1") shows a possible fix. For our example, we add a PMOS device ("p228") to ensure that intermediate node net38 is precharged to a high value, along with all the rcol[*] signals. This precharging prevents any
degradation if charge sharing occurs when one of the net68[*] signals rises.
Figure 2 shows Pacific's analysis of the same circuit. The upper right frame lists all the nets whose noise exceeds a user-specified threshold. All eight internal nodes (rcol[0:7]) are vulnerable to charge-sharing noise. The column under the heading "Type" denotes the state of the net: "VH" stands for "Voltage High," and "VL" for "Voltage Low." The next column lists the peak noise calculated for this node. Combining these two
columns shows that rcol[2] has a 526-mV voltage drop from VDD. Note that this result lies within 2 percent of Hspice. The next column shows the sensitivity, that is, how strongly the noise affects the subsequent net. A sensitivity of less than -1 implies that the noise is being amplified instead of attenuated, in which case the designer should correct the problem.
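Two of the checks described here are easy to reproduce by hand. The sketch below first compares the 526-mV figure with the Hspice dip quoted earlier (2.0 V minus 1.48 V), then applies the "sensitivity below -1" rule to a small report. The report entries, including their net names and sensitivity values, are hypothetical placeholders, since the article does not list the actual sensitivity numbers.

```python
from dataclasses import dataclass


@dataclass
class NetReport:
    name: str
    state: str          # "VH" (voltage high) or "VL" (voltage low)
    peak_noise_v: float
    sensitivity: float


def is_amplified(sensitivity: float) -> bool:
    """A sensitivity below -1 means the receiver amplifies the noise."""
    return sensitivity < -1.0


def percent_difference(value: float, reference: float) -> float:
    """Absolute difference relative to the reference, in percent."""
    return abs(value - reference) / reference * 100.0


# Figures quoted in the article
hspice_drop = 2.0 - 1.48   # volts
tool_drop = 0.526          # volts
diff_pct = percent_difference(tool_drop, hspice_drop)
print(f"tool vs. Hspice: {diff_pct:.1f} percent")

# Hypothetical report rows, for illustration only
reports = [
    NetReport("net_a", "VH", 0.500, -1.8),
    NetReport("net_b", "VL", 0.300, -0.4),
]
needs_fix = [r.name for r in reports if is_amplified(r.sensitivity)]
print("nets to correct:", needs_fix)
```

Filtering on sensitivity rather than on peak noise alone is what keeps the list of nets to correct short.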
The lower left frame lists the stimuli that produce the noise. The tool set up the worst-case condition, where one select signal turned on (net68[2], 0
(R), "R" for rising) and the rest remained inactive. To observe the charge sharing, the tool also set net67, at the gate of transistor N230, to low. The frame at lower right presents the type and the magnitude of the noise. The noise largely stems from charge sharing.
Although this example illustrates only a charge-sharing failure, the static noise analysis method sums noise from many sources, and therefore can quantify combinations that could induce a problem. For instance, another important noise
source is the charge leaked away because of transistor subthreshold conduction. In the "rcolmx_nopmos" schematic, if devices n213 and n230 were many times larger (wider) than transistor p217, then charge would drain from rcol[2] (because of the subthreshold leakage, which is additive to the charge-sharing degradation described for net38). Even after the addition of the p228 device, the leakage still causes some degradation, but for the device sizes in this schematic, the noise analysis confirms that such
degradation is tolerable.
A few surprises
The tool uncovered some sensitivities in the custom macros that the circuit designers hadn't yet identified, though chances are that the designers would have rectified the situation before tapeout. The tool didn't suggest fixes but did point out which noise types contributed the most to the problem, thus prioritizing the design effort and possible improvements. For instance, when the noise it located stemmed from cross-coupling, we knew we could take
steps, such as widening the metal spacing, to alleviate the situation. If the major contributor to a noise problem was charge sharing, then we could add intermediary keeper devices.
Although the 256-KByte L2 cache is the biggest block in the K6-III, it wasn't the most challenging in terms of circuitry. The large circuit stretched the tool's capacity more than the algorithm at the heart of the tool. But it was actually the 64-KByte first-level (L1) cache (which runs at the full clock rate of the
chip and deploys all kinds of circuit tricks, including self-resetting logic paths, to achieve its performance) that challenged the viability of the Pacific algorithm. Nevertheless, the tool handled all of these circuit design styles. Its runtime and memory usage were also reasonable. The run times averaged 150,000 transistors/hour on a 300-MHz Sun Ultra2, and the largest block, the L2 cache, took 250 Mbytes of memory.
It was essential for us to deal efficiently with noise analysis to ensure robustness and
reliability within the product market window. With these techniques in place, the K6-III design worked as expected in first silicon. For us, Pacific has proven its worth as a back-end detailed noise immunity checker for the K6-III, and has also served the designers of the AMD Athlon. We plan its continued use for our next generations of high-speed processors, including the K8. In the future, however, we would like to use the tool earlier in the design cycle, before layout, where it can validate our circuit
designs using estimates that mimic capacitive coupling effects.
Ted Williams is the director of silicon implementation at Morphics Technology in Campbell, CA. Previously, he managed the VLSI Circuits Tools team for AMD's California Microprocessor Division. He has fifteen years of engineering, management, and teaching experience in VLSI design and EDA.
Luke Tsai is a member of the K8 microprocessor tool development and circuit design teams at AMD. He has designed
circuits at AMD since 1992, serving from 1995 to 1999 as a member of the K6 microprocessor team, where he began developing tools and methodologies for the circuit group.
To voice an opinion on this or any other article in Integrated System Design, please e-mail your comments to mikem@isdmag.com.
Copyright © 2000 Integrated System Design Magazine