United Business Media EE Times



Managing Noise on the K6-III

Shrinking line widths demand attention to noise issues, so design methodologies must change to include comprehensive noise analysis.

By Ted Williams and Luke Tsai


Advances in fabrication technology certainly improve performance and reduce size and power consumption, but they come with intertwined disadvantages as well. These finer technologies turn noise and reliability into increasingly thorny issues. Increased clock rates and reduced line spacings make individual edge slew rates a larger percentage of the total clock cycle, and cross-coupling capacitance from potential aggressor signals becomes a larger percentage of total capacitance. These and other effects make it essential to develop and deploy a noise analysis and budgeting strategy to ensure that the chip functions over the full range of process spread.

We recently ported the AMD K6-III microprocessor design, with its on-chip 256-KByte secondary (L2) cache, from 0.25-µm to 0.18-µm technology, shrinking the die size, reducing costs, and enabling significant improvements in clock speed and power consumption. But at the same time, the migration to the finer process also meant that guaranteeing continued functionality and noise immunity would become a key design challenge.

The K6-III has well-positioned speed and performance. The 21.3-million transistor chip is manufactured on AMD's five-layer-metal process technology with local interconnect and shallow trench isolation. The processor comes in a Super7-platform-compatible, 321-pin ceramic pin grid array package using C4 flip-chip interconnect technology.

To enable its performance, the K6-III holds a large maximum combined system cache, which we call the Trilevel cache. The Trilevel cache design includes a full-speed 64-KByte Level 1 cache, an internal full-speed backside 256-KByte Level 2 cache, and a 100-MHz front side bus to an optional external Level 3 cache on the Super7 motherboard. With a total of 320 KBytes of combined L1 and L2 cache, the K6-III processor offers more internal cache memory than any other x86 CPU available today.

In addition, the K6-III contains 3DNow!, a 3D-enhancement technology that parallelizes and accelerates floating-point-intensive 3D graphics and multimedia applications. 3DNow! relies on single-instruction, multiple-data (SIMD) execution and other performance boosts to enhance visual computing.

Migrating the K6-III from a 0.25-µm to a 0.18-µm technology did yield finer geometries, but also resulted in more transistor leakage in an already complex design. It was thus clear from the beginning that the design strategy needed to include detailed noise analysis to achieve working first silicon and predictable design schedules.

Noise no longer immune

Our new process technology has undergone the usual evolution of effects that make today's digital chip designs trickier, demanding increasing attention to mitigating noise sensitivities and improving circuit robustness. Taller conductor aspect ratios enlarge the proportion of coupling (both capacitive and inductive) to aggressor signals. That coupling can slow signal transitions even for ordinary static drivers and, if not properly accounted for, can trigger outright failure in precharged circuits. Reducing the nominal power supply voltage (in this case by 25 percent) increased the relative disturbance from those noise sources that don't scale with the supply voltage, such as transistor threshold variations, coupling from off-chip and I/O signals that stay at a fixed 3.3-V supply, alpha-particle events, and simulation inaccuracies. In addition, increased transistor currents aggravated IR-drop degradations in the power and ground supply grids.

By far the most significant change in the new technology was a decrease, of almost two orders of magnitude, in the ratio of transistor on-current to off-current, caused by the exponential increase in subthreshold transistor leakage that accompanies the decreased threshold voltage. This change required increases in precharged keeper sizes and enforced a narrower range of acceptable P:N beta ratios, even for static gates. Both of these themes further blur the distinction between precharged and static circuit styles, so we could no longer assume that any fully complementary static gate is necessarily safe or noise-immune.
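The exponential dependence of subthreshold leakage on threshold voltage can be illustrated with the usual first-order model, in which off-current scales as 10^(-Vt/S) and S is the subthreshold swing. The sketch below is only that model; the 90 mV/decade swing and the example threshold reductions are assumed values for illustration, not figures from the K6-III process.

```python
def off_current_ratio(delta_vt_mv: float, swing_mv_per_decade: float = 90.0) -> float:
    """Factor by which off-current grows when the threshold drops by delta_vt_mv.

    First-order model: I_off is proportional to 10 ** (-Vt / S).
    The default swing S is an assumed, typical value.
    """
    return 10.0 ** (delta_vt_mv / swing_mv_per_decade)


def vt_drop_for_decades(decades: float, swing_mv_per_decade: float = 90.0) -> float:
    """Threshold reduction (mV) that raises leakage by the given number of decades."""
    return decades * swing_mv_per_decade


if __name__ == "__main__":
    # Hypothetical threshold reductions, chosen only to show the trend.
    for drop_mv in (50, 100, 180):
        print(f"Vt lower by {drop_mv} mV -> leakage x{off_current_ratio(drop_mv):.1f}")
    print(f"Two decades of leakage ~ {vt_drop_for_decades(2):.0f} mV of Vt reduction")
```

Under these assumptions, a threshold reduction of less than 200 mV is enough to account for a two-decade change in leakage, which is why keeper sizing and beta ratios become sensitive so quickly.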

All of these trends worsen the potential magnitude of noise injected onto "digital" signals (which, at such high speeds, behave much like analog signals) and make rigorous analysis the only feasible way to ensure complete compliance with design goals and guidelines.

Figure 1 - Noise Analysis
Noise analysis occurs within a design flow concurrently with static timing analysis.

If design violations slip past, performance degradations or incorrect logic values might result. Both cases can be considered failures, as additional mask spins to correct any problems would be a tremendous setback in time to market.

Approaching noise

Dealing with noise is a relatively new task for digital designers, who until recently enjoyed protective levels of noise immunity. As we plumb the deep-submicron world of 0.25 µm and below, that security no longer exists. We have had to broaden our noise analysis to include all of the new dominant effects caused by technology shrinkage, while retaining our mastery over the more traditional noise issues of charge sharing and power-supply fluctuations.

But even experienced circuit designers, aware of the new issues, couldn't cover every analysis for every circuit by exhaustive circuit simulation, at least not within the required schedules and manpower limitations.

To address this difficulty, we needed new static noise analysis approaches, such as those Cadmos implemented in its Pacific tool. Just as static timing analysis tools helped designers through the complexity of considering all timing paths and obviated the need to worry about sufficient vector coverage, the new static noise analysis approach accounts for "all" possible combinations of induced and propagated noise sources.

The tool is a static analyzer that accounts for the combined effects of the relevant digital noise sources at every net in the design. It uses a built-in noise immunity metric that allowed us to focus on truly sensitive parts of the circuit and not just areas where the peak noise exceeds an arbitrarily chosen design rule. Examining every occurrence where noise exceeds a given peak typically would mean looking at hundreds or thousands of potential failures for a given macro block. The tool's built-in sensitivity filtering enabled us to focus on a handful of noise issues, saving valuable design time and avoiding unnecessary design changes. Furthermore, Pacific sidesteps additional modeling approximation steps because it uses a built-in transient simulator that works from the original Spice parameters. In our accuracy comparisons, its results lay within a few percent of Spice.

We applied Pacific as a point tool after post-layout extraction (see Figure 1). To prepare, we extracted the circuit netlist (including interconnect resistance and coupling capacitance parasitics) from the layout, provided Spice transistor models, and created a run control script. If a custom macro contained analog signals, such as bit lines in a memory array, we had to "black box" the circuitry containing them with a user-defined noise (UDN) model, since Pacific is a tool for digital circuits only.

The tool accepted both flat and hierarchical industry-standard netlist formats, such as Spice and DSPF, available as output from various commercial physical layout extraction tools. It also used the same transistor model, BSIM3, that our circuit simulations used. We needed to write a Tcl control script specifying the key parameters, such as voltage, temperature, I/Os, and clocks.

Figure 2 - Web page browsing
The tool outputs several different views of its analytical results in HTML form.

In practice, we had already conducted some specific noise analysis of key circuits in the K6-2. However, we expected further noise issues because of the finer line widths, higher transistor leakages, and lower voltages. Luckily, the introduction of the tool coincided with our need to do the process shrink.

Of course, while our main goal was to ensure robustness, we also gained greater confidence in pushing the process to improve performance, while still maintaining product reliability. Reliability goes hand in hand with thoroughness. If nothing else, we realized that a thorough static approach was a good step to quantify, improve, and validate product reliability. Whereas most designers perform noise analysis only on certain manually chosen critical paths of a circuit, the tool checked every circuit and every net. That made it a safety net for our designers, who otherwise could have let noise problems escape their attention.

Even though Pacific falls under the category of "static" analysis tools (it doesn't require input stimulus), it's really a hybrid because it performs transient simulation on each local circuit. The tool automatically generates the correct local stimulus to create the worst-case noise scenario. It "adds" the effects of all noise sources meant to be analyzed, on a node-by-node basis. For example, crosstalk noise caused by coupling is combined with noise caused by charge sharing to create the worst-case noise event.

Figure 3 - Starting Point
The original design for a read-column multiplexer shows how charge sharing, a form of noise, can creep into a circuit. The problem originates with a large capacitance sitting on a node in the merged pull-down network (net38).

In addition to combining noise sources (leakage, charge sharing, coupling, and so on), the tool accounts for the noise coming from preceding gates by propagating noise along all possible paths. This method presents a realistic operating environment for each individual gate in question, since assuming that the input signals are quiet may be too optimistic, while applying a fixed, arbitrary noise level would be too pessimistic. The tool reports the noise on any net that exceeds user-defined noise sensitivity or noise peak thresholds.
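The idea of summing noise sources node by node and then propagating the result through downstream gates can be sketched with a deliberately simplified linear model. This is not how Pacific computes noise (the tool runs transient simulations on each local circuit); the net names, the per-gate "gain" numbers, and the peak limit below are all hypothetical, and the sketch assumes Python 3.9 or later for `graphlib`.

```python
from graphlib import TopologicalSorter


def propagate_noise(local_noise, drivers, gain, peak_limit):
    """Return (total noise per net, sorted list of nets above peak_limit).

    local_noise: net -> {source_name: volts injected directly on that net}
    drivers:     net -> list of input nets of the gate driving that net
    gain:        net -> fraction of input noise its driving gate passes on
    """
    # Visit every net only after all nets that feed its driving gate.
    order = TopologicalSorter({net: drivers.get(net, []) for net in local_noise})
    total = {}
    for net in order.static_order():
        injected = sum(local_noise[net].values())  # sources add on the same node
        upstream = max((total[i] for i in drivers.get(net, [])), default=0.0)
        total[net] = injected + gain.get(net, 0.0) * upstream
    flagged = sorted(n for n, v in total.items() if v > peak_limit)
    return total, flagged


if __name__ == "__main__":
    # Hypothetical three-net chain: a -> b -> c.
    local_noise = {
        "a": {"coupling": 0.10},
        "b": {"coupling": 0.05, "charge_sharing": 0.30},
        "c": {"leakage": 0.02},
    }
    drivers = {"b": ["a"], "c": ["b"]}
    gain = {"b": 0.5, "c": 1.2}  # a gain above 1 means the gate amplifies noise
    total, flagged = propagate_noise(local_noise, drivers, gain, peak_limit=0.45)
    for net, volts in total.items():
        print(f"{net}: {volts:.3f} V")
    print("flagged:", flagged)
```

In this toy case no single net's locally injected noise exceeds the limit, yet net c is flagged once the propagated contribution from b is included, which is the situation a quiet-input assumption would miss.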

The tool presents its information in the form of hierarchical HTML pages (see Figure 2), which allowed us to use a standard Web browser to view the results. The pages made the results easier to analyze by splitting the information into three main portions: the amount of noise and sensitivity, the stimulus applied to create that noise, and the contributing noise sources. With the data organized this way, rather than in large text files, we found it easy to determine the origins of noise.

False failures

Just as static timing analysis can generate "false" paths, static noise analysis can generate overly pessimistic combinations. For example, when analyzing the K6-III, we were concerned that a false failure might occur when two noise sources were combined, even though they came from logically opposite signals we knew were hazard-free. In this case the tool, assuming that both signals would switch together, reported a significant noise issue. We removed this false error by specifying the logical conditions for these signals in the Tcl run control script.

Figure 4 - Spice simulations
Spice waveform simulations show how charge sharing can cause node voltages to drop precipitously (rcol[2]), resulting in unwanted logic triggering - in this case of nand_gate[2].

In our experience, the tool made good, conservative choices. The tool flagged anything doubtful, making worst-case assumptions (until told otherwise), as in the case of the two exclusive signals. That process forced us to take explicit action to remove false failures.
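The effect of declaring two signals logically exclusive can be sketched as a constrained worst-case search: sum the noise of every aggressor that may switch in the same event, but skip combinations that the declared logic conditions rule out. The aggressor names and noise values below are hypothetical, and the brute-force enumeration is only for illustration; it says nothing about how the tool or its Tcl constraints are implemented.

```python
from itertools import combinations


def worst_case_noise(aggressors, exclusive_pairs=()):
    """Largest summed noise over all aggressor subsets that respect the exclusions.

    aggressors:      name -> noise (volts) coupled onto the victim when it switches
    exclusive_pairs: pairs of names that can never switch in the same event
    """
    banned = {frozenset(pair) for pair in exclusive_pairs}
    best_sum, best_set = 0.0, ()
    names = sorted(aggressors)
    for size in range(1, len(names) + 1):
        for subset in combinations(names, size):
            # Reject any subset containing a pair declared mutually exclusive.
            if any(frozenset(pair) in banned for pair in combinations(subset, 2)):
                continue
            total = sum(aggressors[name] for name in subset)
            if total > best_sum:
                best_sum, best_set = total, subset
    return best_sum, best_set


if __name__ == "__main__":
    # Hypothetical aggressors: a select signal, its complement, and a clock.
    aggressors = {"sel": 0.25, "sel_n": 0.22, "clk": 0.10}
    print("no constraints:", worst_case_noise(aggressors))
    print("sel/sel_n exclusive:", worst_case_noise(aggressors, [("sel", "sel_n")]))
```

Without the constraint the search pessimistically adds both complementary signals; with it, the reported worst case drops to the largest combination that can actually occur.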

Conducting its own simulations on Spice transistor circuit models, the tool determined how each channel-connected group of transistors responded to small-signal noise changes. The tool calculated the transient sensitivity to noise for every receiver of every net, individually. This thoroughness meant that we could use the tool on existing circuit designs without introducing unnecessary pessimism that would otherwise have resulted from any "fixed" rule choice.

Mux is the word

To show how the process works, we've chosen a read column mux, which is a good example of a circuit with a potential charge-sharing problem and fodder for a noise-analysis tool. The original circuit, depicted in Figure 3 ("rcolmx_nopmos"), shows an array of eight precharged gates connected to a merged pull-down network (N230). The merged pull-down network allows the series evaluation device controlled by the bypass select signal ("bypselp") to be shared, thereby reducing the loading on its driver.

Unfortunately, the merged pull-down network ends up containing an internal node (labeled "net38") that has a significant capacitance. If this capacitance happens to be sitting at a low value because it had discharged during a prior cycle, it potentially can charge-share with one of the precharged outputs. A significant glitch can result, be amplified by downstream gates, and force an erroneous logic state.

Such an occurrence appears in the first set of Hspice waveforms in Figure 4, in which the falling transition on rcolseln[2] causes a rising transition on net68[2]; that, in turn, causes net38 to charge-share with rcol[2], resulting in a dip down to 1.48 V out of a 2.0-V supply (a 26 percent degradation). The dip triggers the nand_out[2] output to glitch, as well.

The worst case occurs when only one of the select signals goes active. If more than one goes active and all charge-share together, any one of the outputs will experience a smaller degradation because the "bad" charge would be distributed rather than concentrated onto a single output victim. The degree to which the degradation propagates to the nand_out[2] output depends on that gate's input threshold (determined from transistor characteristics and its N:P beta ratio). Consequently, it's not obvious that simple hand calculations will yield enough information to determine whether a potential failure will cause problems downstream.
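The first-order behavior described here follows from charge conservation between the precharged output capacitance and the discharged internal node. The sketch below uses that ideal model only; the capacitance values are hypothetical, picked so that the single-victim case reproduces the 1.48-V dip reported for the 2.0-V supply, and the model ignores keeper current and transient effects that the Hspice run includes.

```python
def shared_voltage(c_out_ff, c_int_ff, vdd=2.0, v_int=0.0, victims=1):
    """Final voltage after ideal charge sharing.

    c_out_ff: capacitance of each precharged output (fF), initially at vdd
    c_int_ff: capacitance of the shared internal node (fF), initially at v_int
    victims:  number of precharged outputs connected to the node at once
    """
    total_charge = victims * c_out_ff * vdd + c_int_ff * v_int
    return total_charge / (victims * c_out_ff + c_int_ff)


if __name__ == "__main__":
    C_OUT, C_INT = 14.8, 5.2  # hypothetical values, chosen to match the 1.48-V dip
    for n in (1, 2, 8):
        v = shared_voltage(C_OUT, C_INT, victims=n)
        print(f"{n} select(s) active: output settles near {v:.3f} V")
    # If the internal node starts at the supply instead of ground, nothing is lost.
    print(f"internal node precharged: {shared_voltage(C_OUT, C_INT, v_int=2.0):.3f} V")
```

The single-select case gives the deepest dip because all of the missing charge is taken from one output; as more outputs share the same internal node, each one gives up proportionally less.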

Figure 5 - The way out
Adding a PMOS device precharges the intermediate node, thus preventing the degradation that charge sharing can cause.

The concept of static noise analysis assumes a propagation of noise degradation along all possible paths, just as static timing analysis propagates delays along all possible paths. In this example, to determine the noise actually propagated to the nand_out[2] output, static noise analysis compares the magnitude of the charge-sharing-induced noise on net rcol[2] with the transistor characteristics of the nand_gate[2] receiver.

If the rcol[2] degradation is severe enough that nand_gate[2] amplifies instead of attenuating it, then the user can take action to correct the problem. Figure 5 ("rcolmx_1") shows a possible fix. For our example, we add a PMOS device ("p228") to ensure that intermediate node net38 is precharged to a high value, along with all the rcol[*] signals. This precharging prevents any degradation if charge sharing occurs when one of the net68[*] signals rises.

Figure 2 shows Pacific's analysis of the same circuit. The upper right frame lists all the nets whose noise exceeds a user-specified threshold. All eight internal nodes (rcol[0:7]) are vulnerable to charge-sharing noise. The column under the heading "Type" denotes the state of the net: "VH" stands for "Voltage High," and "VL" for "Voltage Low." The next column lists the peak noise calculated for this node. Combining these two columns shows that rcol[2] has a 526-mV voltage drop from VDD. Note that this result lies within 2 percent of the Hspice result. The next column shows the sensitivity of the subsequent net to that noise. A sensitivity of less than -1 implies that the noise is being amplified instead of attenuated, in which case the designer should correct the problem.
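The agreement with Hspice and the sensitivity rule can be checked with a few lines of arithmetic. The supply, dip, and 526-mV figures come from the text above; the sensitivity values passed to `amplifies_noise` are hypothetical, since the article does not list the numbers shown in the report.

```python
VDD_MV = 2000         # 2.0-V supply
HSPICE_DIP_MV = 1480  # rcol[2] dips to 1.48 V in the Hspice waveforms
TOOL_DROP_MV = 526    # drop from VDD reported by the static noise analysis


def percent_difference(value, reference):
    """Absolute difference between value and reference, as a percent of reference."""
    return abs(value - reference) / reference * 100.0


def amplifies_noise(sensitivity):
    """A sensitivity below -1 means the receiver amplifies the incoming noise."""
    return sensitivity < -1.0


if __name__ == "__main__":
    hspice_drop_mv = VDD_MV - HSPICE_DIP_MV
    diff = percent_difference(TOOL_DROP_MV, hspice_drop_mv)
    print(f"Hspice drop: {hspice_drop_mv} mV, tool drop: {TOOL_DROP_MV} mV")
    print(f"difference: {diff:.2f} percent")
    # Hypothetical sensitivities, for illustration only.
    for s in (-0.4, -1.0, -2.5):
        verdict = "amplified, fix needed" if amplifies_noise(s) else "attenuated"
        print(f"sensitivity {s:+.1f}: {verdict}")
```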

The lower left frame lists the stimuli that produce the noise. The tool set up the worst-case condition, where one select signal turned on (net68[2], 0 (R), "R" for rising) and the rest remained inactive. To observe the charge sharing, the tool also set net67, at the gate of transistor N230, to low. The frame at lower right presents the type and the magnitude of the noise. The noise largely stems from charge sharing.

Although this example illustrates only a charge-sharing failure, the static noise analysis method sums noise from many sources, and therefore can quantify combinations that could induce a problem. For instance, another important noise source is the charge leaked away because of transistor subthreshold conduction. In the "rcolmx_nopmos" schematic, if devices n213 and n230 were many times larger (wider) than transistor p217, then charge would drain from rcol[2] (because of the subthreshold leakage, which is additive to the charge-sharing degradation described for net38). Even after the addition of the p228 device, the leakage still causes some degradation, but for the device sizes in this schematic, the noise analysis confirms that such degradation is tolerable.

A few surprises

The tool uncovered some sensitivities in the custom macros that the circuit designers hadn't yet identified, though chances are that the designers would have rectified the situation before tapeout. The tool didn't suggest fixes but did point out which noise types contributed the most to each problem, which helped us prioritize the design effort and possible improvements. For instance, when the noise stemmed mainly from cross coupling, we knew we could take steps, such as widening the metal spacing, to alleviate the situation. If the major contributor to a noise problem was charge sharing, we could add intermediary keeper devices.

Although the 256-KByte L2 cache is the biggest block in the K6-III, it wasn't the most challenging in terms of circuitry. Its sheer size stretched the tool's capacity more than it did the algorithm at the heart of the tool. It was actually the 64-KByte first-level (L1) cache, which runs at the full clock rate of the chip and deploys all kinds of circuit tricks, including self-resetting logic paths, to achieve its performance, that challenged the viability of the Pacific algorithm. Nevertheless, the tool handled all of these circuit design styles. Its runtime and memory usage were also reasonable. Throughput averaged 150,000 transistors/hour on a 300-MHz Sun Ultra2, and the largest block, the L2 cache, took 250 MBytes of memory.
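To put the throughput figure in perspective, a rough estimate can be derived from the numbers quoted in this article. The calculation below assumes, purely for illustration, that all 21.3 million transistors are analyzed once at the average rate; the article does not say how the blocks were partitioned or how many machines were used, so the machine counts are hypothetical.

```python
TOTAL_TRANSISTORS = 21_300_000  # K6-III transistor count quoted earlier
RATE_PER_HOUR = 150_000         # average analysis rate on one 300-MHz Sun Ultra2


def analysis_hours(transistors, rate_per_hour=RATE_PER_HOUR, machines=1):
    """Estimated wall-clock hours, assuming blocks split evenly across machines."""
    return transistors / rate_per_hour / machines


if __name__ == "__main__":
    for machines in (1, 4, 16):  # hypothetical machine counts
        hours = analysis_hours(TOTAL_TRANSISTORS, machines=machines)
        print(f"{machines:>2} machine(s): about {hours:.1f} hours")
```

Because the analysis runs block by block on extracted macros, the work divides naturally across machines, which is what makes whole-chip coverage practical at this rate.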

It was essential for us to deal efficiently with noise analysis to ensure robustness and reliability within the product market window. With these techniques in place, the K6-III design worked as expected in first silicon. For us, Pacific has proven its worth as a back-end detailed noise immunity checker for the K6-III, and has also served the designers of the AMD Athlon. We plan its continued use for our next generations of high-speed processors, including the K8. In the future, however, we would like to use the tool earlier in the design cycle, before layout, where it can validate our circuit designs using estimates of capacitive coupling effects.


Ted Williams is the director of silicon implementation at Morphics Technology in Campbell, CA. Previously, he managed the VLSI Circuits Tools team for AMD's California Microprocessor Division. He has fifteen years of engineering, management, and teaching experience in VLSI design and EDA.

Luke Tsai is a member of the K8 microprocessor tool development and circuit design teams at AMD. He has designed circuits at AMD since 1992, serving from 1995 to 1999 as a member of the K6 microprocessor team, where he began developing tools and methodologies for the circuit group.

To voice an opinion on this or any other article in Integrated System Design, please e-mail your comments to mikem@isdmag.com.


Copyright © 2000 Integrated System Design Magazine
