Variability upends designers' plans
Richard Goering
11/21/2005 10:00 AM EST
It was a "real-life chip kill" that awakened Paul Zuchowski to the dangers of not adequately modeling process variability. At last June's Design Automation Conference, Zuchowski, senior technical-staff member at IBM Microelectronics' ASIC strategy and architecture group, described a chip that failed due to a hold-time problem induced by metal variations. Chip designers thought they had constructed a clock tree with zero skew, but metal RC variations between adjacent planes caused sporadic hold-time violations.
"If metal [layers] 5 and 6 don't track, skew is induced, and that's what happened," Zuchowski said. But traditional ASIC design flows, he noted, assume that all metal layers are either fast together or slow together. With oxides only six to eight atoms thick, he said, a defect one atom high can induce a 33 percent variation.
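The failure mode Zuchowski describes can be sketched with a toy model: two clock branches designed for zero skew, one routed on metal 5 and one on metal 6, each with its RC delay scaled by a per-layer factor. All numbers here are hypothetical, and the wire delay uses the simple Elmore estimate for a distributed RC line (0.5 x total R x total C). The point is only that a flow which moves both layers together never sees skew, while independent movement can turn positive hold slack negative.

```python
# Hypothetical numbers throughout: two clock branches of equal drawn length,
# one routed on metal 5 (capture flop) and one on metal 6 (launch flop).

def wire_delay_ps(r_ohm_per_um: float, c_ff_per_um: float, length_um: float) -> float:
    """Elmore delay of a distributed RC wire, 0.5 * R_total * C_total, in ps."""
    r_total = r_ohm_per_um * length_um          # ohms
    c_total = c_ff_per_um * length_um * 1e-15   # farads
    return 0.5 * r_total * c_total * 1e12       # seconds -> picoseconds

def skew_ps(m5_scale: float, m6_scale: float) -> float:
    """Capture-minus-launch clock skew when each layer's resistance per micron
    is scaled independently (1.0 = nominal)."""
    m5 = wire_delay_ps(0.2 * m5_scale, 0.2, 2000.0)
    m6 = wire_delay_ps(0.2 * m6_scale, 0.2, 2000.0)
    return m5 - m6

def hold_slack_ps(data_path_ps: float, skew: float, hold_ps: float) -> float:
    """Hold slack: a late capture clock (positive skew) eats into the margin."""
    return data_path_ps - skew - hold_ps

if __name__ == "__main__":
    # Layers tracking (both slow) versus not tracking (M5 slow, M6 fast).
    for m5, m6 in [(1.2, 1.2), (1.2, 0.8)]:
        s = skew_ps(m5, m6)
        print(f"M5 x{m5}, M6 x{m6}: skew {s:.1f} ps, "
              f"hold slack {hold_slack_ps(30.0, s, 5.0):.1f} ps")
```

With a 30-ps data path and a 5-ps hold requirement, the tracking case keeps 25 ps of hold slack, while the non-tracking case produces 32 ps of skew and a 7-ps hold violation.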

What Zuchowski encountered illustrates a broader problem that will reshape the way chips are designed at 90 nanometers and below. Fixed device and interconnect models can no longer assume that every transistor and wire will come out as drawn. As feature sizes continue to shrink, designers will increasingly need to design for variability.
The overall problem can be simply stated. "As stuff gets smaller and closer together, the performance of the design becomes more sensitive to minor changes in fabrication," said Eric Filseth, vice president of marketing for digital IC design at Cadence Design Systems Inc.
There are, however, a great many sources of variation, and chip designers are having to ponder a long and growing list. "When you consider transistor drive, you have to look at how much does gate width change, how much does length change and how much does oxide change," said Michael Campbell, vice president of engineering at Qualcomm Inc. "We're seeing pretty good variations across a die."
There are new tools and methodologies to help designers cope with variability, among them multiple-mode and corner analysis, statistical timing analysis, improved layout techniques and modeling that reflects manufacturing variations. The alternative? Guardbands so wide that chip performance is severely compromised, removing some of the incentive to go to smaller process nodes in the first place.

There are various ways of categorizing variability. Process variability covers a lot of ground, given that every shape that's fabricated can vary. And the consequences are significant. According to the founders of startup Clear Shape Technologies Inc., critical-dimension variations can cause a 25 percent variation in on-current (Ion) and a 300 percent variation in off-current (Ioff).
Process variations include transistor channel width and length, metal thickness variation induced by chemical mechanical polishing (CMP), line edge roughness and gate oxide thickness. Even small process variations can have a profound impact on timing and leakage current.
Process variations are sometimes classified as "front end," which applies to transistors, and "back end," which applies to interconnect and includes metal width and thickness. Poly critical-dimension variation is the biggest front-end concern, said Patrick Lin, chief system-on-chip architect at foundry United Microelectronics Corp. "On a 90-nm process with an effective channel length of around 50 to 60 nm, a 3-nm variation means around a 5 to 6 percent overall performance variation," Lin said. "A slight variation in channel length will have an exponential impact on leakage."
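Lin's figures follow from simple proportion if one assumes, as a first-order approximation, that delay scales linearly with effective channel length. The short check below makes that arithmetic explicit; the linear-scaling assumption is ours, not a model stated in the article.

```python
def relative_variation_pct(delta_nm: float, nominal_nm: float) -> float:
    """Percent change of a dimension. Under the assumed first-order linear
    relation between delay and effective channel length, this is also the
    percent change in delay."""
    return 100.0 * delta_nm / nominal_nm

for leff_nm in (50.0, 60.0):
    pct = relative_variation_pct(3.0, leff_nm)
    print(f"Leff = {leff_nm:.0f} nm: 3-nm shift = {pct:.1f}% variation")
```

A 3-nm shift is 6 percent of a 50-nm channel and 5 percent of a 60-nm one, matching the 5 to 6 percent range Lin quotes. Leakage does not follow this linear rule, which is why he singles it out as exponential.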
Voltage and temperature cause environmental variations that can affect the chip. Threshold voltages and supply voltages are frequent sources of variation. Small changes in temperature can cause a huge increase, or decrease, in leakage power. A recent paper from startup Gradient Design Automation Inc. states that a temperature gradient of 10 to 30 degrees can cause a 5 to 10 percent voltage drop, and that a 10-degree gradient at 1 volt will increase leakage by 58 percent.
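To get a feel for how quickly the leakage figure compounds, the sketch below extrapolates the quoted 58 percent per 10 degrees to larger gradients. It assumes leakage grows geometrically at a constant rate per 10 degrees, a common approximation for subthreshold leakage but an assumption here, not a result from the Gradient paper.

```python
def leakage_factor(delta_t_deg: float, rise_per_10_deg: float = 0.58) -> float:
    """Leakage multiplier for a temperature rise of delta_t_deg degrees,
    assuming (hypothetically) a constant fractional rise per 10 degrees."""
    return (1.0 + rise_per_10_deg) ** (delta_t_deg / 10.0)

for dt in (10, 20, 30):
    print(f"+{dt} degrees: leakage x{leakage_factor(dt):.2f}")
```

Under that assumption, a hot spot 30 degrees above its surroundings would leak nearly four times as much (1.58 cubed is about 3.94).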
Variations are typically classified as "random" or "systematic." It's an important distinction, because the two are handled in different ways, and not separating them out can widen statistical distributions.
Random variations include phenomena that are generally not predictable, such as dopant fluctuations in a channel. Systematic variations, such as those caused by lithography and CMP, are deterministic and can be modeled. What sometimes appears to be random is really systematic if you look at the data in the right way, said Marc Levitt, vice president of R&D for design-for-manufacturability at Cadence.
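Why separating the two matters can be shown with synthetic data: a hypothetical die where gate delay has a deterministic across-die gradient (standing in for a lithography or CMP effect) plus random noise. Treating the whole spread as random yields a wide sigma; modeling the systematic part and subtracting it leaves only the genuinely random spread. The gradient, noise level and sample count are all made up for illustration.

```python
import random
import statistics

random.seed(1)

def systematic_ps(x_mm: float) -> float:
    """Hypothetical deterministic across-die delay gradient: 2 ps per mm."""
    return 2.0 * x_mm

# 10,000 gates at random positions on a 10-mm die: nominal delay plus the
# systematic gradient plus random (e.g. dopant-fluctuation-like) noise.
xs = [random.uniform(0.0, 10.0) for _ in range(10_000)]
measured = [100.0 + systematic_ps(x) + random.gauss(0.0, 1.5) for x in xs]

# Lumping everything together folds the gradient into the distribution.
lumped_sigma = statistics.pstdev(measured)

# Removing the modeled systematic component leaves only the random spread.
residual = [m - systematic_ps(x) for m, x in zip(measured, xs)]
random_sigma = statistics.pstdev(residual)

print(f"lumped sigma: {lumped_sigma:.2f} ps, random-only sigma: {random_sigma:.2f} ps")
```

Here the lumped sigma comes out roughly four times the random-only sigma, even though nothing about the silicon changed; only the analysis did.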
Variations may be catastrophic if they cause a chip to fail, or parametric if they cause a chip to run too slow or consume too much power. Some variations are global, meaning that everything on the chip is affected in the same way. Others are local or spatial, such as device or interconnect variations that run across the chip.
Variations are strongly influenced by lithography and by the effectiveness of optical proximity correction. Joe Sawicki, vice president of Mentor Graphics Corp.'s design-to-silicon division, noted that both the focus and the amount of light vary as features are printed, and this has a significant impact on how transistors are formed.
Variation-aware analysis
With all these sources of variation, there are numerous best-case and worst-case corners to model. One answer is multiple-mode, multiple-corner analysis. Sierra Design Automation Inc. offers such a capability with Pinnacle, a physical-synthesis tool introduced at this year's Design Automation Conference. Sierra claims Pinnacle can concurrently optimize timing, area, power and signal integrity across as many as 16 mode and corner combinations in one run, using a single timing graph.
Synopsys Inc.'s PrimeTime, however, is by far the most widely used tool for timing signoff. PrimeTime has added on-chip variation and multimode, multicorner analysis techniques during the past few years, and can now perform a "distributed multiscenario analysis" that combines results into one report, said Bill Mullen, vice president of engineering for timing and characterization at Synopsys.
Many observers, however, believe that statistical timing analysis will be needed at 65 nm and below. "Exhaustive corner analysis is exhausting," said Chandu Visweswariah, research staff member at IBM Research. "If a dozen process parameters are significant (and in reality, there are a lot more than that), then there are 2^12 [4,096] corners that you need to check."
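Visweswariah's arithmetic is easy to reproduce: if each significant parameter is checked only at a fast and a slow extreme, the number of corners is 2 raised to the number of parameters. The sketch below enumerates them with `itertools.product`; the parameter names are placeholders.

```python
from itertools import product

def corners(params: list[str]) -> list[dict[str, str]]:
    """Enumerate every combination of fast/slow settings for the given
    process parameters (two settings each)."""
    return [dict(zip(params, combo))
            for combo in product(("fast", "slow"), repeat=len(params))]

params = [f"p{i}" for i in range(12)]   # a dozen hypothetical process parameters
print(len(corners(params)))             # 4096, i.e. 2**12
```

Every additional parameter doubles the count, which is why corner methods scale poorly as the list of significant variation sources grows.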
Visweswariah is one of the industry's best-known advocates of statistical timing, which provides statistical distributions of delays under varying process conditions. A statistical timer can also tell a designer that a given clock speed will result in an 80 percent yield, making it possible to push performance as far as possible while staying within yield margins.
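The yield-versus-frequency trade-off can be sketched with a single assumption: that the critical-path delay is normally distributed with a known mean and sigma (hypothetical values below; a real statistical timer derives the distribution from the design and process data). Parametric yield at a clock period is then the probability that the delay fits within it, and the inverse gives the fastest clock meeting a yield target. The sketch uses Python's `statistics.NormalDist`.

```python
from statistics import NormalDist

# Assumption: critical-path delay ~ Normal(1.00 ns, 0.05 ns). Hypothetical.
delay_ns = NormalDist(mu=1.00, sigma=0.05)

def parametric_yield(clock_period_ns: float) -> float:
    """Fraction of chips whose critical-path delay fits in the clock period."""
    return delay_ns.cdf(clock_period_ns)

def period_for_yield(target: float) -> float:
    """Shortest clock period that still achieves the target yield."""
    return delay_ns.inv_cdf(target)

t80 = period_for_yield(0.80)
print(f"80% yield at {t80:.3f} ns ({1000 / t80:.0f} MHz)")
print(f"yield at 1.000 ns: {parametric_yield(1.0):.0%}")
```

Clocking at the mean delay gives only 50 percent yield in this model; backing off to roughly 1.04 ns buys the 80 percent figure mentioned above.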
IBM has developed a statistical capability for its EinsTimer timing engine, and it's now available to outside customers on a selective basis. Internally, said Visweswariah, IBM currently has a hybrid solution that combines corner-based methods with statistical ones. "We are marching toward a completely statistical solution, and not just timing, but also power and signal integrity," he said.
Statistical EinsTimer is a fully incremental timer that can handle a range of global and spatial process variations, both random and systematic. It provides a wide selection of textual and graphical reports, including probability density functions. "We produce reports that look exactly like the deterministic reports designers are used to, except they represent all variability and sample the distributions at six sigma," Visweswariah said.
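One way a statistical timer can track both global (shared) and random (independent) variation is a first-order "canonical" delay form of the kind described in the published statistical-timing literature: a mean, a sensitivity to each shared unit-normal source, and one independent random term. The sketch below shows only the `add` operation along a path and a k-sigma report value; the statistical max used at converging paths is omitted, and nothing here is claimed to be EinsTimer's actual implementation. Sensitivities are hypothetical.

```python
import math
from dataclasses import dataclass

@dataclass
class Canonical:
    """First-order canonical delay: mean + sum(a_i * dX_i) + indep * dR,
    where the dX_i are shared unit-normal global sources and dR is an
    independent unit-normal random term."""
    mean: float
    globals_: tuple[float, ...]   # sensitivities to shared sources
    indep: float                  # sensitivity to this quantity's own random term

    def __add__(self, other: "Canonical") -> "Canonical":
        # Shared sources add coefficient-wise; independent parts add in quadrature.
        return Canonical(
            self.mean + other.mean,
            tuple(a + b for a, b in zip(self.globals_, other.globals_)),
            math.hypot(self.indep, other.indep),
        )

    @property
    def sigma(self) -> float:
        return math.sqrt(sum(a * a for a in self.globals_) + self.indep ** 2)

    def at_sigma(self, k: float) -> float:
        """Value k standard deviations above the mean (negative k for below)."""
        return self.mean + k * self.sigma

# Two gates (ps) with hypothetical sensitivities to two shared sources,
# e.g. channel length and oxide thickness.
g1 = Canonical(20.0, (2.0, 1.0), 1.0)
g2 = Canonical(30.0, (3.0, 1.0), 2.0)
path = g1 + g2
naive_sigma = math.hypot(g1.sigma, g2.sigma)  # wrongly treats g1 and g2 as independent
print(f"path mean {path.mean:.1f} ps, sigma {path.sigma:.2f} ps "
      f"(independent assumption: {naive_sigma:.2f} ps), "
      f"6-sigma delay {path.at_sigma(6):.1f} ps")
```

Because both gates respond to the same global sources, the path sigma is larger than an independence assumption would suggest, which is the kind of correlation a corner-free statistical timer has to preserve.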
The challenge in statistical timing is getting statistical process data, something fabs are loath to give out. As an integrated device manufacturer, IBM has this information. Outside customers, said Visweswariah, can run a one-time library characterization to bring in data from different fabs.
Startup Extreme DA's statistical timer, which comes with a 3-D statistical extraction engine and a sensitivity-analysis capability, is currently in customer evaluations. During statistical timing, it can calculate the sensitivity of slack and path delays with respect to instance delays. CEO Mustafa Celik believes it's just a "matter of time" before foundries provide statistical models.
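What "sensitivity of slack with respect to instance delays" means can be illustrated deterministically on a tiny, hypothetical timing graph: nudge one instance's delay, re-time, and see how much the worst slack moves. This finite-difference sketch is only an illustration of the quantity; Extreme DA's method is not described in the article, and a statistical timer would compute such sensitivities on distributions rather than single numbers.

```python
# Hypothetical timing graph in topological order:
# instance name -> (delay_ps, fan-in instances).
GRAPH = {
    "in":  (0.0,  []),
    "u1":  (40.0, ["in"]),
    "u2":  (55.0, ["in"]),
    "u3":  (30.0, ["u1", "u2"]),
    "out": (0.0,  ["u3"]),
}
CLOCK_PS = 100.0

def arrival_times(delays: dict[str, float]) -> dict[str, float]:
    """Longest-path arrival time at each node's output."""
    at: dict[str, float] = {}
    for node, (_, fanin) in GRAPH.items():
        at[node] = max((at[f] for f in fanin), default=0.0) + delays[node]
    return at

def worst_slack(delays: dict[str, float]) -> float:
    return CLOCK_PS - arrival_times(delays)["out"]

def slack_sensitivities(step_ps: float = 1.0) -> dict[str, float]:
    """d(worst slack) / d(instance delay), estimated by finite differences."""
    nominal = {n: d for n, (d, _) in GRAPH.items()}
    base = worst_slack(nominal)
    sens = {}
    for node in nominal:
        bumped = {**nominal, node: nominal[node] + step_ps}
        sens[node] = (worst_slack(bumped) - base) / step_ps
    return sens

print(slack_sensitivities())
```

Instances on the critical path (`u2`, `u3`) show a sensitivity of -1, meaning each extra picosecond there costs a picosecond of slack, while the off-critical `u1` shows 0 until its path becomes critical.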
Magma Design Automation Inc. has been quiet about its Quartz SSTA statistical timer, but the product is currently installed at several customer sites, said Robert Jones, director of Magma's silicon signoff business unit. It comes with "variation-aware" extraction and uses Magma's SiliconSmart characterization tool to add statistical data to IC libraries.
Pointing to recent tapeout failures following signoff with deterministic timing analysis, Jones, like Extreme DA's Celik, is confident foundries will provide the necessary data. "Customers have failed silicon and they can't do the analysis," Jones said. However, the transition to fully statistical timing analysis will be gradual, he said, like the transition from gate-level timing simulation to static timing analysis before it.
Synopsys is planning to roll out a statistical capability for PrimeTime by the end of the year, Mullen revealed, as well as "variation-aware" extraction through the Star-RCXT product.
Statistical timing has some skeptics, and not only because of the difficulty of getting statistical process models from foundries. Shankar Krishnamoorthy, Sierra Design's CTO, noted that statistical timing is useful for predicting the probability of a chip working at a given frequency, but does not alleviate the need to analyze and optimize a design for all possible process corners.
"Even with statistical, you still have to do corner analysis for temperature and supply voltage, and you still have to make sure timing is correct at certain corners," acknowledged Extreme DA's Celik.
Walter Ng, senior director for platform alliances at foundry Chartered Semiconductor Manufacturing Ltd., cautioned that statistical models are hard to build, and that statistical timing is not a "bolt-on" solution. "It's something that goes across the whole design flow," he said.
The real problem with statistical analysis, said Atul Sharan, CEO of Clear Shape Technologies, is that it lumps random and systematic variations together. Sharan argued that systematic variations due to lithography and etch are the dominant part of the problem, and if they aren't abstracted out, the range of the statistical distributions will be too large to be meaningful.
Clear Shape is developing technology that will bring "design manufacturing check" models into the IC design flow (see Oct. 17, page 34). The company promises a fast and accurate way of predicting the impact of systematic manufacturing variations, and of bringing this information into design tools transparently. Statistical analysis, said Sharan, can then focus on random variations.
Possibly the best way to avoid problems due to variability is to create designs that minimize it in the first place. That's why Mentor Graphics is promoting "litho-friendly design," which captures process information that allows designers to improve layouts. Mentor's Sawicki said that placing transistors and contacts on a virtual grid will go a long way toward reducing variability. "A lot of stuff about variability is highly predictable. It's just not being managed in the current design process," Sawicki said.
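One simple reading of Sawicki's virtual-grid suggestion is that snapping shapes to a fixed pitch gives every transistor and contact the same neighborhood, so lithography effects become more uniform and predictable. The sketch below shows only the snapping step; the coordinates and 180-nm pitch are hypothetical, and Mentor's litho-friendly design flow is not described at this level in the article.

```python
def snap_to_grid(x_nm: float, pitch_nm: float) -> float:
    """Move a coordinate to the nearest point on a regular placement grid."""
    return round(x_nm / pitch_nm) * pitch_nm

# Hypothetical drawn contact x-positions (nm) and a hypothetical grid pitch.
drawn = [0.0, 173.0, 355.0, 512.0]
snapped = [snap_to_grid(x, 180.0) for x in drawn]
spacings = {b - a for a, b in zip(snapped, snapped[1:])}
print(snapped, spacings)
```

The irregular drawn spacings (173, 182 and 157 nm) collapse to a single 180-nm spacing, so every contact sees the same pitch to its neighbors.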



