SAN JOSE, Calif. There was a time when engineers worried about process variations from one fab to the next, one batch to the next. But in the realm of extreme-submicron design, those worries are now dwarfed by fears of process variations across a single die. Experts gathered here last week to air out the issues in a spirited exchange at the International Symposium on Quality Electronic Design.
It's not that process variations are getting out of hand in absolute terms. Intel Fellow Steven Duvall can produce measured data indicating that the variance in gate length is in fact decreasing in standard processes as geometries grow finer. Nonetheless, he emphasized, device sensitivity to variance is increasing faster than process engineers can reduce the variance. Reduced signal levels, noise margins and timing windows are all conspiring to make previously minor variations in transistor geometry a big deal for circuit designers.
Worse still, new mechanisms are appearing that cause important variations not only in transistors but also in interconnect. And some of those mechanisms, Duvall warned, show greater variation across a single die than across similar structures on different dice from a wafer.
Thus the chip designer must expect significant and not necessarily predictable differences between transistors and between interconnect resistances on a single die.
Two panel members at the symposium offered specific examples of such mechanisms. MIT associate professor Duane Boning discussed the impact of chemical-mechanical polishing (CMP), a vital step in the dual-Damascene metallization process. In areas where many metal runs are close together, CMP tends to hollow out the copper. That can result in a 20 percent decrease in the thickness of metal runs, with a corresponding jump in interconnect resistance, Boning said.
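Boning's 20 percent figure translates directly into resistance: a wire's resistance scales inversely with its cross-section, so a 20 percent loss of copper thickness costs roughly 25 percent in resistance. A back-of-the-envelope sketch follows; the resistivity and line dimensions are illustrative assumptions, not figures from the panel.

```python
# Rough check of how CMP dishing raises line resistance: R = rho * L / (w * t).
# The 20 percent thickness loss is the figure Boning cited; the copper
# resistivity and line dimensions are illustrative assumptions.
rho_cu = 1.7e-8              # ohm-meters, bulk copper resistivity (assumed)
length = 100e-6              # 100-micron run (assumed)
width = 0.2e-6               # 0.2-micron line width (assumed)
t_nominal = 0.35e-6          # nominal copper thickness (assumed)
t_dished = 0.8 * t_nominal   # 20 percent loss to dishing

r_nominal = rho_cu * length / (width * t_nominal)
r_dished = rho_cu * length / (width * t_dished)
print(f"nominal: {r_nominal:.1f} ohms, dished: {r_dished:.1f} ohms "
      f"(+{100 * (r_dished / r_nominal - 1):.0f}%)")
```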
In a similar vein, Texas Instruments director of yield Doug Verret warned that because of the sheer number of vias on a modern die, via variations are becoming a critical problem. "We made measurements across a number of TI designs, and saw pretty consistent numbers," Verret reported. "A typical 130-nanometer chip has 10^8 vias per square centimeter.
"Now at that density, in order to achieve the sort of early-failure figures that our customers want, the early-failure rate for the vias has to be 0.2 parts per trillion. This is an issue, and we need all hands on deck to address it. Even if we are able to make major strides in process control, we will still almost certainly have to move to design techniques such as multiple redundant vias in order to yield working parts."
And there is more to come, claimed James Meindl, professor and director of the Microelectronics Research Center at Georgia Tech. "I believe that variations will set the ultimate limits on scaling of MOSFETs," Meindl predicted.
MOSFETs are becoming so small, he said, that the random nature of the location of dopant atoms in the transistor channel is showing up as a variation in transistor characteristics. "You can't send the dopant atoms to addresses in the lattice; this is a physical limit."
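Meindl's point is essentially counting statistics: if the channel holds N dopant atoms on average, the actual count fluctuates by roughly the square root of N, so the relative spread grows as devices shrink. A rough sketch, with the doping level and channel geometry as illustrative assumptions rather than figures from the talk:

```python
# Poisson-statistics sketch of random dopant fluctuation: an average of N
# dopants in the channel fluctuates by about sqrt(N), i.e. 1/sqrt(N) relative.
# Doping and geometry below are illustrative assumptions.
import math

doping = 5e18        # dopant atoms per cubic centimeter (assumed)
length = 50e-7       # 50-nanometer channel length, in centimeters (assumed)
width = 50e-7        # 50-nanometer channel width (assumed)
depth = 20e-7        # 20-nanometer depletion depth (assumed)

n_avg = doping * length * width * depth
print(f"average dopants in the channel: {n_avg:.0f}")
print(f"relative fluctuation: {100 / math.sqrt(n_avg):.1f}%")
```

For these assumed numbers the channel holds only a few hundred dopant atoms, so atom-counting statistics alone put the device-to-device spread in the several-percent range, and it worsens with every shrink.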
So we will live in a world in which transistors and wires will vary considerably across the surface of an individual die. What are we supposed to do about it? Two panelists addressed the issue head-on.
Sani Nassif, manager of IBM's Austin Research Lab, cut the Gordian knot with a simple distinction. "There are two issues here: variability, which can be modeled and canceled out by design techniques, and uncertainty, which can only be handled by guard-banding the worst case.
"The problem isn't the amount of variability. It's that we tend to turn variability into uncertainty by not modeling it. That means we end up vastly overguard-banding our designs."
Nassif argued that designers, though capable of solving the problem, are unlikely to do so because they are under too much schedule pressure to get designs out the door.
Computer-aided design vendors, on the other hand, perpetuate the problem, Nassif said. "The CAD guys only know rules. They turn our wonderful probability graphs into rules, and then your design is either right or it's wrong, and all chance of nulling out variability is lost."
Jan Rabaey, professor and co-founder of the Berkeley Wireless Research Center at the University of California, Berkeley, dismissed out of hand the idea that process control could get designers out of their fix. "There is a staggering plethora of problems. It's just wrong to think that process control alone can fix them," he stated.
Rabaey went on to describe techniques designers could use to reduce sensitivity to process variations or to "abstract them away."
"Look at interconnect," he said. "When you make a coast-to-coast phone call, you don't depend on uniform wire thickness between here and New York. You use protocols that make your transmission independent of the physical medium. We will do the same thing on ICs: protocol-based communications with error recovery instead of point-to-point wiring."
Panelists engaged in an exchange on interconnect. In response to a question about the relative importance of interconnect, Nassif said that in every microprocessor design he had examined, the critical variabilities were predominantly in the devices, not the wires. "We are handling interconnect variability adequately with rules, buffers and other tools," he said.
Boning agreed, saying that even when interconnect had proved a problem in his work, the root cause could generally be traced to variations in the active devices in the interconnect path rather than to the wires themselves.
But TI's Verret countered that in yield problems at his organization, interconnect issues were three times more likely than device issues to be at fault.
Another questioner commented that it was usually environmental variabilities, such as simultaneous switching or crosstalk, that caused interconnect problems, rather than variations in the static characteristics of the wires.
By the conclusion of the panel, two points of consensus had emerged. One, driven by Rabaey and endorsed by several other panelists, was that a change in design practice is necessary at the architectural level. Algorithms must be made less sensitive to device variability, and implementations must make greater use of regular, repetitive structures, panelists said.
The second consensus was that no one group will be able to crack the variability problem: Rather, it will require unprecedented cooperation among design, library development and process engineering, a level of cooperation far more realistic within integrated device manufacturers, such as IBM or TI, than among a fabless company, its design partners and its foundry.