LAS VEGAS - Formal verification techniques are making their way into design flows for the most complex new chips, managers at Intel Corp. and Motorola Inc. told a Design Automation Conference audience here Wednesday (June 20). But the methodology still has a way to go before it's an everyday tool.
"The truth is that formally verifying that the entire design implements the specification is not generally possible today," said Ken Albin, manager of design verification for embedded-platform solutions at Motorola's Semiconductor Products Sector (Austin,Texas).
Intel developed its own tools for debugging the Pentium 4 processor, the first project on which the chip giant applied formal verification on a large scale, said Bob Bentley, manager of presilicon validation for CPU design at Intel's operation in Hillsboro, Ore.
But the recent proliferation of commercial verification tools, Bentley said, has led Intel's validation team to consider their use for its next-generation processors.
Using formal verification, Bentley said, the Intel team found more than 100 "high-quality" logic bugs that would not have been uncovered any other way.
Bentley and Albin shared their experiences in verifying the latest-generation chip designs with a DAC audience concerned about the bottleneck that verification has become. "There is as much to be learned from the successes and failures of others in the validation area as in other, more richly documented, fields of computer engineering," Bentley said.
At the same time, both men touched on the skills needed by verification engineers, and Fusun Ozguner, professor at the Ohio State University department of electrical engineering, reported on a program that lays out a career path for verification engineers separate from design engineering.
Bentley described how Intel applied a number of innovative tools and methodologies that kept validation off the critical path to tapeout while ensuring that first silicon was functional enough to boot operating systems and run applications. "Post-silicon validation teams were able to quickly 'peel the onion,' resulting in an elapsed time of only 10 months from initial tapeout to production shipment qualification, an Intel record for a new IA-32 microarchitecture," he said.
Structural register-transfer-level work began in late 1996 at the cluster level, said Bentley, while initial samples were shipped to customers in the first quarter of 2000. The Pentium 4 processor was launched last November at frequencies of 1.4 and 1.5 GHz.
"We had a nucleus of 10 people who had worked on the previous Pentium Pro processor, and who could do the initial planning for the Pentium 4 project," Bentley said. "But it was clear that 10 people were nowhere near enough for a 42 million-transistor design that ended up requiring more than 1 million lines of structural-RTL code to describe it."
An extensive recruitment campaign, mostly for new college graduates, netted approximately 40 new hires in 1997 and another 20 in 1998. "The investment repaid itself as the team matured into a highly effective bug-finding machine that was responsible for finding almost 5,000 of the 7,855 total logic bugs that were filed prior to tapeout," said Bentley.
Of the more than 100 "high-quality" logic bugs ferreted out by formal verification, two were classic floating-point data-space problems:
- The FADD instruction had a bug in which, for a specific combination of source operands, the 72-bit floating-point adder set the carry-out bit to 1 when there was no actual carry-out.
- In the FMUL instruction, when the rounding mode was set to "round up," the sticky bit was not set correctly for certain combinations of source operand mantissa values.
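Both bugs live deep in the operand data space rather than in control logic, which is what makes them natural targets for formal methods. As a rough illustration (not Intel's tooling; the 8-bit mantissa width and all names here are assumptions for the sketch), the Python snippet below models a sticky-bit bug that overlooks the lowest discarded product bit, then enumerates every normalized operand pair, the toy analog of what a formal proof covers and random simulation does not:

```python
# Illustrative sketch only, not Intel's tooling. MANT_BITS and the function
# names are assumptions; a toy width keeps exhaustive checking feasible.
MANT_BITS = 8

def sticky_reference(a: int, b: int) -> int:
    """Golden model: sticky = OR of every product bit discarded by rounding."""
    product = a * b
    drop = max(product.bit_length() - MANT_BITS, 0)  # bits shifted out
    return 1 if product & ((1 << drop) - 1) else 0

def sticky_buggy(a: int, b: int) -> int:
    """Buggy 'implementation': overlooks the lowest discarded bit."""
    product = a * b
    drop = max(product.bit_length() - MANT_BITS, 0)
    return 1 if (product >> 1) & ((1 << max(drop - 1, 0)) - 1) else 0

# Exhaustive enumeration over all normalized mantissa pairs: the toy analog
# of the coverage a formal proof guarantees.
mismatches = [
    (a, b)
    for a in range(1 << (MANT_BITS - 1), 1 << MANT_BITS)
    for b in range(1 << (MANT_BITS - 1), 1 << MANT_BITS)
    if sticky_reference(a, b) != sticky_buggy(a, b)
]
print(f"{len(mismatches)} of {(1 << (MANT_BITS - 1)) ** 2} operand pairs expose the bug")
```

Only a small fraction of the 16,384 operand pairs trigger the mismatch; scaled up to 72-bit operands, the odds of random simulation landing on one become vanishingly small.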
"Either of these bugs could easily have gone undetected not just in the pre-silicon environment but in post-silicon testing also, and we would have faced the prospect of a recall similar to the Pentium processor's FDIV problem in 1994," said Bentley.
Comparing the bugs found by pre-silicon validation of the Pentium 4 with those found at the equivalent stage of the Pentium Pro development, the team recorded a 350 percent increase in the number filed against structural RTL prior to tapeout. The team also did a statistical study to determine how bugs were introduced into the Pentium 4 processor design (see "Swatting at bugs," below).
Meanwhile, Motorola's Albin told the DAC audience of some winning verification strategies used to harness design team forces dispersed at global design centers. "We know of only two basic techniques for dealing with complexity: divide-and-conquer and abstraction," he said. "Applying both simultaneously is the only way we are going to overcome the verification crisis."
For a system-on-chip to reach market in a reasonable time, Albin said, a working assumption must be made that the individual block designs within the device have been separately verified at the unit level. Under this divide-and-conquer assumption, the task of integration verification is limited to checking the basic interconnect of the blocks and looking for unexpected interactions between them, as the sketch below illustrates. That requires specific verification skill sets.
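Here is a minimal sketch of what such an integration-level check can look like (illustrative only, not Motorola's flow; the handshake fields and class names are assumptions). With the blocks trusted at the unit level, the integration testbench only has to watch the traffic between them:

```python
# Illustrative integration-check sketch, not Motorola's flow: the monitor
# checks only that the interconnect delivers every accepted transfer
# exactly once, in order.
from dataclasses import dataclass

@dataclass
class Handshake:
    valid: bool   # producer has data this cycle
    ready: bool   # consumer can accept this cycle
    data: int

class InterconnectMonitor:
    def __init__(self):
        self.sent = []      # transfers accepted at the producer side
        self.received = []  # transfers observed at the consumer side

    def on_producer(self, hs: Handshake):
        if hs.valid and hs.ready:   # a transfer completes only when both agree
            self.sent.append(hs.data)

    def on_consumer(self, hs: Handshake):
        if hs.valid and hs.ready:
            self.received.append(hs.data)

    def check(self):
        # Any dropped, duplicated or reordered transfer shows up here.
        assert self.received == self.sent[:len(self.received)], \
            f"interconnect bug: sent {self.sent}, received {self.received}"

mon = InterconnectMonitor()
mon.on_producer(Handshake(valid=True, ready=True, data=0xA5))
mon.on_consumer(Handshake(valid=True, ready=True, data=0xA5))
mon.check()   # passes: one transfer, delivered once, in order
```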
"We have found that it is important to have a core verification team who have a fundamentally different skill set than that of designers," Albin said. Among the talents he cited is the ability to shift levels of abstraction; a mix of hardware and software experience; the ability to organize large amounts of data; good or excellent communication skills; and internal motivation.
"Our most successful verification engineers are those who find personal satisfaction in improving the way things are done," said Albin. Verification engineers are compulsively honest, he added, so the question "are we done?" sometimes gets them in trouble.
Abstraction can be used much more extensively than just in simulation reference models, according to Albin. Since CAD tools can now routinely and reliably check the translation of RTL down to mask data, the verification effort is usually devoted to showing that the RTL implements a top-level specification. But bridging the gap between implementation and specification, Albin said, requires a series of abstraction layers above RTL.
"While formal verification of each of these abstractions has already been demonstrated, mostly as research projects, these techniques are ready to be put into practice," said Albin.
Albin said that Motorola applies several general criteria in deciding that a design is ready to tape out:
- 40 billion random cycles are run without finding a bug;
- the directed tests in the verification plan are completed;
- source and/or functional coverage goals are met;
- a diminishing bug rate is observed (see the sketch after this list);
- a certain date on the calendar is reached.
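Of those criteria, the diminishing bug rate is the one that invites a mechanical trend check. A minimal sketch of how it might be mechanized (the weekly bucketing, window size and threshold are assumptions for illustration, not Motorola's numbers):

```python
# Sketch of a "diminishing bug rate" check; the window, threshold and
# weekly bucketing are illustrative assumptions, not Motorola's criteria.
def bug_rate_ok(weekly_bugs, window=8, max_rate=5.0):
    """True if the trailing-average bug rate is low and the trend is downward."""
    recent = weekly_bugs[-window:]
    avg = sum(recent) / len(recent)
    half = len(recent) // 2
    falling = sum(recent[half:]) <= sum(recent[:half])  # later half <= earlier half
    return avg <= max_rate and falling

print(bug_rate_ok([9, 7, 5, 4, 3, 2, 1, 1]))  # True: low rate, still declining
```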
"Until we can formally verify the design from the specification down we won't have a completely satisfactory answer," he concluded.
As the DAC session turned to training, Ozguner of Ohio State reported that the university's EE department this spring began offering a course in hardware verification to 40 senior and graduate-level students in cooperation with IBM Corp. "New verification engineers in industry are faced with a large amount of learning that could have been started at the university," said Ozguner.
The long-term business goal is to seed academia for education and research focused on logic verification.
"Functional Verification of Hardware Designs," developed by the department in collaboration with IBM, is a three-credit course with a weekly two-hour lecture and an open lab component. The IBM Verification Laboratory houses 16 IBM pSeries workstations that are used for course projects; the instructors have identical machines in their offices.
"The benefits of this collaboration include the opportunity for study in a verification environment similar to that used by actual verification teams in industry," Ozguner said. Designs are tuned "for the students to use for practicing verification techniques," he said, and the course enables "interaction with verification engineers and managers through guest lectures and verification reviews."
Swatting at bugs
Of the bugs swept from the Intel Pentium 4 processor design, more than 75 percent fell into these categories:
- Goof (12.7 percent). These were things like typos and cut-and-paste errors.
- Miscommunication (11.4 percent). These fall into several categories. Examples include architects' not communicating their expectations clearly to designers and misunderstandings between microcode writers and designers.
- Microarchitecture (9.3 percent). These involved problems in the microarchitecture definition.
- Logic/microcode changes (9.3 percent). These were cases in which the design was changed and the designer did not take into account all the places that would be affected.
- Corner cases (8 percent). These were specific cases that the designer failed to implement correctly.
- Powerdown issues (5.7 percent). These were mostly related to clock gating.
- Documentation (4.4 percent). These problems occurred when something (an algorithm, say) was not documented properly.
- Complexity (3.9 percent). These were bugs whose cause was specifically identified as being due to microarchitectural complexity.
- Random initialization (3.4 percent). These were mostly bugs caused by a state not being properly cleared or initialized at reset.
- Late definition (2.8 percent). Certain features were not defined until late in the project or were not fully thought out.
- Incorrect RTL assertions (2.8 percent). These refer to assertions that were wrong, overzealous or had been working correctly but were broken by a design change.
- Design mistake (2.6 percent). In these cases, the designer misunderstood what he or she was supposed to implement, normally a result of not fully reading the specification.