System Design

Verifying the PA-8000's FPU

Complex high-speed logic requires different methods for verification.

by David Smentek, Glenn Colon-Bonet, and Craig Heikes

This article by David Smentek, Glenn Colon-Bonet, and Craig Heikes is the third in a series of three that describe the design of Hewlett-Packard's PA-8000 microprocessor. The first article appeared in our January 1997 issue, and the second appeared in February's issue. Each of the three articles was extracted from presentations given at Design SuperCon97 (Santa Clara, CA).

In designing the PA-8000 microprocessor, our team at Hewlett-Packard Co. (Fort Collins, CO) set out to develop a verification strategy for the floating-point unit that ensured the design was functionally correct at first silicon. This goal was highly ambitious, because a floating-point implementation must strictly adhere to a standard defined by the IEEE. At the same time, the large number of operand combinations, especially for a three-operand multiply-add instruction, makes exhaustive testing for functional correctness infeasible.

Several strategic aspects were important in achieving our goal. Our first priority was to develop a single reference model that we could trust. Second, the use of structured random vectors was critical: the input vector space is too large for exhaustive verification, and pure random verification does not adequately test a floating-point implementation. Finally, all simulation stimuli were kept in a single source vector format. Formatting for a given simulator was done automatically by a proprietary tool, enabling us to easily port and archive critical vectors.

The PA-8000 was our first CPU chip to use cycle-based simulation to verify the floating-point unit. Cycle-based simulation gave us pre-silicon verification throughput that exceeded that of previous floating-point projects by orders of magnitude. This strategy also enabled high-speed, high-volume vector generation for our switch-level simulator.
Other verification work included the development of a C-language reference model for IEEE validation, a behavioral (HDL) model of the unit, and a transistor-level netlist of the final design. Each of these models was compared against the reference model using a custom random vector generator that was developed specifically for floating-point verification.

A difficult task

Floating-point verification is difficult because operations must conform to an established standard that defines specific operations, number formats, and behaviors. The difficulty of verifying that a design meets the standard is compounded by the large number of possible input combinations. In addition, the IEEE-defined rounding modes, precisions, and special-case operands (such as infinity, denormal numbers, and NaNs, short for "not a number") further complicate verification. The size of the input space makes exhaustive verification impossible, and pseudo-random approaches do not adequately cover the corner cases.

For the PA-8000 floating-point unit, full-custom design was necessary to achieve our performance goals. Not basing the design on validated standard cell libraries further complicated the verification problem; our strategy needed to guarantee schematic correctness.

An FMAC operator has several advantages over a more conventional floating-point design. Because it performs only one rounding, it is more accurate than separate multiply and add operations. It also needs fewer register file ports and has a lower latency.
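
The accuracy benefit of a single rounding can be seen in a small experiment. The Python sketch below is an illustration only, not the PA-8000 datapath: it emulates a fused operation by computing a*b + c exactly with rational arithmetic and rounding once, and compares that with the usual multiply-then-add sequence. The function names and operand values are assumptions chosen for the example.

```python
from fractions import Fraction


def separate_multiply_add(a: float, b: float, c: float) -> float:
    """Multiply and round to double, then add and round again."""
    product = a * b      # first rounding
    return product + c   # second rounding


def fused_multiply_add(a: float, b: float, c: float) -> float:
    """Emulate a fused operation: exact a*b + c, rounded to double once."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)  # the only rounding


# Operands chosen so the exact product is 1 - 2**-60, which rounds to 1.0.
a = 1.0 + 2.0 ** -30
b = 1.0 - 2.0 ** -30
c = -1.0

separate = separate_multiply_add(a, b, c)  # cancellation leaves 0.0
fused = fused_multiply_add(a, b, c)        # keeps the low-order term -2**-60
print(separate, fused)
```

With these operands the separate sequence loses the entire result to the first rounding, while the single-rounding version retains it. This is also a reminder of why cancellation cases matter so much when choosing test vectors.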
Figure 1. Pure random number generation evenly distributes the inputs over the plot, whereas structured random number generation clusters them along several lines.

An FMAC operator, however, is much harder to verify. By simply adding another 64-bit operand bus to the unit, we exploded the number of input combinations from roughly 10^38 to 10^57. Also, because the FMAC operator carries twice the internal precision, a large number of internal bits are not visible at the output. Since our fastest simulators were capable of running only about 5,000 vectors per second, we could test only a small subset of the total input space. Choosing that subset was therefore critical to verifying our design correctly.

To adequately verify the unit's internal states, we needed to assert as many signals as possible between the behavioral and circuit-level models. We used the behavioral model to generate over 2,400 assertions of internal state per vector. These were run on a switch-level simulation of the schematics. The large number of assertions allowed us to track down logic errors quickly.

The reference model

Since the reference model was used throughout the design cycle as the absolute standard of correctness, it had to be totally free of floating-point code. This integer model allowed us to run verification on actual silicon without the concern of a functionally incorrect floating-point unit tainting the answers. In writing the model, we were able to leverage the operating system's floating-point trap handlers. The trap handlers covered all operation codes except the MAC operations, which were new for the PA-8000.

To improve the model's robustness, we separated the engineers developing the reference model from those developing the floating-point architecture. This separation decreased the possibility of an identical error appearing in both the reference model and the architectural implementation.
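
To give a feel for what "free of floating-point code" means in practice, here is a minimal Python sketch that handles a double-precision operand purely as a 64-bit integer pattern. It is not the HP reference model (which was written in C); the function and constant names are assumptions made for illustration. It only unpacks and classifies an operand, which is the first step any integer-only model has to perform.

```python
# Field layout of an IEEE double-precision number.
EXP_BITS = 11
FRAC_BITS = 52
EXP_MASK = (1 << EXP_BITS) - 1
FRAC_MASK = (1 << FRAC_BITS) - 1
BIAS = 1023


def unpack_double(bits: int):
    """Split a 64-bit pattern into (sign, biased exponent, fraction)."""
    sign = (bits >> 63) & 1
    exponent = (bits >> FRAC_BITS) & EXP_MASK
    fraction = bits & FRAC_MASK
    return sign, exponent, fraction


def classify(bits: int) -> str:
    """Name the operand class using integer comparisons only."""
    _, exponent, fraction = unpack_double(bits)
    if exponent == EXP_MASK:
        return "nan" if fraction else "infinity"
    if exponent == 0:
        return "denormal" if fraction else "zero"
    return "normal"


def significand_and_exponent(bits: int):
    """For finite operands, return (sig, exp) with value = sig * 2**(exp - 52)."""
    _, exponent, fraction = unpack_double(bits)
    if exponent == 0:
        # Zero and denormals have no hidden bit and use the minimum exponent.
        return fraction, 1 - BIAS
    return fraction | (1 << FRAC_BITS), exponent - BIAS


print(classify(0x3FF0000000000000))  # bit pattern of 1.0
```

Because every step uses only shifts, masks, and integer comparisons, a model built this way gives the same answers whether or not the floating-point hardware of the host machine is correct.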
A combination of random and hand-written verification validated the reference model. We used an existing multiply-accumulate implementation, the IBM RS/6000, as a reference for the MAC operations. We also used existing HP floating-point implementations as references for all other operations, including HP-PA specific operations and exception flags. The reference model was compared with the existing FPUs using over 100 billion structured random vectors, which were generated by a proprietary program. Also, to ensure that no corner cases were missed, we generated and ran vectors that tested every exceptional input combination.

Structured random numbers

As noted earlier, pure random verification is unlikely to find all the bugs in a given implementation. In general, there is almost no chance that purely random input vectors will exercise any interesting logic in a design. Two random operands in a floating-point addition have less than a 3 percent chance of their mantissas overlapping at all, let alone exciting a rare functional bug.

Our strategy was to create a random number generator that targeted operands likely to excite corner cases. The random number generator was purposely skewed to choose sets of inputs whose exponents were close together. This strategy greatly increased the probability that the resulting operation would produce cancellation, or at least have a significant overlap in mantissas. We generated long strings of 1s and 0s in the mantissa to excite carry chains and rounding logic. The generator also assigned a nonzero probability to numbers near or equal to the minimum and maximum representable numbers, both signs of zero, NaNs, and infinities.
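
The skewing described above can be sketched in a few lines of Python. This is not the proprietary generator; the names, the probability of picking a special value, and the maximum exponent gap are all assumptions chosen for illustration. The sketch produces pairs of 64-bit operand patterns with nearby exponents, mantissas built from long runs of 1s or 0s, and an occasional special value.

```python
import random

FRAC_BITS = 52
FRAC_MASK = (1 << FRAC_BITS) - 1

# Hypothetical list of special operands, given as double-precision bit patterns.
SPECIALS = [
    0x0000000000000000,  # +0
    0x8000000000000000,  # -0
    0x7FF0000000000000,  # +infinity
    0xFFF0000000000000,  # -infinity
    0x7FF8000000000000,  # quiet NaN
    0x0000000000000001,  # smallest denormal
    0x0010000000000000,  # smallest normal
    0x7FEFFFFFFFFFFFFF,  # largest finite number
]


def run_fraction(rng: random.Random) -> int:
    """Fraction field holding one long run of 1s, or the complementary pattern."""
    length = rng.randint(1, FRAC_BITS)
    shift = rng.randint(0, FRAC_BITS - length)
    ones = ((1 << length) - 1) << shift
    return ones if rng.random() < 0.5 else ones ^ FRAC_MASK


def pack(sign: int, exponent: int, fraction: int) -> int:
    """Assemble a 64-bit pattern from its three fields."""
    return (sign << 63) | (exponent << FRAC_BITS) | fraction


def structured_pair(rng: random.Random, max_exp_gap: int = 3,
                    special_prob: float = 0.1):
    """Return two operand patterns skewed toward corner cases."""
    if rng.random() < special_prob:
        return rng.choice(SPECIALS), rng.choice(SPECIALS)
    exp_x = rng.randint(1, 2046)
    gap = rng.randint(-max_exp_gap, max_exp_gap)
    exp_y = min(2046, max(1, exp_x + gap))  # keep the exponent in the normal range
    x = pack(rng.getrandbits(1), exp_x, run_fraction(rng))
    y = pack(rng.getrandbits(1), exp_y, run_fraction(rng))
    return x, y


rng = random.Random(1997)
for _ in range(3):
    x, y = structured_pair(rng)
    print(f"{x:016x} {y:016x}")
```

A production generator would extend the same idea to three operands, where the exponent of the product has to be close to the exponent of the addend for cancellation to occur.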
Figure 2. Stimuli, in a common vector format, generated answers on the reference model. The resultant vectors were then used for three different levels of simulation: behavioral, cycle-based, and switch-level.

Figure 1 compares pure random numbers with structured random numbers using just two operands rather than the three required in a MAC operation. The two graphs have the same number of input points, but the points are concentrated very differently. In the pure random case, input combinations are evenly distributed across the input space. In the structured random case, there are large concentrations around x = y and x = -y, as well as where one or both of the operands are close to zero.

Common vector format

In addition to structured random vectors and hand-coded critical vectors, we used industry-standard regression suites, all expressed in a single format. The common vector syntax is based on the format used by Jerome Coonen in specifying verification suites for checking IEEE compliance (Coonen did graduate work at the University of California at Berkeley in support of the definition of the IEEE floating-point standard). A single line fully specifies a floating-point operation. We expanded the format to support three-operand instructions and added fields for flags specific to our implementation. All simulation environments use vectors in this format as the input and produce them as the output in error reports.

Standardizing on this format provided a number of advantages, some not immediately obvious. Since the format is plain text, we could quickly see what type of operation was represented. Also, since all vector sources were in this format, the distinction between source and derived data was more easily defined. A common way of representing operations facilitated communication between different simulation and tool environments. In addition, critical vectors can be saved and communicated across projects easily.

Figure 2 is a high-level view of the logic verification models.
Three types of stimuli in the common vector format were generated. The correct answers were generated on the reference model. These vectors were used as the stimulus for three distinct simulation models. To verify the architecture, we used both the behavioral and cycle-based simulators, which offered a tradeoff between observability and simulation speed. In addition, the switch-level simulator was used to verify that the schematics performed the proper logic functions.
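
The value of a one-line, plain-text vector is easiest to see with an example. The Python sketch below uses a made-up line layout (operation, rounding mode, operands in hexadecimal, then the expected result and flags); it is not the Coonen syntax or HP's extended version of it, and every name in it is hypothetical. It parses a vector, writes it back in the same form, and compares any model against the expected answer, reporting mismatches as vectors in the same format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vector:
    op: str          # operation name, e.g. "fma"
    rounding: str    # rounding mode, e.g. "rn" for round to nearest
    operands: tuple  # operand bit patterns as integers
    result: int      # expected result bit pattern
    flags: str       # expected exception flags, "-" meaning none


def parse_vector(line: str) -> Vector:
    """Parse '<op> <rounding> <operand>... -> <result> <flags>'."""
    inputs, outputs = line.split("->")
    op, rounding, *operand_text = inputs.split()
    result_text, flags = outputs.split()
    operands = tuple(int(text, 16) for text in operand_text)
    return Vector(op, rounding, operands, int(result_text, 16), flags)


def format_vector(vector: Vector) -> str:
    """Write a vector back out in the same one-line form."""
    operands = " ".join(f"{value:016x}" for value in vector.operands)
    return (f"{vector.op} {vector.rounding} {operands} "
            f"-> {vector.result:016x} {vector.flags}")


def check(vector: Vector, model):
    """Return None on a match, else the model's answer as an error-report line."""
    result, flags = model(vector.op, vector.rounding, vector.operands)
    if (result, flags) == (vector.result, vector.flags):
        return None
    actual = Vector(vector.op, vector.rounding, vector.operands, result, flags)
    return format_vector(actual)


# 1.0 * 2.0 + (-1.0) = 1.0, written as double-precision bit patterns.
line = "fma rn 3ff0000000000000 4000000000000000 bff0000000000000 -> 3ff0000000000000 -"
vector = parse_vector(line)
print(format_vector(vector) == line)
```

Because the same line serves as stimulus, expected answer, and error report, any of the three simulation models can be dropped in as the `model` argument without changing the surrounding tools.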
Figure 3. The cn2aw (Coonen to AWSIM) program accepts vectors in the common vector format (CVF) as the input, simulates them on the cycle-based model, and queries internal nodes in the model. Then it combines the assertions with timing information, pipelines them, and writes them out in a format suitable for AWSIM.

We developed the high-level architectural model using Verilog-XL from Cadence Design Systems Inc. (San Jose, CA). The model did not attempt to emulate the low-level circuit details, and although the physical implementation used a dual-rail dynamic circuit design, the Verilog model was completely single-ended. Also, in the interest of simplicity, the model was stateless, avoiding the complexity of modeling the design's latching structures and pipeline stages. This high-level approach was easier to debug and support than a detailed RTL model, and it allowed us to make large architectural changes quickly and easily early in the project.

As the circuit design progressed, the top level of the schematic design hierarchy had to strictly match the behavioral model hierarchy. Every top-level physical signal had a counterpart in the architectural model. The designers were responsible for keeping the Verilog model up to date with the latest circuit changes.

Cycle-based simulation

A random verification strategy is limited by the total number of vectors that can be run on the model. A major concern was the throughput of the behavioral model, which was limited to approximately seven vectors per second. This speed was sufficient for flushing out the common bugs, but a higher-throughput solution was needed to find less common ones.
A cycle-based simulator developed internally met this need. Since the order of logic evaluation in a cycle-based model is determined statically at compile time, the simulation avoids the overhead of event-driven updates.

The cycle-based architectural model was generated from the Verilog behavioral model. Several steps were involved in creating the model. The first was to define a library of gate primitives for the preprocessor to use. Then, we used Design Compiler from Synopsys Inc. (Mountain View, CA) on a block-by-block basis on the behavioral code to synthesize the model into gates from the library.

A number of factors simplified the synthesis of the MAC model. As long as functional correctness was preserved, it was not essential to monitor gate delays, fan-outs, critical paths, or other common concerns in logic synthesis. Additionally, because the circuit speed of a gate was of no concern, the library for the model generation permitted high fan-in and very complex gates.

The synthesized design, consisting of tens of thousands of logic gates, was passed through a proprietary preprocessor to generate C code. The C code was linked with the structured random number generator and the C reference model to create a stand-alone executable that served as our high-performance verification model. Once constructed, this model was capable of simulating at a rate of over 3,800 floating-point vectors per second on an HP9000/735-class workstation.

Although a considerable amount of work was required to generate the first cycle-based model, the results justified the effort. In the first week of operation, five new bugs were found and fixed. The last bug we discovered was a corner case that failed only once in 1.5 billion vectors. Relying purely on Verilog simulation, it is unlikely we would have caught this bug before tape release.
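
The idea of fixing the evaluation order once, ahead of time, can be sketched as follows. This Python example is an illustration under stated assumptions, not the internal HP simulator: the gate library, the netlist layout, and the one-bit full adder are all invented for the example, and the schedule is interpreted here, whereas the tool described above emitted C code. The ordering step uses the standard library's `graphlib.TopologicalSorter` (Python 3.9 or later).

```python
from graphlib import TopologicalSorter

# Hypothetical gate library: each primitive maps input bits to one output bit.
GATE_FUNCTIONS = {
    "and": lambda a, b: a & b,
    "or": lambda a, b: a | b,
    "xor": lambda a, b: a ^ b,
}

# Netlist of a one-bit full adder: output net -> (gate type, input nets).
NETLIST = {
    "cout": ("or", ("g", "t")),
    "sum": ("xor", ("p", "cin")),
    "t": ("and", ("p", "cin")),
    "g": ("and", ("a", "b")),
    "p": ("xor", ("a", "b")),
}


def compile_schedule(netlist):
    """Order the gates once so each gate follows the gates that drive it."""
    drivers = {
        out: [net for net in inputs if net in netlist]
        for out, (_, inputs) in netlist.items()
    }
    return list(TopologicalSorter(drivers).static_order())


def simulate(netlist, schedule, inputs):
    """Evaluate every gate exactly once, in the precomputed order."""
    values = dict(inputs)
    for out in schedule:
        kind, gate_inputs = netlist[out]
        values[out] = GATE_FUNCTIONS[kind](*(values[net] for net in gate_inputs))
    return values


SCHEDULE = compile_schedule(NETLIST)  # done once, before any vector is run
result = simulate(NETLIST, SCHEDULE, {"a": 1, "b": 1, "cin": 0})
print(result["sum"], result["cout"])
```

Once the schedule exists, each vector costs one pass over the gate list with no event queue, which is where the speed advantage over an event-driven simulator comes from.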
Verifying the schematics

As transistor-level schematics were completed, we had to ensure that a simulation model built from the transistors behaved consistently with the behavioral and cycle-based models. We used an HP internal switch-level simulator called AWSIM to simulate the transistor-level model. The AWSIM simulator treats each transistor as a switch in series with a resistor controlled by the logic level present on the transistor's gate terminal. This type of simulation runs 35 times slower than the event-driven behavioral simulator and 19,000 times slower than the cycle-based simulator.

We developed a program to generate the vectors for the switch-level simulator. Called cn2aw (Coonen to AWSIM), the program creates assertions based on the behavior of the cycle-based model (see Figure 3). It accepts vectors in the common vector format (CVF) as the input, simulates them on the cycle-based model, and queries internal nodes in the model. Using this technique, we generated values for approximately 2,400 internal nodes for each vector. Using timing information contained in a separate file, the assertions were pipelined and written out in a format that could be read by AWSIM. We checked more than 1 million floating-point vectors on the final version of the schematics.

Because switch-level simulation is so slow, we used a large number of workstations to provide enough cycles for verification. Throughout the project, anywhere from 10 to 15 workstations were running jobs nightly to verify the functionality of the transistor-level netlist. Automated scripts launched the jobs and checked the logs for failures, automatically e-mailing project members if a problem was found. The scripts also tracked interesting statistics such as vectors run per machine and total assertions performed.

Our verification strategy for the floating-point unit was different from the one used for the rest of the processor. Floating-point is, in essence, a very hard combinational logic problem.
Most other verification was concerned with problems caused by synchronous interactions. As a result, integration into a common simulation required some extra work because of the divergent strategies.

It should also be noted that the lack of a structural, state-based RTL model caused some difficulties in verifying the architecture against the floating-point schematics. We probably will not abandon a stateless model for architectural verification, but for more accurate circuit representation we will likely create a lower-level RTL model as well. Such a model also would facilitate vector generation for the AWSIM simulator. Overall, however, the verification strategy was a great success.

David Smentek and Glenn Colon-Bonet are design engineers at Hewlett-Packard (Fort Collins, CO). Craig Heikes is a project manager at HP.

To voice an opinion on this or any Integrated System Design article, please e-mail your message to michael@asic.com.

Integrated System Design, March 1997

Copyright © 1997 Integrated System Design Magazine
All materials on this site Copyright © 2009 TechInsights, a Division of United Business Media LLC. All rights reserved.