System Design

Verifying a Million-Gate Processor

Simulating a new UltraSPARC microprocessor requires good techniques and a lot of computing power.

by James Gateley
The combination of deep-submicron complexity and increasing microprocessor performance is making silicon design and verification ever more difficult. Exhaustive verification and iteration throughout the design cycle are imperative, or the silicon won't function properly when it is installed in a real-world environment. To design UltraSPARC microprocessors, Sun Microsystems faces the challenges of designing, simulating, and emulating high-performance, 64-bit SPARC devices that are 100 percent compatible with existing SPARC binaries. The verification effort demands a high-performance simulator running in a server ranch with 1,000 SPARC processors and an emulation environment of well over 1 million gates. For our event-driven simulation, we use the Viewlogic Chronologic VCS simulator because of its small memory requirement and its overall performance. Emulation tasks are accomplished with Quickturn Design Systems' hardware and software.

Historically, the performance of high-end microprocessors has doubled every 16 months. To keep up with this growth, we need to achieve a 1 percent performance gain per week (compounded over the roughly 69 weeks in 16 months, 1 percent per week comes to about a factor of two). Therefore, if a proposed new design feature might achieve an n percent performance gain, it must take less than n weeks to specify, model, design, debug, and verify. Meeting this growth rate is complicated by the 80/20 rule, which says that finding the last 20 percent of the bugs in any design consumes 80 percent of the verification time and resources. These last bugs are not only the hardest to find, but they're potentially the most dangerous and costly.

The sheer complexity of an UltraSPARC microprocessor demands extensive and efficient verification techniques. For an UltraSPARC microprocessor to meet its design specification, we have to ensure that the design is, first and foremost, free of bugs. The device also must be compatible with the 32-bit SPARC version 8 and 64-bit SPARC version 9 specifications, as defined by SPARC International.
In addition, the UltraSPARC instruction set extensions, such as the Visual Instruction Set (VIS) instructions, must be correctly implemented. Finally, we have to ensure that first silicon boots the multi-user Solaris operating system and correctly runs the Open Windows/Common Desktop Environment (CDE) and all other applications.

To accomplish those goals, we use our "construction by correction" methodology, which is an iterative process of integration and simulation (see Figure 1). This looping process lets us perform synthesis on the control blocks, compile the datapaths, and extract the custom block netlists to achieve periodic full-chip integrations. As the project progresses, fewer changes are required, an indication that the design quality is improving, and as a result the integration cycles accelerate. This constant process of refinement helps manage chip timing, die size, and overall functionality of the design.

The lock-step technique

To verify the design, we have a simple strategy: A reference model, written in C++, is simulated alongside the register-transfer-level (RTL) design. On an instruction-by-instruction basis, the reference model looks at the RTL in simulation to determine whether both models agree on the resulting state of the processor after each instruction. The reference model and the RTL model communicate through a procedural language interface (PLI), and both are controlled by VCS.

In the block diagram of our simulation environment (see Figure 2), the UltraSPARC I is the unit under test. It's surrounded by the system behavioral model, an SRAM model, and a system memory model. Our diagnostic code is written in SPARC assembly language and is loaded either into the memory model or, in some cases, directly into the instruction cache. The system also includes the C++ reference model that is connected to the UltraSPARC I. This is the model that works in lock-step with the RTL model.
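
The lock-step comparison can be illustrated with a minimal Python sketch. It is not Sun's C++ reference model or its PLI interface; the toy instruction set, the `ToyCpu` class, and the `buggy_sub` flag are all hypothetical, chosen only to show two models being stepped one instruction at a time with their states compared after each step.

```python
from dataclasses import dataclass, field

MASK64 = (1 << 64) - 1  # keep register values within 64 bits


@dataclass
class ToyCpu:
    """Hypothetical toy processor: four registers and a program counter."""
    regs: list = field(default_factory=lambda: [0] * 4)
    pc: int = 0
    buggy_sub: bool = False  # when True, models a planted design bug

    def step(self, instr):
        """Execute one instruction of the form (opcode, rd, a, b)."""
        op, rd, a, b = instr
        if op == "li":  # load immediate: rd <- a
            self.regs[rd] = a & MASK64
        elif op == "add":
            self.regs[rd] = (self.regs[a] + self.regs[b]) & MASK64
        elif op == "sub":
            result = self.regs[a] - self.regs[b]
            if self.buggy_sub:
                result = self.regs[b] - self.regs[a]  # operands swapped
            self.regs[rd] = result & MASK64
        else:
            raise ValueError(f"unknown opcode {op!r}")
        self.pc += 1

    def state(self):
        """Architectural state that the two models must agree on."""
        return (self.pc, tuple(self.regs))


def run_lockstep(program, reference, dut):
    """Step both models together; return the index of the first instruction
    after which their states differ, or None if they always agree."""
    for index, instr in enumerate(program):
        reference.step(instr)
        dut.step(instr)
        if reference.state() != dut.state():
            return index
    return None


PROGRAM = [
    ("li", 0, 7, 0),
    ("li", 1, 5, 0),
    ("add", 2, 0, 1),
    ("sub", 3, 0, 1),
]

clean = run_lockstep(PROGRAM, ToyCpu(), ToyCpu())
broken = run_lockstep(PROGRAM, ToyCpu(), ToyCpu(buggy_sub=True))
print(clean, broken)  # prints: None 3
```

Comparing after every instruction, rather than only at the end of a diagnostic, points directly at the first instruction where the two models diverge, which is what makes the mismatch cheap to debug.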
Figure 1. The construction-by-correction methodology iterates many smaller blocks until each block produces optimal results. Then the blocks are assembled into the final IC.

From RTL to mixed-mode simulation

Early in the project, RTL simulation is used exclusively. As the project progresses and we begin to see both synthesized gates and gates extracted from other sources, we begin to simulate with mixed RTL and gate-level models. VCS easily handles this mixed RTL and gate-level model, allowing a smooth transition from the all-RTL model to an all-gate model. Ultimately, this process leads to simulation purely at the gate level. These final simulations are based on gate-level netlists extracted from the layout because that's what is going to final tape-out.

Gate-level simulation finds different classes of problems than RTL simulation. When we run RTL simulations, we're looking for functional bugs in a design. At the gate level, we're looking for places where the design has diverged from the RTL model. For example, there are places where the synthesis tool has been given an incomplete or improper specification. As a result, the tool is forced to make a creative guess about how to implement the circuit. Mistakes are found in custom blocks, and errors also are injected during clock buffer insertion, scan chain stitching, or hand edits in the design.

Attention must be paid

Throughout all processing, we must pay careful attention to the syntax and semantics of the RTL design code. When faced with design code that is slightly ambiguous or incomplete, the tools must make some guesses about what the designer really intended. Should different tools make different guesses about design intent, unexpected and inconsistent results will occur. Many tools will give warnings or errors during the translation process. The key is to pay attention to all warnings or errors from every single tool, track down the source of all problems, and resolve them.
The most dangerous kind of error occurs later in the project, when we stop using synthesis and implement changes manually. Manual implementation raises the possibility that the RTL representation of the design is different from the gate-level representation. Consequently, we must ensure not only that all of the changes implemented manually are functionally correct, but also that all of the views of the design (the gate view, the timing view, the emulation view, and so on) are functionally equivalent to the RTL model and its C++ companion.

We use two techniques to verify RTL and gate equivalence: gate-level simulations and formal equivalency checking. Because gate-level simulations are slower than RTL simulations, it's desirable to minimize those cycles. As the project progresses, several different gate-level views of the design are created (such as the emulation view, the timing view, and the layout view), and each needs to be verified, traditionally by gate-level regression simulations. Formal equivalency checking can be employed to show that various representations of the design are functionally equivalent. Equivalence checkers perform equivalency analysis without using simulation, thus freeing simulation cycles for more RTL testing.
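
To make "functionally equivalent" concrete, here is a small Python sketch that compares two views of a 2-to-1 multiplexer by exhaustive truth-table enumeration. This is a deliberately swapped-in stand-in: enumeration is practical only for tiny blocks, and production equivalence checkers rely on symbolic analysis instead. The functions `rtl_mux`, `gate_mux`, and `hand_edited_mux` are hypothetical examples, not taken from the UltraSPARC design.

```python
from itertools import product


def rtl_mux(sel, a, b):
    """RTL-style view: choose a when sel is 1, otherwise b."""
    return a if sel else b


def gate_mux(sel, a, b):
    """Gate-style view built from AND, OR, and an inverter (1 - sel)."""
    return (sel & a) | ((1 - sel) & b)


def hand_edited_mux(sel, a, b):
    """Gate-style view after a hypothetical hand edit dropped the inverter."""
    return (sel & a) | (sel & b)


def find_counterexample(f, g, num_inputs):
    """Try every combination of single-bit inputs; return the first input
    tuple on which f and g disagree, or None if they agree everywhere."""
    for bits in product((0, 1), repeat=num_inputs):
        if f(*bits) != g(*bits):
            return bits
    return None


print(find_counterexample(rtl_mux, gate_mux, 3))         # prints: None
print(find_counterexample(rtl_mux, hand_edited_mux, 3))  # prints: (0, 0, 1)
```

The counterexample `(0, 0, 1)` reads as sel=0, a=0, b=1: the RTL view passes b through, while the hand-edited gates output 0. A single disagreeing input is enough to show that two views are not equivalent.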
Figure 2. The SPARC microprocessor simulation environment has multiple models that must agree on simulation outputs, including running actual code.

The design is verified at all possible levels of hierarchy, including the full-chip level. Individual design blocks are combined into various levels of hierarchy, and on each level a testbench environment is employed where we can apply a stimulus to that individual block or hierarchy. The results of the stimulus are sampled to determine whether a block is meeting its functional specifications.

Creating the stimuli and expected results for a testbench is a challenge. With hundreds of design blocks and many layers of hierarchy, generating testbenches to surround the blocks for simulation escalates the challenge considerably. It's imperative, therefore, to have a robust tool set to handle testbench generation. Of course, a detailed test plan for each testbench is required as well.

At the full-chip level, we have two classes of simulation regressions: miniregression and full regression. The miniregression suite is run anytime anyone on the design team makes any change to the design database. Initiated by the design database check-in script, this process is completely automatic. The suite is a collection of 25 to 30 diagnostics, about 200,000 to 300,000 cycles in length, that takes about two hours to execute and gives a quick go or no go on every change made in the design. It automatically determines when new problems have been injected into the design and sends e-mail to the culprit about the new problem. Thus the suite keeps us from progressing with a design that's built on some potentially dangerous faults. About two or three times a week, we run full regression simulations, which are typically about 100 million cycles.
To accomplish this quickly and efficiently, we share a server ranch with about 1,000 CPUs, all managed by an internal tool called DREAM (Dynamic Resource Allocation Manager), which lets us submit thousands of simulation jobs and get very fast turnaround. To handle the large number of simulation cycles required to fully verify our design at both the RTL and gate levels, we need a compiled Verilog event-based simulator with native code generation to achieve a large number of cycles per wall-clock second. Chronologic's VCS is used for all miniregressions and full regressions. Much of our debugging is accomplished by dump file analysis using a waveform viewer/editor.

In addition to the directed diagnostic simulations performed in our regression suite, we run a portfolio of tools that apply both random and pseudorandom test stimuli to the design in simulation. With some tools, we inject random events (interrupts or error conditions, for example). With other tools, pseudorandom instructions or transactions are applied to the design with an expected outcome. These tools consume vast amounts of simulation cycles but tend to find more interesting corner cases that could be missed in the directed diagnostic suite. These tools are of greatest value as the design matures and the remaining problems become more subtle.

The bulletproof test suite

The test suite is composed of several different techniques and tools. Different techniques are of the greatest value during different phases of the project (see Figure 3). Testing of the design continues beyond design tape-out for fabrication and concludes before first customer shipment (FCS).

The big question asked throughout the iterative simulation process is whether the test suite is complete. Extensive and detailed test plan reviews for each part of the design are only the first step to ensure complete coverage. We review and revise each test plan at least three times during the project.
Test plans specify and document a wide variety of testing methodologies to be applied to the design, including all required design monitors and checkers, directed diagnostics, and pseudorandom testing techniques.
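
Pseudorandom testing against an expected outcome is only useful if a failing run can be replayed. The Python sketch below illustrates that idea under stated assumptions: `golden_add8` stands in for a reference model, `design_add8` is a hypothetical design model with a planted corner-case bug, and the seed range and test length are arbitrary. None of these names or values come from Sun's tools.

```python
import random


def golden_add8(a, b):
    """Expected outcome: 8-bit addition with wraparound."""
    return (a + b) & 0xFF


def design_add8(a, b):
    """Hypothetical design model with a planted corner-case bug."""
    if a == 0xFF:
        return b  # wrong: the correct result is (b - 1) & 0xFF
    return (a + b) & 0xFF


def make_test(seed, length=16):
    """Build a pseudorandom list of operand pairs; the same seed always
    yields the same list, so any failure can be replayed exactly."""
    rng = random.Random(seed)
    return [(rng.randrange(256), rng.randrange(256)) for _ in range(length)]


def run_test(seed):
    """Return the first operand pair on which the design disagrees with
    the expected outcome for this seed, or None if the test passes."""
    for a, b in make_test(seed):
        if design_add8(a, b) != golden_add8(a, b):
            return (a, b)
    return None


def first_failing_seed(max_seeds):
    """Run one test per seed and return the first seed that fails."""
    for seed in range(max_seeds):
        if run_test(seed) is not None:
            return seed
    return None


failing_seed = first_failing_seed(5000)
print("failing seed:", failing_seed)
print("replayed failure:", run_test(failing_seed))
```

Logging the seed with every run is the design choice that matters here: the stimulus itself never has to be stored, because the seed regenerates it. The planted bug fires only when the first operand is 0xFF, the kind of narrow condition a directed diagnostic could easily overlook.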
Figure 3. The testing and verification of the full chip calls for different techniques for the different phases of the design cycle.

Once the test plan has been implemented and is in simulation, we apply various measurement tools to pinpoint holes remaining in the test coverage. Coverage is measured by Verilog line-coverage tools, finite-state-machine arc-coverage tools, and gate-level nodal toggle-count tools. We also use formal analysis to verify portions of the design, such as key finite state machines, and to ensure multiplexer exclusivity. All of these tools result in quantitative analysis of our regression suite coverage.

As the project progresses, we load a gate-level design of the microprocessor into a hardware emulator that is plugged into a real hardware target system. This lab setup allows us to execute real-world program code, such as the Solaris operating system, on our gate-level design. Emulation finds problems that are unrealistic to find in simulation because of the billions or even trillions of instructions often needed to expose a problem. This is our final functional verification step before tape-out to fabrication.

Some designers have had the sad experience of getting a chip back from the fab that's passed every diagnostic imaginable, but when it meets the real world, it fails. At Sun, combining a high-performance simulation environment and in-circuit emulation prevents that from happening.

Looking to the future

If there's one phrase that can best define what's needed for verification in 1997 and beyond, it's "cycles, cycles, and more cycles." Unless we take the time to examine our design from all angles and at as many levels as possible, a killer bug will be lurking somewhere in the design. For the UltraSPARC I, we ran over 5 billion instruction simulation cycles before tape-out. Of these, over 1 billion were random stimuli.
This extensive simulation is necessary because in a real-world test of our design, one multi-user boot of the operating system is estimated to be more than 5 billion instructions long. For current projects, we're simulating more than 200 million cycles per week. Though those numbers are staggering, we believe that by the year 2000 we'll see a tenfold increase in simulation demands over those of the UltraSPARC I. The conclusion is simple: In simulation, there is no plateau, only an escalation that will require countless cycles, more powerful hardware, and the best simulators available.

James Gateley is the design verification manager at Sun Microsystems Inc. (Mountain View, Calif.)
Integrated System Design, October 1997

Copyright © 1997 Integrated System Design Magazine