A graphics chip vendor used a DFT methodology and an optimized tool suite to speed the testing of a complex accelerator from design, through tapeout, to manufacturing.
by Mehran Amerian
Graphics accelerators, among the most complex parts produced in volume, place especially heavy demands on testing. Controlling costs, adhering to high quality standards, and meeting tight time-to-market schedules are paramount in the highly competitive graphics arena. A slip in the design schedule caused by last-minute testing issues can mean the difference between winning and losing a place in hundreds of thousands of systems. At S3, a supplier of graphics and video accelerators in Santa Clara, Calif., the release of our latest product, the Savage4, faced exactly these pressures. To meet our release goals, we needed to deliver a testable design to production for the chip, which is manufactured in a 0.25-µm process and contains approximately 1.6 million gates. To do so, we employed design-for-test (DFT) software from Synopsys, along with some in-house tools that translated the Synopsys output for our tester.
Targeted at the commercial and consumer PC markets, the chip delivers 3D rendering capabilities equivalent to those of high-end, niche gaming devices, as well as 2D acceleration. Built around a new 128-bit, highly pipelined 3D engine, the chip provides AGP 4X technology, 32-bit 3D rendering, S3TC, trilinear-filtered single-pass multi-texturing, hardware-accelerated DVD, and support for 32 Mbytes of memory and digital flat-panel displays.
Such a complicated chip presented thorny test problems. The key to solving them, supported by the tools, was a DFT methodology that addressed test issues early in the design flow. As part of the methodology, we took special care to insert test functions into the HDL code, then stitched the scan chains together at the gate level. Equally important were the automation of testing for path delay faults and the identification of critical paths to enable at-speed testing. Overall, the tools and methodology allowed us to automate many aspects of the testing process, ensuring that our chip came through tapeout and manufacturing on time and to specification.
Inserting test
In its approximately 1.6 million gates, the chip contains an embedded 2D and 3D engine, the AGP interface, and numerous FIFO megacells. Several clock domains use both positive- and negative-edge flip-flops. We employed a full-scan methodology to make almost all of the flip-flops in the design scannable. The remaining state elements, including the FIFOs and several thousand latches in legacy cores, were not placed on scan chains. Instead, we made those elements conditionally transparent during automatic test-pattern generation (ATPG) so that we could test the surrounding logic.
To ensure that all logic was indeed testable, we started checking it at the register transfer level (RTL). We designed the chip hierarchically: engineers synthesized individual modules, which we then assembled into the full chip. For each module, we included, as early as possible, every test function that could change the design, to avoid costly surprises before tapeout. This DFT methodology eliminated the time-consuming iterations that occur when test engineers return to the design engineers with requests to fix testability issues.
To address our testability needs as early as possible, we used Synopsys's DC Expert Plus to synthesize logic to gates. Running this tool on each module or subblock of the design ensured that all flip-flops were scannable. This test synthesis process took modules from HDL directly to optimized, testable gates. We created test-ready HDL by specifying all the necessary DFT requirements before synthesis; the main requirement was adequate control of the clock and reset signals so that the scan chains could shift. Because the testable gates were optimized, the gate-level implementation was fully scannable while still meeting all design constraints and process-technology rules.
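What "fully scannable" buys can be seen in a toy model. The sketch below is a hypothetical mux-D scan chain, not DC Expert Plus output: with scan-enable high, each flip-flop loads its neighbor's value so a pattern can be shifted in and a response shifted out; with scan-enable low, each flip-flop captures its functional input. The `ScanChain`, `load`, and `unload` names are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ScanChain:
    """Mux-D scan flip-flops sharing one clock and one scan-enable (hypothetical model)."""
    length: int
    state: List[int] = field(default_factory=list)

    def __post_init__(self) -> None:
        if not self.state:
            self.state = [0] * self.length  # cell 0 sits next to scan-in

    def clock(self, scan_enable: bool, scan_in: int = 0,
              functional_d: Optional[List[int]] = None) -> int:
        """Apply one clock edge and return the scan-out value seen before it."""
        scan_out = self.state[-1]
        if scan_enable:
            # Shift mode: each cell's mux selects its predecessor's output.
            self.state = [scan_in] + self.state[:-1]
        else:
            # Capture mode: each cell's mux selects its functional D input.
            if functional_d is None or len(functional_d) != self.length:
                raise ValueError("capture needs one functional value per cell")
            self.state = list(functional_d)
        return scan_out


def load(chain: ScanChain, pattern: List[int]) -> None:
    """Shift a pattern in so that pattern[i] ends up in cell i."""
    for bit in reversed(pattern):  # the bit for the last cell must go in first
        chain.clock(scan_enable=True, scan_in=bit)


def unload(chain: ScanChain) -> List[int]:
    """Shift the captured response out, returned in cell order."""
    out = [chain.clock(scan_enable=True) for _ in range(chain.length)]
    return out[::-1]  # the first bit out came from the last cell


if __name__ == "__main__":
    chain = ScanChain(4)
    load(chain, [1, 0, 1, 1])
    # Stand-in for the combinational logic: each cell captures its inverse.
    chain.clock(scan_enable=False, functional_d=[1 - b for b in chain.state])
    print(unload(chain))  # [0, 1, 0, 0]
```

The need for clock and reset control follows directly from the model: if any cell's clock or reset could not be held in its scan mode during shifting, the shifted pattern would be corrupted.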
Linking the chains
At the RTL stage, the DFT methodology doesn't require us to stitch the chains together, only to synthesize all necessary logic as scannable logic. This ensures that the testable design is fully functional and meets all of the necessary test, timing, area, and power constraints.
Figure 1 - The testing flow
The testing flow features simultaneous logic and test synthesis, automated scan partitioning and design-rule checking, and automatic test-pattern generation.
Once we had assembled the completed blocks at the top level, we used Synopsys's Testgen Start to identify which flip-flops belonged to which clock domain. Because we knew that the RTL had been synthesized with test included, we could stitch the scan chains confident that we weren't degrading the design's performance. We then used Start to order the chains hierarchically, stitch the chains together for each clock domain, and insert the necessary lockup latches at clock-domain crossings. Once we'd assembled all the chains, we ran a scan design-rule check (SDRC) to make sure that all the chains operated correctly and were connected to the appropriate test points. This step ensured that data could shift reliably through the scan chains during test.
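The lockup-latch bookkeeping can be sketched as follows, assuming a single chain must sometimes span more than one clock domain. In common practice a lockup latch, clocked on the opposite phase of the launching domain's clock, holds the shifted bit long enough to absorb skew between the domains. This is only an illustration of where latches land, not how Start does it; the flop names and the `LOCKUP_...` labels are hypothetical.

```python
from typing import List, Tuple

Flop = Tuple[str, str]  # (hypothetical instance name, clock domain)


def order_by_domain(flops: List[Flop]) -> List[Flop]:
    """Keep each domain's flops contiguous so the chain crosses domains as few times as possible."""
    domains: List[str] = []
    for _, domain in flops:
        if domain not in domains:
            domains.append(domain)
    return [f for d in domains for f in flops if f[1] == d]


def stitch_with_lockups(ordered: List[Flop]) -> List[str]:
    """Return the chain's element sequence, adding a lockup latch at every domain crossing."""
    chain: List[str] = []
    for i, (name, domain) in enumerate(ordered):
        prev_domain = ordered[i - 1][1] if i > 0 else domain
        if prev_domain != domain:
            chain.append(f"LOCKUP_{prev_domain}_to_{domain}")
        chain.append(name)
    return chain


if __name__ == "__main__":
    flops = [("a", "clkA"), ("x", "clkB"), ("b", "clkA"), ("y", "clkB")]
    print(stitch_with_lockups(flops))                   # three crossings, three latches
    print(stitch_with_lockups(order_by_domain(flops)))  # one crossing, one latch
```

Grouping by domain first is why stitching chains per clock domain keeps the number of lockup latches small.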
Once we received the layout and delay information (DEF and SDF files), we optimized the scan chain order. In this reordering step, we unstitched the chains and restitched them based on physical placement. The optimization had to happen at that point because, when we stitched the chains before layout, we were working from a logical netlist and didn't know the detailed physical layout of the design. After layout, we restitched the chains to connect scan flip-flops that lay closest to each other, provided they were in the same clocking group. The tool reordered the chains in the netlist without introducing errors that would impair the design, so we didn't need to go back to the design engineers to fix new problems. The reordering also saved valuable routing channels and made final routing much easier; those channels would have been available had engineering change orders become necessary. Once the chains were restitched, we ran SDRC again to confirm that all chains were still correctly connected across the design.
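Placement-based reordering can be approximated with a greedy nearest-neighbor heuristic, a common textbook stand-in; the source doesn't say which algorithm the tool used. In this sketch the coordinates are hypothetical (in the real flow they would come from the DEF placement), and each clocking group is reordered independently so flops never move between groups.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def wirelength(order: List[str], pos: Dict[str, Point]) -> float:
    """Total straight-line length of the scan connections in chain order."""
    return sum(math.dist(pos[a], pos[b]) for a, b in zip(order, order[1:]))


def reorder_nearest_neighbor(flops: List[str], pos: Dict[str, Point], start: str) -> List[str]:
    """Greedy reorder: from `start`, always hop to the closest unvisited flop."""
    remaining = set(flops) - {start}
    order = [start]
    while remaining:
        here = pos[order[-1]]
        # Ties broken by name so the result is deterministic.
        nxt = min(remaining, key=lambda f: (math.dist(here, pos[f]), f))
        order.append(nxt)
        remaining.remove(nxt)
    return order


def reorder_by_clock_group(groups: Dict[str, List[str]],
                           pos: Dict[str, Point]) -> Dict[str, List[str]]:
    """Reorder each clocking group separately, starting from its first listed flop."""
    return {clk: reorder_nearest_neighbor(fl, pos, fl[0]) for clk, fl in groups.items()}


if __name__ == "__main__":
    pos = {"f1": (0, 0), "f2": (30, 0), "f3": (10, 0), "f4": (20, 0)}
    logical = ["f1", "f2", "f3", "f4"]  # order from the pre-layout netlist
    physical = reorder_by_clock_group({"clkA": logical}, pos)["clkA"]
    print(physical, wirelength(logical, pos), wirelength(physical, pos))
```

Even in this four-flop case the logical order zigzags across the placement, while the reordered chain halves the scan wiring, which is where the saved routing channels come from.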
We also optimized the design so that it could be tested as quickly as possible. Automatic test equipment is expensive, and tester time is critical in production. Our goal was less than one second per chip, so the more chains we had, the better: the chains shift in parallel, so test time is set by the length of the longest chain. For example, if a block contained a chain of 2,000 flip-flops, it would take 2,000 scan shift cycles to load data into the chain. By dividing it into two chains of 1,000 flip-flops each, we needed only 1,000 cycles to perform the same test. Because additional serial memory for the tester is expensive and limits the number of available scan chains, we instead used the tester's standard functional pattern memory and a large number of parallel scan chains.
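The chain-splitting arithmetic generalizes to a simple test-time estimate. The sketch below assumes balanced chains that shift in parallel and that unloading one pattern overlaps loading the next, a standard approximation. The full-chip flop count, pattern count, and 10-MHz shift rate in the demo are hypothetical values chosen only to show the trend; the article doesn't give them.

```python
from typing import List


def chain_lengths(num_flops: int, num_chains: int) -> List[int]:
    """Split flops as evenly as possible across parallel chains."""
    base, extra = divmod(num_flops, num_chains)
    return [base + 1] * extra + [base] * (num_chains - extra)


def scan_test_cycles(num_flops: int, num_chains: int, num_patterns: int) -> int:
    """Tester cycles for a scan test, assuming parallel shifting and overlapped load/unload."""
    longest = max(chain_lengths(num_flops, num_chains))
    # num_patterns + 1 shift passes (the last one only unloads) plus one capture cycle per pattern.
    return (num_patterns + 1) * longest + num_patterns


def scan_test_seconds(num_flops: int, num_chains: int, num_patterns: int,
                      shift_hz: float) -> float:
    return scan_test_cycles(num_flops, num_chains, num_patterns) / shift_hz


if __name__ == "__main__":
    # The 2,000-flip-flop example: one chain versus two balanced chains.
    print(chain_lengths(2000, 1), chain_lengths(2000, 2))
    # Hypothetical full-chip numbers, for illustration only.
    for chains in (8, 32, 128):
        t = scan_test_seconds(100_000, chains, 2_000, 10e6)
        print(f"{chains:4d} chains: {t:.3f} s")
```

Because time scales with the longest chain, an unbalanced split wastes most of the benefit of adding chains.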
After tapeout, we ran the ATPG tool in Testgen to generate the test vectors, using compaction techniques to minimize the number of scan vectors and save valuable tester time. Note that we didn't have to perform ATPG and verify the patterns before tapeout, because our DFT process reliably yields very high coverage. Where coverage fell short of what our 250-DPM (defective parts per million) goal required, we used the tool's interactive features to understand why the remaining faults weren't being tested.
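One widely taught form of pattern compaction is greedy static compaction. It assumes ATPG emits test cubes in which bits it didn't need are left as don't-cares (`X`), and it merges any cubes that never disagree on a specified bit. The sketch is illustrative only; the source doesn't describe Testgen's compaction algorithm, and the cubes in the demo are hypothetical.

```python
from typing import List


def compatible(a: str, b: str) -> bool:
    """Two test cubes can share a pattern if no bit is 0 in one and 1 in the other."""
    return all(x == y or "X" in (x, y) for x, y in zip(a, b))


def merge(a: str, b: str) -> str:
    """Combine compatible cubes, keeping whichever bit is specified."""
    return "".join(y if x == "X" else x for x, y in zip(a, b))


def compact(cubes: List[str]) -> List[str]:
    """Greedy static compaction: fold each cube into the first compatible pattern."""
    if len({len(c) for c in cubes}) > 1:
        raise ValueError("all cubes must cover the same scan cells")
    patterns: List[str] = []
    for cube in cubes:
        for i, pattern in enumerate(patterns):
            if compatible(pattern, cube):
                patterns[i] = merge(pattern, cube)
                break
        else:
            patterns.append(cube)
    return patterns


if __name__ == "__main__":
    cubes = ["1XX0", "X1X0", "0XX1", "XX1X"]  # hypothetical cubes, one per target fault
    print(compact(cubes))  # ['1110', '0XX1']: four cubes fit in two patterns
```

Any `X` bits left after compaction are typically filled (for example, randomly) before the patterns are formatted for the tester, which can detect additional faults for free.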
Pathing the test
At that stage, we had sufficiently covered the stuck-at faults. To cover path delay faults as well, in which delays accumulate along a logic path until the circuit can no longer operate at speed, we used the Pathtest module of the tool together with PrimeTime, a static timing analysis (STA) tool. Timing-critical paths reported by STA were read into Pathtest, which generated test vectors to exercise them. On such a large design, testing these timing paths with manually generated functional vectors is very difficult and time-consuming. Automating the process saved us considerable time and made our flow more predictable. We found that the technique significantly improved test quality at the cost of only a minor increase in tester time.
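Part of what a path delay ATPG tool must work out for each STA path is how to let a transition propagate along it. The sketch below covers only the textbook non-robust condition, assuming AND/OR-family gates: under the second vector, every off-path (side) input must sit at its gate's non-controlling value, and inverting gates flip the transition's direction. A real tool such as Pathtest also constrains the first vector and justifies these values back to scan flip-flops; the path description format and net names here are hypothetical.

```python
from typing import Dict, List, Tuple

NON_CONTROLLING = {"AND": 1, "NAND": 1, "OR": 0, "NOR": 0}
INVERTING = {"AND": False, "NAND": True, "OR": False, "NOR": True, "BUF": False, "NOT": True}

# One stage of a hypothetical path: (gate type, side-input nets of that gate).
Stage = Tuple[str, List[str]]


def sensitize(path: List[Stage], launch_rising: bool) -> Tuple[Dict[str, int], str]:
    """Side-input values needed under V2, and the transition expected at the capture flop."""
    required: Dict[str, int] = {}
    rising = launch_rising
    for gate, side_inputs in path:
        for net in side_inputs:
            value = NON_CONTROLLING[gate]
            if required.get(net, value) != value:
                raise ValueError(f"{net} must be both 0 and 1; path not sensitizable")
            required[net] = value
        if INVERTING[gate]:
            rising = not rising  # inverting gates flip the transition direction
    return required, "rise" if rising else "fall"


if __name__ == "__main__":
    path = [("NAND", ["s1"]), ("OR", ["s2", "s3"]), ("NOT", []), ("AND", ["s1"])]
    print(sensitize(path, launch_rising=True))
```

A conflict like the one the function reports is one way a path turns out to be untestable, the situation in which extra logic has to be added to the circuit.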
Figure 2 - Critical paths
To test critical paths, we used a pair of vectors (V1 and V2) from a single scan pattern and applied two functional clock edges to the chip, spaced one at-speed cycle time apart. The first clock edge launched the transition from V1 to V2, and the second captured its effect in a flip-flop at the end of the path. If a path was too slow, the captured value would not match the expected value when it was scanned out.
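The launch-and-capture mechanism reduces to a timing comparison, sketched here under simplifying assumptions: the path delay is the sum of hypothetical per-gate delays, the capture flop has a fixed setup time, and clock skew is ignored. The launch edge starts the V1-to-V2 transition, and the capture edge one period later samples whatever value has arrived.

```python
from typing import List


def captured_value(v1_out: int, v2_out: int, stage_delays_ns: List[float],
                   period_ns: float, setup_ns: float = 0.0) -> int:
    """Value the end-of-path flop holds after the capture edge.

    The launch edge at t = 0 starts the V1 -> V2 transition; the capture edge
    arrives at t = period_ns. If the transition has not settled setup_ns before
    that edge, the flop still captures the stale V1 response.
    """
    if v1_out == v2_out:
        raise ValueError("a path delay test needs a transition at the path output")
    arrival_ns = sum(stage_delays_ns)
    return v2_out if arrival_ns <= period_ns - setup_ns else v1_out


def path_passes(v1_out: int, v2_out: int, stage_delays_ns: List[float],
                period_ns: float, setup_ns: float = 0.0) -> bool:
    # On the tester this comparison happens after the captured value is scanned out.
    return captured_value(v1_out, v2_out, stage_delays_ns, period_ns, setup_ns) == v2_out


if __name__ == "__main__":
    period = 1000 / 166  # about 6.02 ns between launch and capture edges
    good = [1.2, 1.5, 1.8, 1.1]  # hypothetical per-gate delays, 5.6 ns total
    slow = [1.2, 2.0, 1.8, 1.1]  # same path with a hypothetical 0.5-ns defect, 6.1 ns
    print(path_passes(0, 1, good, period, setup_ns=0.2))  # True
    print(path_passes(0, 1, slow, period, setup_ns=0.2))  # False
```

Only the spacing between the two functional edges has to be at speed; the shifting before and after can run at the tester's slow scan rate.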
This process improved test quality by detecting subtle manufacturing defects and variations that traditional slow stuck-at testing doesn't cover, and today's deep-submicron processes are increasingly prone to such problems. Generating sufficient vectors to test critical paths was important for our chip because the graphics functions in today's systems must keep up with the ever-increasing speed and bandwidth of high-performance microprocessors. In many cases, we are pushing the performance limits of our silicon processes. Path delay fault testing is therefore necessary to ensure product quality and to sort production chips into their rated speed categories during test. This technique, commonly known as speed binning, lets us maximize yield even when process variations keep some parts from reaching the top speed grade.
Selecting paths for at-speed testing was more difficult than it seemed, because the number of paths in a circuit can grow exponentially with the number of gates, whereas the number of stuck-at faults grows only linearly. The STA tool produced a manageable list of the paths most important to test, usually those with the least timing slack. We also needed to organize paths by clock domain, since paths in one clock domain may not need to operate at the same speed as paths in another. We kept the patterns for each clock domain in separate groups so that we could apply them with different timesets on the tester. Design engineers augmented the STA output by identifying specific paths that they knew were important to test. If Pathtest proved these paths untestable, we added extra logic to the circuit to make them testable.
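Both halves of this paragraph can be made concrete. The first part of the sketch counts paths in a hypothetical circuit in which each stage splits into two gates that reconverge: the gate count grows by three per stage, a simple two-faults-per-net stuck-at count (ignoring fanout-branch faults) grows likewise, but the path count doubles per stage. The second part keeps the least-slack paths per clock domain and appends designer-specified paths. `TimingPath` and its fields are assumed stand-ins for an STA report, not a real tool format.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Sequence


def count_paths(fanin: Dict[str, List[str]], node: str, memo: Dict[str, int]) -> int:
    """Number of distinct input-to-node paths (dynamic programming over the DAG)."""
    if node not in memo:
        preds = fanin.get(node, [])
        memo[node] = 1 if not preds else sum(count_paths(fanin, p, memo) for p in preds)
    return memo[node]


def ladder(stages: int) -> Dict[str, List[str]]:
    """Hypothetical circuit: each stage splits into two gates that reconverge."""
    fanin: Dict[str, List[str]] = {"m0": []}
    for i in range(1, stages + 1):
        fanin[f"a{i}"] = [f"m{i-1}"]
        fanin[f"b{i}"] = [f"m{i-1}"]
        fanin[f"m{i}"] = [f"a{i}", f"b{i}"]
    return fanin


@dataclass(frozen=True)
class TimingPath:
    name: str          # hypothetical path identifier from the STA report
    clock_domain: str
    slack_ns: float


def select_paths(paths: Sequence[TimingPath], per_domain: int,
                 designer_picks: Sequence[TimingPath] = ()) -> Dict[str, List[TimingPath]]:
    """Keep the least-slack paths in each clock domain, plus designer-specified paths."""
    groups: Dict[str, List[TimingPath]] = defaultdict(list)
    for p in paths:
        groups[p.clock_domain].append(p)
    selected = {d: sorted(ps, key=lambda p: p.slack_ns)[:per_domain] for d, ps in groups.items()}
    for p in designer_picks:
        if p not in selected.setdefault(p.clock_domain, []):
            selected[p.clock_domain].append(p)
    return selected


if __name__ == "__main__":
    circuit = ladder(20)
    gates = len(circuit) - 1  # every node except the input m0
    print(gates, 2 * len(circuit), count_paths(circuit, "m20", {}))
    sta = [TimingPath("p1", "core", 0.3), TimingPath("p2", "core", 0.05),
           TimingPath("p4", "agp", 0.1), TimingPath("p5", "agp", 0.4)]
    print(select_paths(sta, per_domain=1))
```

Keeping the result keyed by clock domain mirrors the practice of generating and applying each domain's patterns as a separate group with its own timeset.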
We provided the tool with these path lists and generated vectors that covered our critical paths. The vectors were then formatted and run on the tester to determine which parts actually ran at which speeds. If a part passed all the critical-path tests with clock timing tight enough to qualify it as a 166-MHz part, we placed it in that speed category. If a chip failed, we ran the tests again under relaxed clock timing to categorize it at a lower speed. We didn't need to run these scan tests at full system speed, however. To verify that critical paths met the required timing, we only had to launch a transition at the start of each path with one clock edge and capture its effect at the end of the path with a second clock edge one cycle time later. This let us isolate and test critical paths at speed without requiring full at-speed bandwidth from the tester. For these tests, we shifted data into and out of the scan chains at the same slow speed as for our stuck-at tests.
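The binning procedure is a loop over speed grades, sketched here with a hypothetical part model: each part passes a grade if its slowest critical path, plus an assumed 0.2-ns setup time, fits within one clock period. Only the 166-MHz grade comes from the text; the 143- and 125-MHz bins are illustrative.

```python
from typing import Callable, Optional, Sequence


def bin_part(passes_at: Callable[[float], bool], bins_mhz: Sequence[float]) -> Optional[float]:
    """Fastest speed bin whose critical-path tests the part passes.

    Bins are tried from fastest to slowest; each failure reruns the tests
    under relaxed clock timing. None means the part fails every bin.
    """
    for mhz in sorted(bins_mhz, reverse=True):
        if passes_at(mhz):
            return mhz
    return None


def make_part(worst_path_ns: float, setup_ns: float = 0.2) -> Callable[[float], bool]:
    """Hypothetical part model: passes if its slowest critical path fits in one cycle."""
    return lambda mhz: worst_path_ns <= 1000 / mhz - setup_ns


if __name__ == "__main__":
    bins = [166, 143, 125]  # 166 MHz is from the text; the lower bins are hypothetical
    for worst in (5.0, 6.5, 9.0):
        print(worst, bin_part(make_part(worst), bins))
```

Trying the fastest bin first means a good part needs only one pass, and only marginal parts pay for the extra reruns.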
To verify the methodology, we selected random parts from each speed category and ran them in the actual system environment to confirm that they did qualify for that speed grade. We have found very good correlation between the speed grades assigned by our critical-path tests and the speeds the parts actually achieve in the system.
As a result of all this effort, the Savage4 met its testability requirements for coverage, speed rating, and test time. These DFT techniques have succeeded in more than 20 designs, and we will continue to use them. The tools allowed us to develop a streamlined design flow that merges the disciplines of design and test without requiring a huge effort from our designers to meet our test requirements.
Mehran Amerian is the design-for-test manager at S3, Inc. (Santa Clara, Calif.). He has established DFT techniques at S3 for chips ranging from 200,000 gates up to the million-plus-gate Savage4. Before joining S3, Amerian worked in the Test/DFT area at Sun Microsystems, where he was a member of the first group of engineers to work on DFT for the first-generation SPARC microprocessor.
Copyright © 2000 Integrated System Design Magazine