Vincent Ratford, vice president of marketing and business development at Virage Logic (Fremont, Calif.), knows about providing design solutions for IC companies. Earlier, he was director of marketing for Mentor's System Design Division. Ratford holds a BSEE from Northeastern.
The growth in demand for system-on-chip devices has spurred a flood of better, faster, smarter approaches to design and development. SoCs are also moving from logic-dominant chips to memory-dominant chips, bringing a whole new angle to SoC test. The addition of memory, while it creates a more-powerful chip that adapts better to today's memory-hungry applications, brings with it the curse of larger die size and poor yields. And with an average of nearly 40 percent of SoCs going straight to the dumpster, yield management is one area of chip design that's ripe for improvement.
There are several ways to manage memory yield. The traditional approach has been to use external test and repair equipment, which can account for as much as 40 percent of the total manufacturing cost. A newer methodology builds a self-repair algorithm into the SoC itself, testing and repairing embedded SRAM devices right on the chip. This technology can raise typical SoC yields to as much as 82 percent, well above what traditional external test and repair systems achieve.
One of the primary drivers of chip failure is the growth of memory as a percentage of the chip's area. Increasing the memory on an SoC adds layers, complicates the manufacturing processes and increases cell density. In fact, because of their high cell density, embedded memories are more sensitive to defects in the silicon than any other component on the chip. This trend will continue, according to the Semiconductor Industry Association and the International Technology Roadmap for Semiconductors 2000. By today's estimates, within the next year, memories will comprise more than 50 percent of a typical SoC. And by 2010, memories will cover 90 percent of the SoC die area.
In addition, the quantity of embedded instances per SoC is growing. This is due to smaller process geometries, such as 0.13 micron, that allow more and more functionality to be included on the chip. As various functions such as video recorders, LAN controllers and IEEE 1394 interfaces are added to the design, memory instances are more logically located adjacent to the various functions. Some of the latest designs include more than 75 embedded memory instances.
As the percentage of embedded memory continues to increase, so does the chip's complexity, density and speed and, of course, the probability of failures due to wafer defects. For SoCs to keep up their momentum and remain a viable option for improving system integration and performance, the problems relating to high-density, multimegabit memory yield must be solved. There are several approaches to reducing these failures, each with its own set of trade-offs.
Including extra rows and columns that can be swapped in for defective elements, a technique known as adding redundancy, can in certain instances raise memory yield substantially.
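The swap is conceptually simple: accesses to a row found defective during test are redirected to a spare. The sketch below illustrates the idea with a single-spare-row scheme; the class, names and remapping mechanism are illustrative assumptions, not any vendor's actual design.

```python
# Minimal sketch of row redundancy: a row found defective during test is
# remapped to a spare row. The single-spare scheme and all names here are
# illustrative, not an actual silicon implementation.

class RedundantSram:
    def __init__(self, rows, cols, spare_rows=1):
        self.rows = [[0] * cols for _ in range(rows + spare_rows)]
        self.remap = {}            # defective row -> spare row index
        self.next_spare = rows     # first unused spare row

    def repair_row(self, bad_row):
        """Swap a spare row in for a row found defective during test."""
        if self.next_spare >= len(self.rows):
            return False           # out of spares: the chip cannot be repaired
        self.remap[bad_row] = self.next_spare
        self.next_spare += 1
        return True

    def write(self, row, col, value):
        self.rows[self.remap.get(row, row)][col] = value

    def read(self, row, col):
        return self.rows[self.remap.get(row, row)][col]

mem = RedundantSram(rows=4, cols=8)
mem.repair_row(2)                  # row 2 failed test; accesses now hit the spare
mem.write(2, 5, 1)
print(mem.read(2, 5))              # prints 1, served from the spare row
```

In real silicon the remapping is done by comparators in the address path and the spare assignment is burned into fuses, but the bookkeeping is the same.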
One problem with certain redundancy models is that as the size and complexity of the SoC grows, adding extra rows and columns becomes more and more burdensome, adding to the cost and intricacy of the chips. And as memory becomes a larger percentage of the chip, redundancy becomes less effective, substantially eroding typical yield gains.
In addition to the problems that extra rows and columns involve, redundancy improvements also require a costly investment in external test equipment. Engineers need extra training and support to find and fix defects, not to mention the slowdown in time-to-market. As memory continues to grow as a percentage of the SoC's functionality, redundancy begins to lose its appeal as a viable solution due to the mounting cost for equipment, the extra logic and the increasing time lag and engineering hours dedicated to test.
Testing is one of the most costly elements of manufacturing. External test and repair solutions can cost from $3 million to $7 million for the necessary wafer test and repair equipment and processes. Testing SoC memory typically requires four device insertions:
1. The memory tester tests the memory on the SoC die. It imports the results and then, using an expensive software add-on, performs the redundancy analysis and allocation.
2. The information on how to allocate the spare elements for defects is fed to the laser repair equipment. It blows the fuses that enable the spare elements to be swapped for the defective cells.
3. The tester retests the memory to ensure that the repairs were made properly.
4. The logic tester analyzes the remaining nonmemory components of the SoC.
When testing stand-alone memories, external test systems assume that the memories are directly accessible from the I/O pins of the chip. Because SoC design is significantly more complex, in an SoC test situation the designer must carefully route each embedded memory to enable pin access, a process that can be expensive as well as time-consuming.
There are, however, instances when external test and repair is on par with the on-chip version. External repair can still be cost-effective if the design contains a single large memory block whose pins are easy to route to the I/Os. But considering the trend for memory to take on more and more of an SoC's real estate, the design time required to route the memories and establish the pin connections is growing at an unacceptable rate. Meanwhile, die sizes are increasing from the extra routing, and chip packages are enlarging to accommodate the extra pads for memory access. Even with all the advances in test today, external memory testers remain unable to test at the speed of the chip, an essential capability in complex chips for finding path delays and timing faults.
Built-in self-test (BIST) has been called the future of test, the technology that will save SoC (and FPGA and ASIC) manufacturers from the ruin of inferior yields.
SoC designers are well aware of the crucial issues facing the future of SoCs around design-for-test (DFT) solutions. Many conference discussions have addressed this trend, while several companies are working on BIST solutions or partnering with chip makers for the testing of multiclock logic circuits, phase-locked loops and other FPGA, ASIC and SoC elements. But these solutions come with a cost, an investment that typically runs in the tens of thousands of dollars. And the solutions stop with test.
Beyond BIST to repair
A new technology is now available that not only contains built-in test and diagnostics functions but also has the ability to repair bad memory bits right on the chip. While this technology is currently available only for SRAM, it points to a new way of providing substantial cost savings that will be applied to different types of memories in the future. Incorporating repair right on the chip results in more memory, lower manufacturing and repair costs, shorter manufacturing test time and improved wafer yield.
A typical example illustrates the economic power of the new self-test and repair technology. Take a company building an xDSL modem chip in a 0.18-micron process, incorporating 5 Mbits of SRAM on an 8 x 8-mm die, with 1 million units shipping in the first year. Assume an average selling price of $25 per unit and a per-unit wafer cost of $2,200. The wafer defect density is projected at 0.4 for memory and 0.3 for logic.
Without built-in self-test and repair, die yield would be approximately 64 percent, compared with the 82 percent achieved with the new technology. Retiring the external test and repair tools made redundant by the internal method saves $500,000 in test and repair cost, and the increased yield alone creates an additional $2.4 million in savings. On this project, estimated at $25 million, built-in test and repair can thus save up to 12 percent, or $3 million, of the profits from getting swept up on the foundry floor.
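The quoted yields follow from a standard Poisson yield model. The sketch below reproduces them, with one stated assumption: the article's defect densities are treated as defects per square centimeter applied over the full die area, which is the usual convention and matches the quoted numbers.

```python
import math

# Poisson yield model: Y = exp(-A * D).
# Assumption: the article's defect densities (0.4 memory, 0.3 logic) are
# defects/cm^2 over the whole 0.64-cm^2 die. This reproduces the quoted
# 64 and 82 percent yields.
area = 0.8 * 0.8                    # 8 x 8 mm die = 0.64 cm^2
d_mem, d_logic = 0.4, 0.3           # defects/cm^2

# Without repair, both memory and logic defects kill the die.
y_no_repair = math.exp(-area * (d_mem + d_logic))
# With on-chip repair, memory defects are fixed; only logic defects remain fatal.
y_repaired = math.exp(-area * d_logic)

print(f"{y_no_repair:.1%}")         # prints 63.9%
print(f"{y_repaired:.1%}")          # prints 82.5%
```

The roughly 18-point yield gain is what drives the $2.4 million savings in the example: fewer wafers must be started to deliver the same 1 million good units.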
How it works
As implemented in this particular product, three components provide the built-in test and repair function: the SRAM memories, a test and repair processor and a fuse box. The three operate in unison to deliver high yields by repairing defective chips at a very high rate. The fuse box architecture minimizes area by storing repair information efficiently.
The test and repair processor uses an on-chip redundancy algorithm for the repair function. The processor determines how much redundancy is needed and how to partition it across each unique memory instance, drawing on the redundancy scheme, the failure history of the process being used and how the unit has failed. The processor tests the memory instances at-speed, a critical distinction between the new technology and many existing tools. After testing and repairing the memory instances, the processor turns memory operation over to the normal address, data and control buses on the SoC.
Next, the test and repair processor creates a repair data signature, which is sent to an external tester, where laser repair equipment programs the fuse box with the correct repair information. Once this is completed, the fuse box holds the memory repair signature permanently.
One issue that can affect the repair results is the laser repair itself, which can introduce new failures. However, the number of laser-induced failures is typically negligible and greatly outweighed by the newly recovered memory function.
This new technology makes expensive, external memory testers obsolete for SoCs. Because the memory is tested and repaired by the processor, the only other SoC test that needs to be performed externally is by the logic tester. This can be coordinated with the memory test and repair for a faster, more thorough and ultimately more cost-effective test.
The test and repair processor operates in four stages.
The first stage runs a standard BIST for the test phase. The BIST function is important because it determines how much memory needs to be repaired; this BIST technology can find more than 99 percent of all memory defects in an SRAM, providing a solid start to the process.
Second, a built-in self-diagnosis function finds the location of any defects and, if needed, provides error logging and scan-out of failure data.
Third, the built-in redundancy allocation module takes over. It identifies available redundant rows and columns and maps the optimum redundancy assignment, pulling information from a process failure history: a database of failure data from the specific foundry that contains pertinent defect information.
Finally, the reconfiguration data module translates redundancy allocation into a memory-specific repair signature before programming it into the fuse box.
Of all the processor functions, diagnostics is the most time-consuming; if there is a large amount of information to analyze, the diagnostic phase of the test can take considerable time. The new technology can test and repair several memory instances simultaneously. For example, with two processors, one can serve four 1-Mbit memory instances while the other handles a single 1-Mbit instance.
Placement of the memory instances and processor on the chip depends on such factors as chip area budget, power, speed, system clock, buses, design hierarchy and the chip's floor plan. A single fuse box can serve all memory instances on the chip, or each instance can have its own.
The intelligent wrapper (IW) associated with each memory instance is used in conjunction with the self-test and repair processor to perform test and repair of the memory and to allow normal memory functioning in the system. The IW contains functions such as address counters, registers, data comparators and multiplexers. The IW is placed close to the memory core to allow at-speed testing.
At approximately 5,000 to 7,000 gates, a test and repair processor requires more silicon than the roughly 500 gates of a typical BIST module (both figures depend on the memory configuration). But the die area is inconsequential considering the size of the memories, the yield savings and the fact that no additional fuses are required.
What if the product passes the foundry test but fails in the field? In addition to the factory test, this new technology has a built-in field repair option providing test and repair at any time. With field repair, the built-in processor tests and repairs the memory each time the product is powered up or reset. This automatic field function finds any defective memory locations, allocates redundancy resources and produces a repair signature that resides in processor memory. This volatile repair signature is applied to the redundancy control logic and remains in effect as long as power is applied.
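The power-up sequence described above can be sketched as a small controller. The class, the signature format and the single-spare repair policy are all hypothetical; the point is only that the signature is recomputed on every reset and held in volatile state rather than fuses.

```python
# Sketch of the field-repair option: each power-up or reset reruns BIST and
# holds the resulting repair signature in volatile registers. All names and
# the one-spare-row policy are illustrative assumptions.

class FieldRepairController:
    def __init__(self, run_bist):
        self.run_bist = run_bist          # callable returning failing addresses
        self.repair_signature = None      # volatile: lost when power is removed

    def power_up(self):
        fails = self.run_bist()
        # Allocate redundancy for whatever failed this time (one spare row here).
        self.repair_signature = {"spare_row_for": fails[0]} if fails else {}
        return self.repair_signature      # applied to redundancy control logic

# Simulate a power-up where BIST finds one weak bit at address 0x1A0.
ctrl = FieldRepairController(run_bist=lambda: [0x1A0])
print(ctrl.power_up())                    # prints {'spare_row_for': 416}
```

Because the signature lives in volatile state, a bit that degrades after shipment is simply caught and mapped out at the next reset, with no laser or fuse step involved.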
To keep memory-heavy SoCs a viable option for current and next-generation systems, solving yield issues is imperative. Reducing test and repair cost and improving yield are two gains the newer technology delivers immediately. As more methods involving built-in test and repair emerge, yield issues will no longer hold back the many achievements that have made memory-intensive SoCs so desirable.