United Business Media EE Times



Reliability Management for Deep Submicron ICs

The demands of time-to-market collide with the requirements for long-term reliability studies.

by Riko Radojcic


Deep submicron products continue to penetrate new market segments, serve a broader spectrum of applications, and are more sensitive to application-specific design trade-offs. In addition, continued technology scaling is eroding design margins, so the balance among reliability, performance, die size, power, and other attributes of deep submicron products is even more delicate. In this environment, sound qualification practices for reliability management are essential.

Qualification tests are part of the procedure for transferring a product from engineering to production. Typically, these tests focus on the manufacturability, quality, and reliability of a product. Reliability life tests, which often dominate qualification procedures, rely on empirical demonstration of reliability through industry-standard tests, such as 1,000 to 2,000 hours of operating life at elevated temperature or voltage, or 100 to 500 temperature cycles or shocks.

Although these qualification tests have worked well in the past, they are inadequate for current deep submicron products. In addition, the proprietary solutions that many companies are employing fall short of an effective, long-term standard for qualification practices. However, as you will see, there is a solution.

Inadequacies of the conventional qualification practices

The cost of performing a full suite of conventional product-level life tests is approximately $20,000 to $50,000 per design. These direct costs are not prohibitive, even though they are rising with increasing complexity, pin counts, and power dissipation. The real costs, which can run into millions of dollars, come from delaying product introduction while the qualification life tests run; a 1,000- to 2,000-hour operating-life test alone takes roughly six to twelve weeks. In today's fast-moving market, the cost of these delays is unacceptable.

Conventional methodologies are also difficult to apply to modern products: performing physically meaningful accelerated reliability life tests on increasingly complex ICs is a significant challenge. The acceleration factor needed to relate a life test to in-use conditions is credible only for a single failure mechanism, whereas a complex IC is subject to many mechanisms, each with its own dependence on temperature and voltage. In addition, small life-test samples drawn from one or two production runs yield results that are statistically meaningless and cannot reveal problems associated with process variability.
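To make the single-mechanism limitation concrete, the following Python sketch computes the widely used Arrhenius temperature acceleration factor, AF = exp[(Ea/k)(1/T_use - 1/T_stress)], for a hypothetical 1,000-hour test at 125 °C relative to 55 °C use. The three activation energies are illustrative assumptions, not values from this article; the point is only that the same test hours translate into very different equivalent field hours depending on which mechanism is assumed.

```python
import math

# Boltzmann constant in eV/K.
BOLTZMANN_EV_PER_K = 8.617333262e-5


def arrhenius_af(ea_ev: float, t_use_c: float, t_stress_c: float) -> float:
    """Arrhenius acceleration factor of a stress temperature over a use temperature.

    ea_ev is the activation energy (eV) of ONE failure mechanism; temperatures in deg C.
    """
    t_use_k = t_use_c + 273.15
    t_stress_k = t_stress_c + 273.15
    return math.exp((ea_ev / BOLTZMANN_EV_PER_K) * (1.0 / t_use_k - 1.0 / t_stress_k))


# Hypothetical activation energies (eV), chosen only to span a plausible range.
ASSUMED_MECHANISMS = {"mechanism A": 0.3, "mechanism B": 0.7, "mechanism C": 1.0}

if __name__ == "__main__":
    test_hours = 1000.0
    for name, ea in ASSUMED_MECHANISMS.items():
        af = arrhenius_af(ea, t_use_c=55.0, t_stress_c=125.0)
        print(f"{name}: Ea = {ea} eV, AF = {af:.1f}, "
              f"equivalent use hours = {test_hours * af:,.0f}")
```

With these assumed values the acceleration factors range from roughly 6 to roughly 500, nearly two orders of magnitude, so a single life test cannot be converted into one credible field lifetime for an IC whose failures mix several mechanisms.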

Thus, conventional qualification practices (1) perform the wrong tests, (2) perform them incorrectly, (3) are interpreted incorrectly, and (4) are typically statistically insignificant. They also (5) cost too much and (6) take too long to run. Table 1 illustrates this by grading conventional product qualification life tests from one to ten on various technical and business attributes. On this admittedly arbitrary and subjective scale, conventional qualification tests score an overall average of only 3.25 for modern ICs.

Conventional product-level life tests score well only in the procedures category. However, they are routinely performed merely as a ritual formality, and they do not produce meaningful, cost-effective engineering information.
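The statistical weakness of small samples can be quantified with the standard chi-square bound on failure rate. For the special case of zero failures, the upper confidence bound reduces to λ ≤ -ln(1 - CL) / (N · t · AF), where N is the sample size, t the test hours, and AF the acceleration factor; multiplying by 10^9 gives FIT (failures per 10^9 device-hours). This Python sketch is restricted to zero failures so that it needs only the standard library. The 77-part sample, the confidence levels, and the acceleration factor of 78 are illustrative assumptions, not data from this article.

```python
import math


def zero_failure_fit_bound(confidence: float, n_devices: int,
                           test_hours: float, accel_factor: float) -> float:
    """Upper confidence bound on failure rate, in FIT, for a life test with ZERO failures.

    Chi-square bound with 2 degrees of freedom, which for r = 0 reduces to
    -ln(1 - confidence) / (equivalent device-hours).
    """
    equivalent_device_hours = n_devices * test_hours * accel_factor
    return -math.log(1.0 - confidence) / equivalent_device_hours * 1e9


def devices_for_fit_target(target_fit: float, confidence: float,
                           test_hours: float, accel_factor: float) -> int:
    """Smallest zero-failure sample that demonstrates target_fit at the given confidence."""
    return math.ceil(-math.log(1.0 - confidence) * 1e9
                     / (target_fit * test_hours * accel_factor))


if __name__ == "__main__":
    af = 78.0  # assumed acceleration factor, illustrative only
    # A hypothetical 77-part, 1,000-hour test that sees no failures:
    print(f"{zero_failure_fit_bound(0.60, 77, 1000.0, af):.0f} FIT at 60% confidence")
    # Parts needed, with zero failures, to demonstrate 10 FIT at 90% confidence:
    print(devices_for_fit_target(10.0, 0.90, 1000.0, af), "devices")
```

Under these assumptions a clean 77-part test demonstrates only about 150 FIT, and demonstrating 10 FIT at 90% confidence would take roughly 3,000 parts, far more than a small sample drawn from one or two production runs.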

What companies are doing today

In an effort to reduce product time-to-market, many companies are re-examining their practices and evolving their own qualification methods. Because life tests are the longest single delay in transferring a product to manufacturing, a trend toward simplified qualification is emerging. The new practices fall into three broad categories, summarized in Table 2.

Table 1. Relative scoring of the conventional qualification methodology for four types of attributes.

Some companies merely qualify the process technology by life testing a RAM or some other standard product vehicle, without performing incremental reliability evaluations of each product (process qualification). ASIC vendors typically qualify their product family (library, silicon process, package, etc.) by life testing a representative or specially designed vehicle (family qualification) [1]. Still other companies adhere to the conventional approach of qualifying every product (product qualification).

What is needed

Clearly, qualification procedures need a standard. Despite their current inadequacies, they are one of the principal milestones in component supplier-customer transactions. The supplier, the customer, and often the customer's customer all have to use the same qualification procedures.

A meaningful qualification strategy should verify the trade-offs between reliability and product performance or cost, which are application specific. The reliability requirements for consumer products, for example, are entirely different from those for products intended for extreme environmental conditions. Qualification procedures should therefore be application dependent, and looking for a new standard qualification test, similar to the 1,000-hour life test, may not make sense. What is needed is a standard methodology.

There is also a philosophical reason for evolving a new qualification methodology [2]: identifying reliability risks at the last stage of the product introduction cycle is clearly wrong. Given current time-to-market pressures, qualification should proceed concurrently with the design cycle. This also minimizes the cost of addressing any risk that is found (see Figure 1 and Figure 2).

Note also that the industry has evolved many metrics that are routinely used to control the design and manufacturing processes. The databases are full of excellent, statistically valid data. Clearly, reliability assessment and product qualification should be based on all that data, rather than relying on an accelerated life test of a small sample.

A qualification methodology should have the following attributes:

  • It should be a standard methodology--not necessarily a standard test.

  • It should be concurrent with other engineering activities--not a final pass/fail gate.

  • It should be easily applicable to a given design--rather than being a standard test.

  • It should be easily portable across processes--not requiring reinitialization of all steps taken to date.

  • It should be quick and cheap--not requiring months of the design process and tens of thousands of dollars.

  • It should be based on understanding of reliability--not the lack of it.

  • It should be based on all data sources--not just a single life test.

Given these principles for a new qualification methodology, here are some elements of a practical approach to deep submicron product qualification:

Recognize the "bathtub curve" reality. Each segment of the reliability bathtub curve (early-life infant mortality, the useful-life period of random failures, and wear-out), with its own mix of dependencies on process, design, and stress parameters, should be addressed appropriately and separately.

Recognize the "wear-out" reality. Wear-out, caused by intrinsic material degradation under excessive mechanical or electrical stress, is an attribute of the failure mechanism and the materials involved. It is best managed through a set of design rules and process controls.

Recognize the design-reliability interactions. Reliability issues that are a function of IC design and layout should be assessed through the methods best suited to them; for example, compliance with reliability rules is better verified through design simulation than by life testing a product.

Recognize the process-reliability interactions. Reliability issues that are attributes of the process should be addressed through process management, such as defect control.

Table 2
Advanced qualification evaluation

Attribute                           Process Qualification   Product Qualification   Family Qualification
Life Test Vehicle                   Test Device Vehicle     Every Product           Representative Chip 'SEC'
What is Qualified                   Process                 Product                 Product Family
Measured FIT Rate                   None                    Individual              Envelope
What is Controlled                  Process                 Everything              Library & Process
Cycle Time Impact                   Low                     High                    Low
Cost Impact                         Low                     High                    Med
Reliability Risk                    High                    Med                     Low
Compatibility with Process Changes  Low                     Poor                    Med
Compatibility with Design Changes   None                    Poor                    Med

Table 2. The relative strengths and impacts of qualification methods on production.

Recognize the screen-reliability interactions. Reliability issues that are really escapes from the standard test and screen procedures should be assessed through metrics focused on those screens. Note that no amount of life testing will reveal the real system-level problems experienced by a product that is tested with a low-fault-coverage test.

Life tests to characterize time dependency. Life tests should be performed to characterize the time dependency of reliability defects, which is an attribute of the process rather than of specific products. Any piece of silicon built in the process can be life tested to obtain information about process defect density.
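One common way to express the segments of the bathtub curve, and the time dependency that life tests are meant to characterize, is as a sum of hazard rates: a Weibull term with shape β < 1 for infant mortality, a constant term for random failures, and a Weibull term with β > 1 for wear-out. The Python sketch below uses that decomposition; every parameter value is hypothetical and chosen only so that the three regions are visible, not derived from process data.

```python
def weibull_hazard(t_hours: float, beta: float, eta_hours: float) -> float:
    """Weibull hazard rate h(t) = (beta/eta) * (t/eta)**(beta - 1), in failures per hour."""
    if t_hours <= 0:
        raise ValueError("t_hours must be positive")
    return (beta / eta_hours) * (t_hours / eta_hours) ** (beta - 1.0)


# Hypothetical parameters for illustration only (not process data).
INFANT_BETA, INFANT_ETA = 0.5, 1e12    # beta < 1: decreasing hazard (early defects)
RANDOM_RATE = 50e-9                    # constant hazard, i.e. 50 FIT
WEAROUT_BETA, WEAROUT_ETA = 4.0, 2e5   # beta > 1: increasing hazard (wear-out)


def bathtub_fit(t_hours: float) -> float:
    """Total hazard at time t, in FIT (failures per 1e9 device-hours)."""
    total = (weibull_hazard(t_hours, INFANT_BETA, INFANT_ETA)
             + RANDOM_RATE
             + weibull_hazard(t_hours, WEAROUT_BETA, WEAROUT_ETA))
    return total * 1e9


if __name__ == "__main__":
    for t in (1.0, 100.0, 1e4, 5e4, 1e5):
        print(f"t = {t:>9,.0f} h   hazard = {bathtub_fit(t):8.1f} FIT")
```

Because each term has its own parameters, each segment can be characterized and controlled separately, which is the point of addressing the segments independently.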

There are a number of tools and data sources, other than the conventional life test, that can be used to qualify a product. The methods that leverage these sources of information are, in fact, a better, faster, and cheaper path to qualification than a product life test.

What can be done in the design arena

IC failures can be classified as either intrinsic or extrinsic. Extrinsic faults are driven by an interaction with a defect, such as particulate contamination or mouse bites (notches in patterned lines), while intrinsic faults are driven by overstress of a feature.

At Cadence Design Systems (San Jose, CA), we've developed a methodology for control of intrinsic faults. This "design for reliability" methodology is hierarchical and integrated, relying on a series of sieving steps that progressively take the description of the physical realities and verification steps to higher levels of design abstraction.

Intrinsic chip reliability is governed by parameters such as the current density in metal lines, the electric field in the gate oxide, and the electric field across a device. The associated physical phenomena (electromigration, oxide wear-out, hot carrier aging, etc.) are aggravated by technology scaling and are normally managed through a set of design rules that describe the capability of a silicon process technology.

However, the nature of submicron products has eroded the margin between the application realities in a chip and the process limitations, as defined by these rules. Therefore, merely describing the phenomena by the design-rule limits is not adequate for deep submicron designs. A verification of compliance with these limits, and if necessary, a modification of the design to ensure compliance, is required. Without a formal verification procedure, the probability of design-rule violation is high, due to the sheer number of nodes and the higher intrinsic stress levels.

Thus, reliability phenomena, much like other physical realities such as timing, need to be checked and verified throughout the design cycle. Furthermore, if these checks are not performed early in the design process, there is considerable risk of costly design iterations that hurt time-to-market and time-to-volume. For timing, a product attribute described by many simulation engines, the iteration cycles are typically completed by physical design or first silicon. For reliability, the iteration cycles can stretch all the way into field use.

Physical realities--dc design rules

The characteristics of a physical mechanism such as electromigration, dielectric wear-out, or hot carrier aging are evaluated by life testing a set of suitable discrete test structures (individual metal lines, oxide capacitors, or individual transistors) under accelerated conditions. The distribution of failures over time is then described by a statistical function, and the dependence of the failure mechanism on environmental factors (current density, E-field, and temperature) is described by an acceleration model. Together, these models are used to derive a physical design-rule limit that corresponds to the desired reliability performance. The design rules then define the maximum dc current density or maximum dc applied voltage that a given technology feature can support.
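As an illustration of how a statistical function and an acceleration model combine into a dc rule, the Python sketch below assumes lognormal time-to-failure and Black's equation for electromigration, t_fail ∝ J^(-n) · exp(Ea/kT). Both models are common choices but are assumptions here, as is every parameter value in the example. The function solves for the dc current density at which the chosen fraction of lines would fail by the target lifetime at the use temperature.

```python
import math
from statistics import NormalDist

BOLTZMANN_EV_PER_K = 8.617333262e-5  # eV/K


def em_dc_current_limit(t50_test_h, j_test, t_test_c, t_use_c,
                        target_life_h, sigma, fail_fraction, n=2.0, ea_ev=0.7):
    """Maximum dc current density (same units as j_test) that meets a lifetime target.

    Assumes lognormal failure times (median t50_test_h, shape sigma) measured at
    current density j_test and temperature t_test_c, and Black's equation
    t_fail ~ J**(-n) * exp(ea_ev / kT) for acceleration to use conditions.
    """
    # 1. Statistical function: time for `fail_fraction` of lines to fail at test stress.
    z = NormalDist().inv_cdf(fail_fraction)
    t_frac_test = t50_test_h * math.exp(z * sigma)
    # 2. Temperature acceleration from test to use temperature (still at j_test).
    temp_af = math.exp((ea_ev / BOLTZMANN_EV_PER_K)
                       * (1.0 / (t_use_c + 273.15) - 1.0 / (t_test_c + 273.15)))
    t_frac_use_at_j_test = t_frac_test * temp_af
    # 3. Current-density term: life scales as J**(-n), so solve for J at the target life.
    return j_test * (t_frac_use_at_j_test / target_life_h) ** (1.0 / n)


if __name__ == "__main__":
    # Hypothetical test data: median 20 h at 2 MA/cm^2 and 250 C, sigma = 0.5.
    j_max = em_dc_current_limit(t50_test_h=20.0, j_test=2.0e6, t_test_c=250.0,
                                t_use_c=105.0, target_life_h=10 * 8760,
                                sigma=0.5, fail_fraction=0.001)
    print(f"dc limit ~ {j_max:.2e} A/cm^2 for 0.1% failures in 10 years at 105 C")
```

The same structure applies to other mechanisms by swapping in the appropriate acceleration model, for example a field-dependent model for oxide wear-out that yields a maximum dc voltage instead of a current density.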

Application realities--ac design rules

Individual features in CMOS ICs do not experience dc stress; they are subjected to some form of switching stress. To verify compliance with the reliability design rules, a relationship between the dc rules and the ac application conditions has to be defined. AC design rules are extracted through characterization of the failure mechanism under switching conditions, extensive simulation of various circuit types, and accumulated experience.
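One simple way to relate switching waveforms to dc limits, assumed here purely for illustration, is to judge electromigration under unipolar pulsed current by the time-averaged current density and Joule heating by the RMS current density. For an ideal rectangular pulse of peak I_p and duty cycle r, these are r · I_p and √r · I_p. This Python sketch applies that assumed model; the limit values, pulse, and wire cross-section are hypothetical, and characterized ac rules may take a different form.

```python
import math


def average_and_rms(samples):
    """Time-averaged and RMS value of a uniformly sampled current waveform."""
    n = len(samples)
    avg = sum(samples) / n
    rms = math.sqrt(sum(s * s for s in samples) / n)
    return avg, rms


def rectangular_pulse(peak, duty, n_samples=1000):
    """One period of an ideal unipolar rectangular pulse, uniformly sampled."""
    n_on = round(duty * n_samples)
    return [peak] * n_on + [0.0] * (n_samples - n_on)


def check_ac_stress(samples_a, area_cm2, j_avg_limit, j_rms_limit):
    """Return (passes, j_avg, j_rms) in A/cm^2 under the assumed average/RMS rule model."""
    i_avg, i_rms = average_and_rms(samples_a)
    j_avg, j_rms = i_avg / area_cm2, i_rms / area_cm2
    return (j_avg <= j_avg_limit and j_rms <= j_rms_limit), j_avg, j_rms


if __name__ == "__main__":
    wire_area_cm2 = 1.0e-4 * 0.5e-4                     # hypothetical 1 um x 0.5 um line
    waveform = rectangular_pulse(peak=5e-3, duty=0.2)   # hypothetical 5 mA pulses
    ok, j_avg, j_rms = check_ac_stress(waveform, wire_area_cm2,
                                       j_avg_limit=5e5, j_rms_limit=1e6)
    print(f"J_avg = {j_avg:.2e}, J_rms = {j_rms:.2e} A/cm^2, pass = {ok}")
```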

Circuit application rules

Verifying compliance for a complex IC is not trivial. Given the layout information (R and C on every node), knowledge of the circuit (I and V on every node), and the application conditions (f and T), verification could in principle be performed through Spice simulation of the whole circuit. Clearly, this is not a practical proposition for a million-gate design.

Figure 1. Conventional qualification, based on product life tests, is a serial gate to volume shipments.

Therefore, if the reliability design rules are to serve any purpose other than legal liability coverage, a verification methodology at a higher design level must be defined.

Since million-gate designs require some form of hierarchical design methodology, a hierarchical approach to verification can be defined. Extensive Spice simulations of the building blocks under application conditions are used to derive a series of points that represent the design-rule limits for every driver size and application temperature. Regression analyses of these points then describe the relationships among circuit-level factors (rise time tr, fall time tf, I, V, f, T, R, C) that are consistent with the reliability design rules. These regressions are used to derive circuit application rules, with which formal reliability verification can be performed at the circuit design level.
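A minimal version of the regression step can be sketched in Python, assuming the rule for one driver size and temperature can be expressed as a power law C_max = a · f^b relating the maximum allowed load capacitance to switching frequency. The (f, C_max) points are invented placeholders standing in for Spice results, and a real rule would regress on more of the listed factors (tr, tf, I, V, T, R). The fit is ordinary least squares on logarithms, written out so that only the standard library is needed.

```python
import math


def fit_power_law(xs, ys):
    """Least-squares fit of y = a * x**b via log(y) = log(a) + b*log(x). Returns (a, b)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mean_x, mean_y = sum(lx) / n, sum(ly) / n
    sxx = sum((x - mean_x) ** 2 for x in lx)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(lx, ly))
    b = sxy / sxx
    a = math.exp(mean_y - b * mean_x)
    return a, b


def make_circuit_rule(a, b):
    """Turn the fitted curve into a pass/fail circuit application rule."""
    def load_is_safe(freq_mhz, load_pf):
        return load_pf <= a * freq_mhz ** b
    return load_is_safe


if __name__ == "__main__":
    # Placeholder points for one hypothetical driver size at one temperature.
    freqs_mhz = [25.0, 50.0, 100.0, 200.0]
    cmax_pf = [8.1, 4.0, 2.05, 0.98]
    a, b = fit_power_law(freqs_mhz, cmax_pf)
    rule = make_circuit_rule(a, b)
    print(f"C_max ~ {a:.1f} * f^{b:.2f} pF; 1.5 pF at 100 MHz safe: {rule(100.0, 1.5)}")
```

Once fitted, the rule is a cheap closed-form check that can be evaluated on every node, which is what makes circuit-level verification tractable without simulating the whole chip.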

Note that this type of verification needs to be performed at every level of the design hierarchy, starting at the gate level and working up to the full chip. Each stage of verification defines an application limit for the next level of design. Thus, at the gate level, a safe operating limit corresponding to the reliability design rules is defined in terms of input and output conditions (f, rise and fall times, etc.). At the next integration level (block or chip level), verification consists only of checking that each gate is used within its defined safe operating limits. This methodology can be applied at the circuit design level with the usual R and C estimates, as well as after physical design, when more accurate R and C values are available.
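The two-level idea can be sketched in Python as follows: gate-level characterization produces a safe operating limit per library cell, and block-level verification reduces to comparing each instance's operating conditions against its cell's limit, with no circuit simulation. The cell names and limit values are hypothetical, and reducing each limit to independent maxima is a simplification; a real limit would more likely be a function of several variables, like the regression-derived circuit application rules described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SafeOperatingLimit:
    """Gate-level limit derived once per cell (simplified to independent maxima)."""
    max_freq_mhz: float
    max_load_pf: float
    max_input_slew_ns: float


@dataclass(frozen=True)
class GateInstance:
    """Operating conditions of one gate instance inside a block."""
    name: str
    cell: str
    freq_mhz: float
    load_pf: float
    input_slew_ns: float


def verify_block(instances, limits):
    """Block-level check: names of instances used outside their cell's safe operating limit."""
    violations = []
    for inst in instances:
        lim = limits[inst.cell]
        if (inst.freq_mhz > lim.max_freq_mhz
                or inst.load_pf > lim.max_load_pf
                or inst.input_slew_ns > lim.max_input_slew_ns):
            violations.append(inst.name)
    return violations


if __name__ == "__main__":
    # Hypothetical library limits and block data.
    limits = {
        "INV_X1": SafeOperatingLimit(max_freq_mhz=200.0, max_load_pf=0.5, max_input_slew_ns=1.0),
        "BUF_X4": SafeOperatingLimit(max_freq_mhz=200.0, max_load_pf=2.0, max_input_slew_ns=1.0),
    }
    block = [
        GateInstance("u1", "INV_X1", 100.0, 0.3, 0.4),
        GateInstance("u2", "INV_X1", 100.0, 0.9, 0.4),  # load exceeds the cell limit
        GateInstance("u3", "BUF_X4", 150.0, 1.6, 0.6),
    ]
    print("violations:", verify_block(block, limits))
```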

Note that the safe operating limits and the circuit application rules are approximations; therefore, they cannot definitively identify failing nodes. They can, however, identify nodes that are not at risk, and only the remaining nodes merit full Spice verification after physical design. This sieving approach reduces the number of nodes that must be formally verified after physical design from hundreds of thousands to merely tens.
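The sieve can be sketched in Python as a sequence of screens with decreasing uncertainty. Each stage estimates a stress ratio per node (estimated stress divided by the design-rule limit); a node is cleared only if the ratio stays at or below 1 even after inflating it by that stage's assumed error bound, and whatever no stage clears is left for full Spice verification. The stage names, error bounds, and ratios are hypothetical.

```python
def sieve(nodes, stages):
    """Progressively clear nodes that are provably not at risk.

    stages: list of (stage_name, relative_uncertainty, {node: estimated_stress_ratio}).
    Returns ({node: stage that cleared it}, [nodes still needing full Spice verification]).
    """
    remaining = list(nodes)
    cleared = {}
    for stage_name, uncertainty, ratios in stages:
        still_at_risk = []
        for node in remaining:
            # Worst-case ratio allowing for this stage's approximation error.
            if ratios[node] * (1.0 + uncertainty) <= 1.0:
                cleared[node] = stage_name
            else:
                still_at_risk.append(node)
        remaining = still_at_risk
    return cleared, remaining


if __name__ == "__main__":
    nodes = ["n1", "n2", "n3", "n4"]
    stages = [
        ("circuit rules, estimated R/C", 0.50, {"n1": 0.4, "n2": 0.8, "n3": 0.9, "n4": 1.3}),
        ("circuit rules, extracted R/C", 0.15, {"n2": 0.7, "n3": 0.95, "n4": 1.2}),
    ]
    cleared, for_spice = sieve(nodes, stages)
    print("cleared:", cleared)
    print("full Spice verification needed:", for_spice)
```

Nodes left over are candidates for detailed verification, not confirmed failures, which matches the point that the approximations can clear nodes but cannot identify failing ones definitively.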

Logic application rules

Given some structure in the design tools and processes, the circuit application rules can be further extrapolated into a set of approximations that apply at the logic design level. A number of trial routes are performed and analyzed to define the approximate characteristics of an average route. These characteristics are then combined with the circuit application rules to define logic application rules in terms of fan-in and fan-out limits. The logic application rules then constrain the interconnections between logic blocks or cells for a given floorplan, and they represent another level in the sieving process.
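A fan-out limit can be sketched in Python under two assumptions: that the circuit application rule reduces, for a given driver at a given frequency, to a maximum load capacitance, and that trial routes yield an average wire capacitance per net plus an increment per additional sink. The linear capacitance model and all numbers are hypothetical.

```python
import math


def max_fanout(c_max_pf, c_wire_base_pf, c_wire_per_sink_pf, c_pin_pf):
    """Largest fan-out whose estimated load stays within the driver's maximum load.

    Estimated load = c_wire_base + fanout * (c_wire_per_sink + c_pin), using
    average-route characteristics from trial routes (assumed linear model).
    """
    budget = c_max_pf - c_wire_base_pf
    if budget <= 0:
        return 0
    return math.floor(budget / (c_wire_per_sink_pf + c_pin_pf))


if __name__ == "__main__":
    # Hypothetical maximum loads (pF) for two drivers at one frequency.
    drivers = {"BUF_X1": 0.8, "BUF_X4": 2.0}
    for name, c_max in drivers.items():
        print(name, "max fan-out:", max_fanout(c_max, c_wire_base_pf=0.3,
                                                c_wire_per_sink_pf=0.1, c_pin_pf=0.05))
```

Because the average route is only an approximation, nets close to the resulting limit would still be passed on to circuit-level or physical-level checks rather than treated as definitively safe or failing.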

Figure 2. Advanced qualification, based on design and process data, is a parallel activity that does not gate time-to-money.

Note that the logic application rules are even more approximate than the circuit application rules. They identify nets that are not at risk, or, conversely, identify nets that are at risk, so that they can be either redesigned or verified more accurately at the circuit or physical level.

The design-for-reliability process described here starts at the lowest physical level, characterizing the failure mechanisms, and progresses to the highest level, with a set of logic design rules. The verification steps outlined here are all captured either in automated rule-checker tools or as a set of simple checklist practices that can be followed through design reviews at every stage of design release.

Conclusions

It is clear that current reliability qualification procedures are deficient, expensive, and not viable for submicron products. It is also clear that reliability engineering has evolved from an art to a science and that the industry must agree on a viable approach to qualification. The new qualification standard will have to leverage the many tools and data sources, other than product life tests, that are available to reliability engineers. Many companies have recognized this issue, but establishing a new standard requires across-the-board buy-in.

The methodology described here not only ensures product reliability but also optimizes the cost and cycle time of the effort. The verification procedures are not just a pass/fail gate at the end of the design cycle; they are also tools that facilitate design for reliability. This design-for-reliability approach is part of advanced IC product qualification methodologies that minimize reliance on conventional product life tests and maximize the use of other data sources [3, 4]. These methodologies have been proven and validated through the very successful field performance of hundreds of thousands of parts operating in thousands of computer installations.

References

  1. R. Radojcic, "Generic Qualification of ASIC Products," MicroElectronics & Reliability, Vol. 26, No. 3, pp. 471-479, 1986.

  2. M. Pecht, "Keynote Presentation," Proc. IEEE Integrated Reliability Workshop, Lake Tahoe, CA, October 1995.

  3. R. Radojcic, "Universal Qualification of SuperAsic Products," Proc. IEEE Integrated Reliability Workshop, Lake Tahoe, CA, November 1992.

  4. L. Oshiro and R. Radojcic, "A Design Reliability Methodology for CMOS VLSI Circuits," Proc. IEEE Integrated Reliability Workshop, Lake Tahoe, CA, October 1995.

Riko Radojcic is an architect at Cadence Design Systems (San Jose, CA).



Integrated System Design, September 1996






Copyright © 1996 Integrated System Design Magazine
