
Design Article

Electromigration for Designers: An Introduction for the Non-Specialist

J.R. Lloyd

4/12/2002 12:00 AM EDT


Reliability is as much a key to success in the microelectronics industry as is performance. Not only must a product perform as desired, it must also work for an extended period of time without fail, typically 10 years or more. It does little good to make the world's fastest microprocessor, if after two weeks of operation it fails. Except for very few applications, such as missile guidance systems that only operate for a few seconds, anything other than superb long-term reliability would be unacceptable.

With the complexity of today's microelectronics, a phenomenal level of reliability must be maintained. For instance, if the probability of failure for a transistor is one in a million, and you have a million transistors, at least one failure is more likely than not (about 63%, assuming the failures are independent). And yet, a modern IC can have more than 10 million circuit elements. Therefore, for any acceptable reliability on the chip level, today's circuit elements must be among the most reliable things ever built. In addition, reliability must continue to increase as the complexity increases.

The reliability we have enjoyed thus far has not come without considerable cost. Billions of dollars and the equivalent in Yen, Francs, Deutschmarks, and so on have been expended to solve the daunting problems facing reliability engineers designing integrated circuits. The few wear-out failure mechanisms that exist (hot carrier, time-dependent dielectric breakdown, and electromigration) have become understood well enough that we can incorporate them into design tools.

We know the limitations to apply in order to delay any wear-out issues to long past the useful life. However, to apply the limitations effectively, one must understand the limitations of the materials used to manufacture ICs, and work around them. Overestimating the capabilities of the materials and the process could spell disaster, and underestimating them could limit designs so severely that nothing of commercial interest could be made. Striking a balance between conservatism and judicious use of the process capabilities is necessary for continuous advancements.

ICs must work rather hard. High currents, high temperatures, and many thermal cycles eventually take their toll. Just as any mechanical device, like an old car, boat or airplane, eventually fails from repeated exposure to the everyday stress of operation, electrical stresses cause similar problems in electronic components. Two types of reliability issues plague the industry: defect-related problems and wear-out. Defect-related problems are caused by manufacturing defects, such as a missing process step, dirt, or other unavoidable calamities. Even the best, most efficient process lines suffer from an occasional defect-related problem. Wear-out is due to the circuit or the product just wearing out, without any initial defects being present.

Although redundancy and insensitivity to a failure mechanism may be up to designers, defects are in the realm of the process engineer. Improved processes and statistical process control efforts often reduce such failures to a minimum. Wear-out, on the other hand, which occurs due to limitations in the "perfect" material, is a problem that lies squarely with the designer. One of the principal wear-out failure mechanisms is electromigration. Fortunately, although not completely understood in all its subtleties, it is controllable by proper design and a firm appreciation of where one can get into trouble.


Electromigration History

Electromigration is the mass transport of a metal due to the momentum transfer between conducting electrons and diffusing metal atoms. Discovered more than 100 years ago, it became a concern only when the relatively severe conditions necessary for operation of integrated circuits made it painfully visible. Although electromigration, in principle, exists whenever current flows through a metal wire, the conditions necessary for electromigration to be a problem simply did not exist back then. In bulk wires, such as those used for home circuitry, the maximum current density is limited to about 10,000 A/cm^2 by Joule heating. Any current density even modestly exceeding this value will produce enough heat to melt a metal wire, yet below this limit the driving force from electrons colliding with diffusing metal atoms is insufficient to make electromigration a significant problem. Only a research scientist would pass enough current through a bulk metal wire to observe the effects of electromigration, and only with great experimental difficulty. Therefore, for at least 100 years, electromigration was an interesting problem in solid state physics, fascinating grist for the research mills at universities, but of no interest whatsoever commercially.

All of this changed in 1966 when the IC made its commercial appearance. Electromigration was rediscovered by a much larger audience, and with a vengeance. In ICs, electricity is conducted via thin film stripes that are in direct contact with an effective heat sink. Because most of the heat generated by the current is conducted away into the chip, thin film conductors can withstand current densities at least two orders of magnitude greater than traditional bulk wires. This allows current densities of nearly 10^6 A/cm^2 with minimal Joule heating. At these current densities electromigration becomes significant.

The first ICs were constructed with metal lines that were 10 µm in width or more, wide by today's standards. At the same time they were exceedingly thin, on the order of 3000 Å. Furthermore, the conductors were made of pure Aluminum, a material with a low melting temperature, which implies fast diffusion at low temperatures. Very thin film contains small grains and thus many grain boundaries that are conduits for even more rapid diffusion. This combination of high current density and fast diffusion at low temperatures was a recipe for disaster.

 
"A billion here and a billion there, pretty soon……."

At IBM it was estimated that close to a billion 1966 dollars were spent in the effort to understand and fix the problem of electromigration failure. This was when a billion dollars was a lot of money.
 

ICs were supposed to be very reliable and great hope was placed in their use. When the first ICs were placed into service, they failed within weeks. The shock to the industry was tremendous. IC manufacturers were in a panic to understand why they failed.

When parts returned from the field were subsequently examined, there was nothing visible, even under a microscope. A relatively new research tool, the scanning electron microscope, was used and failure sites were identified. The open circuits were very fine "cracks" in the metal, sometimes only a few hundred angstroms wide. When the culprit was identified, the immediate fix was simple: make the metal thicker. Easy with 10 µm wide lines, but not so easy today.

Since then, electromigration has not gone away, but it has come under control. The first solution was to make the metal conductors more resistant to electromigration by alloying the Al with Copper (Cu), initially up to 4%. The amount has since changed due to processing considerations, but today generally 0.5% Cu is still alloyed with Al. The addition of Cu, of course, had a deleterious effect on the resistivity, and low resistance was available only by using relatively thick metal, 1.0 µm or so. Today, fine pitch circuits cannot tolerate such thick metal, and other schemes are used to ensure reliability.


The Physics of Electromigration

Electromigration is due to the momentum exchange between conducting electrons and diffusing metal atoms. Simply stated, perhaps, but how does it happen?

 
Designers Beware.

Many reliability engineers working in electromigration define the current exactly opposite to the way you do. To them current is electron flow and positive current flow is in the direction the electrons are traveling.

Ben Franklin had a 50-50 chance of getting it right.
 

In a perfect lattice, there is no resistance. Electrons move about in a periodic potential with no other interaction with the metal atoms. This may sound like superconductivity, but it isn't. The problem here is that a perfect lattice cannot exist above absolute zero due to missing atoms ("vacancies"), impurities, boundaries between crystals of different orientation ("grain boundaries"), and regions of imperfection ("dislocations"). Perhaps even more important, at any temperature above 0 K, atomic vibrations occur. These vibrations ("phonons") put a metal atom out of its perfect position about 10^13 times each second and disturb the periodic potential, causing electron scattering. The scattering event makes the electron change direction; any change in direction is accompanied by an acceleration; and for every acceleration there is a force. After many collisions (another word for the scattering event), the force averages out in the direction of electron flow.

The force due to collisions of electrons to metal atoms is called the momentum exchange. In electromigration, momentum is exchanged between the electrons and the metal atoms and a change in momentum with time is called a force. To provide sufficient momentum exchange to cause measurable effects, many electrons must be available to collide with the atoms. This can only happen in a metal. In metals, many electrons are easily accelerated in an electric field.

 
Sign of the Charge Carriers

Heavily doped polycrystalline silicon was used to illustrate an interesting property of electromigration physics. Both p-type and n-type polysilicon resistors doped to approximately 1% were stressed until failure in strong Joule heating induced temperature gradients. In the n-type material, failure was near the cathode and in p-type material failure was near the anode, thus demonstrating the role of the sign of the charge carrier in electromigration.
 

Semiconductors have far fewer electrons and in a true semiconductor, electromigration does not exist because there just aren't enough charge carriers. However, electromigration can occur in semiconductor-like materials, such as silicon, when they are so heavily doped that they act as if they were metals. At dopant levels of around 1%, electromigration has been observed in polycrystalline silicon, but then the temperature coefficient of resistance (TCR) is positive. A positive TCR is probably the best definition of a metal.

The size of the momentum exchange will be proportional to the distortion in the lattice at any given point. This distortion is greatest when there is a vacancy nearby, or in the region of a grain boundary. This is also where diffusion occurs. Vacancies or grain boundaries must be present for metal atoms to move from their fixed positions in the crystal lattice ("diffuse"). You can't have two things in the same place at the same time, so for an atom to move from site A to site B, site B must be vacant. In grain boundaries the problem is less well defined, but the concept still applies. However, a boundary is a region of distortion and open space, and the diffusion of atoms can be accommodated in these regions rather easily as compared to the lattice. This creates a fortuitous situation where the greatest momentum exchange occurs only at the sites where it is possible for atoms to move.

For the design engineer, electromigration physics can be simply stated. Electrons flow through a metal film and collide with metal atoms. The collisions produce a force on the metal atoms in the direction of electron flow (for n-type materials, opposite for p-type materials). Electromigration is only significant at high current densities and only in metals. The magnitude of the electromigration force is proportional to the current density.


Materials Science

The flux of metal atoms due to electromigration can be expressed rather simply, using an electrostatic analogue and Einstein's equation for diffusion in a potential field.

$$J = \frac{D C}{kT}\, Z^{*} e \rho j \qquad (1)$$

where J is the atomic flux, D is the diffusion coefficient for the appropriate mass transport mechanism, C is the concentration of diffusing atoms, Z* is a quantity called the effective valence or the effective charge (although it is neither a charge nor a valence) that represents the sign and the magnitude of the momentum exchange, e is the electronic charge, ρ is the resistivity and j is the current density. kT is the average thermal energy per atom. The important observation from Equation 1 is that the electromigration-induced mass flux is directly proportional to the current density, to the diffusion coefficient and to the concentration of diffusing atoms.

Just having an electromigration-induced mass flux is not enough to cause a problem. For a problem to exist, either more or less mass must be entering a region than leaving it. If more mass is leaving than arriving, we can form voids and open circuits. If more mass is entering than leaving, extrusions will form short circuits or breaks in the passivation and provide an opportunity for corrosion. These regions are called flux divergences. Unfortunately, many opportunities exist for flux divergences in a typical IC.
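One compact way to state this condition, offered as a sketch rather than as part of the original derivation, is the continuity equation for the diffusing atoms. It assumes atoms are neither created nor destroyed, with C denoting the local atomic concentration and J the atomic flux:

```latex
\frac{\partial C}{\partial t} = -\nabla \cdot J
```

Where the divergence of J is positive, more atoms leave than arrive and C falls (voids); where it is negative, atoms accumulate (extrusions). A uniform flux, however large, leaves C unchanged.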

A principal source of trouble is in the unavoidable contact to silicon. The diffusion of Al from Silicon (Si) is zero, and, hopefully, the diffusion of Si into Al is the same. Therefore, since electromigration will be driving the Al away from the Si contact and attempting to stuff it into another, a serious problem can result. Under the right circumstances, metal atoms will leave and none will replace them, so voids will form at contacts where electron current is entering the metal from the Si. Conversely, extrusions will be generated where the electrons are entering the Si.

Since contacts and other similar structures are unavoidable, the potential for electromigration failure exists in any real circuit. All we can do is design our circuits such that this inevitable problem is delayed until it no longer matters—and this is the circuit designer's responsibility.


Effect of Current Density on Conductor Lifetime

 
Black's Law

In the late sixties, Jim Black of Motorola was heavily involved in understanding the "cracked stripe" problem that was later identified as electromigration. Jim's pioneering work included the first careful systematic investigations of electromigration failure kinetics. His experiments uncovered the curious behavior that electromigration failures followed kinetics that depended not on the inverse of the current density, but on the inverse square.



$$t_{50} = \frac{A}{j^{2}} \exp\!\left(\frac{\Delta H}{kT}\right) \qquad (2)$$

where t50 is the median time to failure in an ensemble of samples, A is a constant that needs to be empirically determined and ΔH is the activation energy for failure. The experimental values found for the activation energy suggested grain boundary diffusion as the mass transport mechanism. For nucleation dominated failure, this equation has proven to be adequate even to the present day. Only small corrections, often too small to be detected experimentally, have been needed to keep Black's Law consistent with the latest theoretical developments.
 

From Equation 1 we see that the electromigration driving force is proportional to the current density. It could be assumed that electromigration failure would scale in the same way, linearly with the current, but that is not always the case. Traditionally, it has been observed that electromigration failure followed a 1/j^2 law rather than 1/j. This has become known as Black's Law. However, whether this empirical law holds or not depends entirely on whether the failures are nucleation or growth dominated. This, in turn, depends heavily on the process used to construct the metal lines. If there is no refractory "shunt layer" such as TiN or TiW under the Al line, failure is nucleation dominated and Black's Law holds. If, however, the failures are growth dominated, such as is usually the case for W via failure in narrow lines with shunt layers, Black's Law is not followed and failure times are dependent on 1/j kinetics. Often, as might be expected, the failure process involves both nucleation and growth of damage, and the behavior is more complicated and cannot be described by a simple power law in j.

Wherever growth dominates or is a significant part of the failure time, we assume that 1/j kinetics hold. Most recent experimental data where contacts or vias have been examined in the presence of refractory conductive shunt layers has supported the use of 1/j kinetics, whereas most data on conductor lines attached to bond pads has supported 1/j^2 kinetics.

To ensure that electromigration failure does not occur in the field, we need to limit the current density such that electromigration failure will not become significant until long after the projected useful lifetime of the circuit. This is a function of not only the current density in the metal lines and contacts, which may behave differently, but also of temperature and often process variations.
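The extrapolation from accelerated test conditions to field conditions can be sketched as below. This is a minimal illustration, not a qualified reliability model: it assumes the median life follows t50 = A * j^(-n) * exp(ΔH/kT) with n = 2 for nucleation dominated failure and n = 1 for growth dominated failure. The function name and all numerical inputs are hypothetical.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333e-5  # Boltzmann constant in eV/K


def acceleration_factor(j_stress, j_use, temp_stress_c, temp_use_c,
                        activation_energy_ev, current_exponent):
    """Ratio t50(use) / t50(stress), assuming t50 = A * j**(-n) * exp(dH / kT).

    The empirical constant A cancels in the ratio, so it is not needed.
    """
    temp_stress_k = temp_stress_c + 273.15
    temp_use_k = temp_use_c + 273.15
    # Current-density term: lower use current lengthens life as (j_stress/j_use)**n.
    current_term = (j_stress / j_use) ** current_exponent
    # Thermal term: lower use temperature lengthens life through the Arrhenius factor.
    thermal_term = math.exp(
        activation_energy_ev / BOLTZMANN_EV_PER_K
        * (1.0 / temp_use_k - 1.0 / temp_stress_k)
    )
    return current_term * thermal_term


# Hypothetical test and use conditions (illustrative values only).
af_nucleation = acceleration_factor(2.0e6, 5.0e5, 200.0, 100.0, 0.6, 2)
af_growth = acceleration_factor(2.0e6, 5.0e5, 200.0, 100.0, 0.6, 1)
print(f"n = 2: t50(use) / t50(stress) = {af_nucleation:.0f}")
print(f"n = 1: t50(use) / t50(stress) = {af_growth:.0f}")
```

The two results differ by exactly the current ratio (here a factor of four), which shows how much a wrong choice of current exponent can distort a lifetime projection.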


Effect of Temperature on Current Density Limits

The major effect of temperature on electromigration is in the diffusion coefficient. Diffusion is a thermally activated process characterized by the Arrhenius relation and it possesses an activation energy.

 
Activation Energy

The activation energy for self diffusion depends strongly on the diffusion mechanism. Diffusion can proceed through the lattice, or grain boundaries, and along interfaces or the surface. The lattice is the most difficult path with the highest activation energy (for Al, ΔH for lattice diffusion is about 1.4 eV), followed by the grain boundary (for Al, ΔH for grain boundary diffusion is about 0.6 eV) and then the surface. In Al, the surface is generally not available due to the presence of a coherent oxide film. Interfacial diffusion activation energies differ for every interface and can be either greater or less than that for grain boundary diffusion. Adding alloying elements generally has the paradoxical effect of decreasing the lattice and increasing the grain boundary activation energies. The effect on interfaces is unclear.
 

$$D = D_0 \exp\!\left(-\frac{\Delta H}{kT}\right) \qquad (3)$$

where D0 is a pre-exponential factor that depends on the diffusion mechanism and ΔH is the activation energy, also dependent on the diffusion mechanism.

Equation 3 shows that electromigration is very sensitive to temperature. For Al, generally a change in temperature of 20 degrees can double the rate of electromigration. Therefore, the current permitted in a thin film conductor is a function of temperature. The higher the temperature, the less current can be permitted and still remain safe from electromigration failure.
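The temperature sensitivity can be checked with a few lines of arithmetic. The sketch below assumes the Arrhenius form D = D0 * exp(-ΔH/kT) and the grain boundary activation energy of about 0.6 eV quoted for Al; the 100 degree Celsius starting temperature and the function name are assumptions chosen for illustration.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333e-5  # Boltzmann constant in eV/K


def diffusion_ratio(temp_low_c, temp_high_c, activation_energy_ev):
    """Return D(temp_high) / D(temp_low) for D = D0 * exp(-dH / kT).

    The pre-exponential factor D0 cancels, so only dH and the two
    temperatures are needed.
    """
    temp_low_k = temp_low_c + 273.15
    temp_high_k = temp_high_c + 273.15
    return math.exp(
        activation_energy_ev / BOLTZMANN_EV_PER_K
        * (1.0 / temp_low_k - 1.0 / temp_high_k)
    )


# A 20 degree rise from an assumed 100 C operating point, with dH = 0.6 eV.
ratio = diffusion_ratio(100.0, 120.0, 0.6)
print(f"D(120 C) / D(100 C) = {ratio:.2f}")
```

Under these assumptions the ratio comes out somewhat above two, in line with the rule of thumb. The exact figure depends on the starting temperature and on which diffusion path dominates.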

Just how much current can be permitted and still maintain reliability as the temperature is changed will depend on whether you have nucleation or growth dominated failure and what the dominant diffusion mechanism is. If we have growth-dominated failure and we increase the temperature such that we double the diffusion coefficient (approximately 20 degrees for Al alloys and grain boundary diffusion), we must reduce the current density by half. Conversely, if we want to increase the current density by a factor of two, we must ensure that the temperature is at least 20 degrees cooler. If failure is nucleation dominated, an approximate 30% reduction in current is needed for a similar temperature increase to maintain equal reliability.
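The two derating figures follow from a one-line relation. As a sketch, assume the failure time scales as 1/(D * j^n), with n = 1 for growth dominated and n = 2 for nucleation dominated failure; holding the failure time constant then requires the current density to scale as the diffusion ratio raised to the power -1/n. The function name is hypothetical.

```python
def allowed_current_scale(diffusion_ratio, current_exponent):
    """Factor to apply to j so that t50 is unchanged when D is multiplied
    by diffusion_ratio, assuming t50 is proportional to 1 / (D * j**n)."""
    return diffusion_ratio ** (-1.0 / current_exponent)


# Diffusion coefficient doubles (roughly a 20 degree rise per the text).
growth = allowed_current_scale(2.0, 1)      # growth dominated, 1/j kinetics
nucleation = allowed_current_scale(2.0, 2)  # nucleation dominated, 1/j^2 kinetics
print(f"growth dominated:     j must be multiplied by {growth:.3f}")
print(f"nucleation dominated: j must be multiplied by {nucleation:.3f}")
```

The growth dominated case gives a factor of one half, and the nucleation dominated case gives 1/sqrt(2), a reduction of a little under 30%.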

Whether failure is nucleation or growth dominated is a matter of the process used to deposit the metal and the overlying dielectric. Almost everything that happens consists of an initiation followed by a continuation. Electromigration is no exception. First the damage must be initiated, a void nucleated or an extrusion formed, then the damage proceeds, such as void growth or continuing the extrusion, until failure occurs. Sometimes nucleation is slow and takes a long time and growth is fast. When this happens we have nucleation dominated failure. Sometimes we have the converse, and the nucleation is either very short or non-existent, and we then have growth-dominated failure. Electromigration exhibits both types of behavior.


Nucleation Dominated Failure

Nucleation-dominated failure will be most common in processes that do not contain a redundant "shunt" layer. Void nucleation occurs when sufficient stress is generated. To generate stress, significant mass transport must take place. This takes time. At a critical stress level, a void will form to reduce the stress in the system. When the void forms, a tremendous release of strain energy occurs that promotes very rapid void growth. In the absence of a shunt layer, an open circuit develops almost immediately, and failure follows 1/j^2 kinetics. At least two other nucleation dominated failure mechanisms have been identified: the stress buildup following Cu depletion in Al/Cu alloys, and passivation cracking induced by compressive stresses which produce extrusions. In all three scenarios, 1/j^2 kinetics prevail.


Growth Dominated Failure

If there is a redundant shunt layer, the initial rapid growth of the void will not produce an open circuit. The shunt layer, usually of a refractory material such as W or TiN, can conduct electricity even if a void exists in the primary Al conductor. These metals can withstand extremely high current densities at high temperatures for very long times. If failure is defined as an open circuit, they don't fail. However, for most practical situations, an open circuit is not a realistic definition of failure. Since a resistance change of about 10% in global wiring can produce timing errors, a 10% increase in resistance has often been chosen as the failure criterion.

Using a percentage increase as a failure criterion during a test has some problems. The actual damage that causes a failure will be a function of the precise geometry of the test structure and the initial resistance. This is unsatisfying for evaluating real circuits that don't look like test structures. It is recommended, therefore, that failure criteria be based on an absolute change in resistance, the maximum that a particular circuit can withstand before problems arise.

It is necessary to use test structures that can measure a resistance change without geometric effects, such as the Blech Length, affecting the data.

 
The Blech Length

In the 1970s Ilan Blech of the Technion in Israel performed one of the most important series of experiments in the history of electromigration science and technology. In these experiments he created a test structure that consisted of islands of gold (Au) deposited onto a refractory underlay. When current was passed through these samples, the upstream side of the islands moved in the direction of electron flow and the downstream edge stayed stationary. If the island was long enough, extrusions formed on the downstream edge, but if the island was short enough, electromigration essentially stopped. Electromigration also stopped when the longer islands shrank to a critical length. He discovered that there is a critical product of the current density and the length of the island, below which electromigration ceases. This is the origin of the "Blech Length." For any given current density, there is a length below which electromigration will not occur.

This behavior occurred because a mechanical back stress, generated by electromigration, resisted the electromigration force. The back stress exists only in the presence of a flux divergence and it is greater in the presence of a mechanically strong confining passivation layer. For this reason, the Blech Length cannot be easily pre-determined. It is a strong function of the process and the physical design of the chip.

In principle, one could make a circuit immortal by designing all the lines to be shorter than a Blech Length. However, the Blech Product j × l is only on the order of a few thousand A/cm and is a strong function of the thermal history, so this idea has not been seriously considered.
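To see why this is impractical, the critical length can be estimated from the critical product. The sketch below takes the "few thousand" figure as 3000 A/cm, which is an assumed illustrative value rather than a measured one, and the function name is hypothetical.

```python
def blech_length_um(critical_product_a_per_cm, current_density_a_per_cm2):
    """Length below which electromigration stops, in micrometres,
    from the critical product (j * l) and the operating current density."""
    length_cm = critical_product_a_per_cm / current_density_a_per_cm2
    return length_cm * 1.0e4  # convert cm to micrometres


# Assumed critical product of 3000 A/cm at a current density of 1e6 A/cm^2.
length = blech_length_um(3000.0, 1.0e6)
print(f"Blech length is roughly {length:.0f} micrometres")
```

With these assumed numbers the immortal length is only a few tens of micrometres, far shorter than typical global wiring, and it shrinks in proportion as the current density rises.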
 

The growth of a void depends on the rate that metal atoms leave the void, or, equivalently, the rate at which vacancies enter it. The flux of vacancies or atoms is linearly dependent on the current density, and therefore the time required to attain a certain void size will obey 1/j kinetics. Care must be taken in experimental measurements, however, since inappropriate test structures can result in just about any value for the current exponent.

For a given metallization, growth dominated failure must take longer than nucleation dominated failure, since the damage needs to nucleate before it can grow. However, the nucleation phase can be very short, approaching zero. The kinetics of failure must be evaluated experimentally and applied properly. This means that for electromigration damage in real conductors, we can have either 1/j or 1/j^2 kinetics. It has been observed that for wide lines, defined as those where the average grain size is smaller than the line width, 1/j^2 kinetics usually dominate, whereas for narrow lines, 1/j kinetics dominate.


RMS Current and Temperature Gradients

When current is passed through a conductor, the interaction of the electrons with the lattice dissipates power equal to the product of the square of the current and the resistance. This is called Joule heating. Metal lines will heat up whenever current is passed through them. If the current is low, the heat is effectively conducted away, but there must be some temperature increase even if it is not detectable. If the current density approaches 10^6 A/cm^2, Joule heating can produce enough energy to make the conductor lines heat up appreciably. At first this does not appear to be a problem, since current densities are almost always lower than this due to limitations induced by electromigration. However, one must realize that Joule heating is determined by the root mean square (RMS) current, whereas electromigration is determined by the average current. For a narrow pulse, the RMS current can be much higher than the average current. The average current can be well within any guidelines that may be set for electromigration considerations, yet significant Joule heating can result. This can be more prevalent on upper level metallization, where heat must be conducted through several layers of interlevel dielectric, which is a poor thermal conductor.
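The gap between average and RMS current is easy to quantify for an idealized waveform. The sketch below assumes a rectangular, unidirectional pulse train with a given peak current and duty cycle; real waveforms differ, and the function name and numbers are illustrative only.

```python
import math


def pulse_average_and_rms(peak_current, duty_cycle):
    """Average and RMS current of a rectangular unidirectional pulse train.

    average = peak * duty_cycle
    rms     = peak * sqrt(duty_cycle)
    """
    if not 0.0 <= duty_cycle <= 1.0:
        raise ValueError("duty_cycle must lie between 0 and 1")
    average = peak_current * duty_cycle
    rms = peak_current * math.sqrt(duty_cycle)
    return average, rms


# Assumed example: 10 mA pulses that are on 1% of the time.
average_ma, rms_ma = pulse_average_and_rms(10.0, 0.01)
print(f"average = {average_ma:.2f} mA, RMS = {rms_ma:.2f} mA")
```

For this waveform the RMS value exceeds the average by a factor of 1/sqrt(duty cycle), ten times at a 1% duty cycle, so a line that passes an average-current electromigration check can still dissipate far more heat than the average suggests.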

The problem with Joule heating is not the modest temperature increase, but the temperature gradients that result. Typically, at the current densities found in modern circuitry, temperature increases would range between a few and a few tens of degrees Celsius. This produces temperature profiles that decay within a few microns, so that temperature gradients of 10^4 to 10^5 degrees Celsius/cm will be found. Since electromigration is thermally activated, the temperature gradients produce flux divergences that approach those found at absolute divergences such as at contacts or at microstructural features.

RMS current density must then be limited to about 2 × 10^6 A/cm^2 for lower level lines and about half that for upper level lines. Unfortunately, the reliability of metal lines in the presence of temperature gradients cannot be accurately estimated. Temperature gradients can vary tremendously throughout a real structure, depending on subtleties of the geometry and on the use of the underlying silicon devices. The only way to deal with these issues is to take a conservative approach and forbid temperature gradients by limiting the RMS current density to the levels suggested above.


Microstructure and Electromigration: Line Width Effects

 
Al/Cu

One of the first applications of electromigration engineering to solve reliability problems came about 1970. At that time, thin films were usually deposited by the high temperature evaporation of metal films. Legend has it that when IBM was trying to solve the electromigration problem, one evaporator was producing better material than any other. It was a mystery. After weeks of study, someone found out that the electron beam used to melt the Al used for the conductors was misaligned. Instead of impacting directly onto the Al charge placed in a Cu container for that purpose, the e-beam was hitting the Cu and causing some of it to melt and be deposited along with the Al. The resulting Al/Cu alloy proved to be remarkably resistant to electromigration failure, increasing the median time to failure by more than an order of magnitude. It was determined that Cu slowed down the diffusion of Aluminum in the Al/Cu grain boundaries. After this effect was understood, it was exploited.

This, however, did not eliminate electromigration failure, but served as a band-aid until the technology caught up with the capabilities of Al/Cu. However, the use of Cu was a great breakthrough in electromigration technology, buying several years of performance and making the high performance IC possible. Today we live within the limitations of Cu in Al by making intelligent compromises and choices. Searches for other alloys in a process reminiscent of alchemists looking for the Philosopher's Stone have not turned up anything that works better.

Sometimes you just get lucky!
 

Electromigration is a form of mass diffusion, where the driving force is provided by the electron flow. Therefore, things that affect diffusion will affect electromigration. Metals are composed of atomic crystals where atoms are lined up very nearly perfectly in only a few allowable configurations. The size of these crystals ("grains") is finite. Where the grains meet, they form a region of disorder ("grain boundary"), and provide a pathway for easy diffusion as compared to the nearly perfect metal lattices.

In the early days of ICs, the thin film conductors used in manufacturing were relatively wide, fine grained, and composed of many grains. These were referred to as polycrystalline. The grain size was about the thickness of the film, generally about one micron. Across the width of a typical conductor several microns wide, many grain boundary pathways were available to accommodate the electromigrating atoms. It came as no surprise that the electromigration failure rate was inversely related to the grain size of the films: the more grain boundaries present, the more atoms that can be transported along them, and the earlier the failure time.

As line widths became smaller, the grain size of the metal films became larger. Conductor lines became comparable in width to the grain size and took on a "bamboo" like appearance where most of the grains spanned the line width, providing no continuous grain boundary pathway in the direction of the current flow. When this occurred, a peculiar effect was found: failure times were strongly dependent on line width. Narrow lines at the same current density became substantially more reliable than wider lines, as long as the grain size was uniform.

The reason for this behavior was not hard to figure out. The lack of easy grain boundary pathways meant that the atoms had to take more arduous paths such as the lattice or various interfaces in their journeys. The activation energy for failure was found to be a function of line width, since the diffusion process changed. What became even more interesting and important to reliability engineers was that the precise arrangement and orientation of the grains had a large effect on the lifetime of the conductor. In fact, as the ratio of grain size to line width increased, the reliability became poorer before it got better, and then got worse again as lines entered sub-micron widths.

Today, we understand this behavior and can predict the reliability from test data, grain structure, and particulars of the metal deposition process. New effects, due to the presence of refractory shunt layers and W plugs, have surfaced and have also been explained well enough that they can be tamed. However, a fundamental understanding of the process of solid state diffusion and what affects it is essential in interpreting test results. For this reason, conservative default values for parameters used in relating electromigration test data to real circuits should be employed until careful testing and data interpretation justify a change.

The choice of test structures and test conditions is of critical importance in extracting meaningful parameters for interpreting the test data as it relates to actual chip performance. The wrong test or the wrong test structure can produce fatal results. The test structures must be designed to reflect the process, and usually a single structure cannot do so.


Optimizing for and Ensuring Reliability

 
Failure Distribution

The distribution of electromigration failures has recently been the subject of much discussion. Traditionally the lognormal distribution was used, where the logarithms of the failure times are normally distributed. But this has conceptual and practical problems, the most important of which is that the lognormal distribution is not extendable. This means that given an ensemble of n components with a lognormal failure distribution, if we make up a new ensemble by combining the components in series so that the weakest of these "links" produces failure, the resulting distribution cannot be lognormal. Mathematically, the probability of failure, Pf, for a chain of n links, given that the probability of failure of a single link is known, is:

$$P_f(n,t) = 1 - \left[1 - P_f(1,t)\right]^n \qquad (4)$$

If Pf(1,t) is lognormal, Pf(n,t) cannot be lognormal for n > 1. Therefore, the earlier way of estimating the reliability of n components must be incorrect.
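As a rough numerical illustration of the weakest-link relation described above, the following minimal Python sketch evaluates the chain failure probability for a lognormal single-element distribution. The function names and the values chosen for the median life and sigma are illustrative assumptions, not figures from this article.

```python
import math


def lognormal_cdf(t, t50, sigma):
    """Probability that a single element has failed by time t.

    t50 is the median time to failure and sigma the standard deviation
    of ln(failure time). Both are illustrative parameters.
    """
    if t <= 0:
        return 0.0
    z = (math.log(t) - math.log(t50)) / sigma
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def chain_failure_probability(p_single, n):
    """Weakest-link result: Pf(n,t) = 1 - (1 - Pf(1,t))**n.

    Evaluated with log1p/expm1 so that a very small p_single combined
    with a very large n does not lose precision.
    """
    if p_single >= 1.0:
        return 1.0
    return -math.expm1(n * math.log1p(-p_single))


if __name__ == "__main__":
    t50, sigma = 100.0, 0.5   # hypothetical median life (years) and lognormal sigma
    t = 10.0                  # service life of interest (years)
    p1 = lognormal_cdf(t, t50, sigma)
    for n in (1, 1_000, 1_000_000):
        print(n, chain_failure_probability(p1, n))
```

Running the loop shows how quickly a tiny single-element probability grows once many elements are placed in series, which is why the shape of the single-element distribution in its early tail matters so much.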

We can estimate the value of Pf(1,t) from test structures and treat the chip as consisting of n effective failure elements. Determining this number is not a trivial exercise, however. The number of failure elements in a test structure must be estimated. The good news is that once we have defined what a failure element is, we can, in principle, decide what the probability of failure for each element is. The probability of failure of the chip can then be estimated more accurately than with Equation 4 by substituting the failure probability for each element:

$$P_f = 1 - \prod_{i=1}^{n} \left(1 - P_i\right) \qquad (5)$$

where Pi is the probability of failure for each failure element.
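The per-element form of the chip failure probability can be sketched in a few lines of Python. This is a minimal illustration that assumes the failure elements are independent and act as a weakest-link series, so the chip survives only if every element survives. The function name and the sample probabilities are hypothetical.

```python
import math


def chip_failure_probability(element_probs):
    """Pf = 1 - product over i of (1 - Pi), assuming independent elements."""
    log_survival = 0.0
    for p in element_probs:
        if p >= 1.0:
            return 1.0          # one certain failure fails the whole chip
        log_survival += math.log1p(-p)
    return -math.expm1(log_survival)


if __name__ == "__main__":
    # Hypothetical element failure probabilities at the target lifetime.
    identical = [1e-7] * 1000                # every element equally stressed
    mixed = [1e-4] * 10 + [0.0] * 990        # a few stressed, the rest idle
    print(chip_failure_probability(identical))
    print(chip_failure_probability(mixed))
```

When every Pi is the same, the product collapses to the single-element chain formula, so the two estimates agree in that special case.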
 

The challenge to IC designers is to ensure reliability while squeezing as much performance out of the process as possible. Unfortunately, the requirements for these two goals are conflicting. Higher performance means higher currents in smaller conductors, whereas reliability demands lower current densities.

In the past, the custom has been to generate design rules based on "worst case" scenarios. In this strategy, current densities were limited to a certain value on the assumption that every line on the chip would be used at this high current density. The limiting values were determined by extrapolating the failure times, usually fitted to a lognormal failure distribution, to some required level of reliability based on the chip complexity. The assumption was patently silly: all one needs to do is calculate how much power would be dissipated by a chip running with every wire at the electromigration limit. It is often kilowatts. This approach was too confining, and designers of today's ultra-high performance microprocessors have begun to use a strategy known as "reliability budgeting."

To perform reliability budgeting, we need to know how much current is going through each element. In today's complex microcircuits, this is a daunting task, but the payback is significant. The allowable current density for critical circuit paths can be increased substantially while maintaining reliability, since the majority of circuit elements have little to no current flowing through them and are thus effectively immortal. In addition, if the elements with the largest Pi can be located in the circuit, trouble spots can be eliminated and a more reliable circuit can be designed.
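The budgeting idea can be made concrete with a small Python sketch. It compares a worst-case view, in which every element sits at the limit, with a budgeted view, in which only a few critical elements are stressed, and then ranks elements by their share of the total failure risk. The independence assumption, the net names, and all probabilities are hypothetical and serve only as illustration. Each probability passed in must be below 1.

```python
import math


def chip_failure_probability(element_probs):
    """Weakest-link combination of independent element failure probabilities."""
    return -math.expm1(sum(math.log1p(-p) for p in element_probs))


def rank_trouble_spots(named_probs, top=3):
    """Return the elements contributing most to chip failure.

    An element's contribution is its share of the total -ln(1 - Pi),
    a quantity that simply adds up across a weakest-link series.
    """
    weights = {name: -math.log1p(-p) for name, p in named_probs.items()}
    total = sum(weights.values())
    if total == 0.0:
        return []
    ranked = sorted(weights.items(), key=lambda kv: kv[1], reverse=True)
    return [(name, w / total) for name, w in ranked[:top]]


if __name__ == "__main__":
    # Worst-case view: every one of a million elements sits at the limit.
    worst_case = chip_failure_probability([1e-8] * 1_000_000)
    # Budgeted view: a thousand critical elements run harder, while the
    # rest carry almost no current and are treated as immortal (Pi = 0).
    budgeted = chip_failure_probability([1e-6] * 1_000 + [0.0] * 999_000)
    print(worst_case, budgeted)

    nets = {"clk_trunk": 2e-4, "vdd_strap": 5e-5, "io_driver": 1e-5, "logic_net": 0.0}
    for name, share in rank_trouble_spots(nets):
        print(f"{name}: {share:.1%} of the total failure risk")
```

In practice the per-element probabilities would come from extracted currents and a calibrated lifetime model rather than from hand-entered numbers.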

Great care must be taken to ensure that the information fed into the calculation of Equation 5 is correct. If the failure statistics are incorrect, or the input parameters such as lifetime and current exponent are wrong, a disaster can unfold. However, optimizing for performance and reliability can be done successfully and, in fact, the successful design and manufacture of high performance microprocessors has been possible only by employing some form of reliability budgeting.


Summary

Electromigration has been with us since the early days of solid state devices, even before ICs took center stage. Like an old soldier, electromigration never dies, and unfortunately it does not have the good taste to fade away. Whenever we "conquer" electromigration, we enter new regimes where the demands of increased performance require that interconnect be more and more reliable under conditions where metallization is inherently less reliable. The promise of developing future metallization schemes that will erase the problem has so far eluded us, and there is no guarantee that the future holds a panacea. Copper may help a little, but not nearly as much as was hoped, and it still only buys a little time. Eventually the capabilities of Cu will be seriously challenged, and this is assuming we can solve the daunting processing problems that have confronted us over ten years of development.

Recent advances have given us hope that although electromigration will always exist and cause problems, we can control it such that advanced microcircuits can still be designed with the reliability we need. The use of reliability budgeting, if coupled with a detailed knowledge of manufacturing process capabilities, can allow advances without compromising long-term performance. This complicated task can only be accomplished with the right tools and talents.

Electromigration as a design issue will be with us until we develop a room temperature superconductor with a critical current density of millions of amps per square centimeter that is compatible with semiconductor processing. Such a development is far in the future, and we must exercise diligence in controlling the beast and respect its potential.

Where is it written that life is to be easy?


About the Author

J.R. "Jim" Lloyd specializes in electromigration and metallization reliability for chip and packaging applications, reliability testing and analysis, qualification plans, and electromigration failure modeling. His industrial experience includes reliability engineering and R&D positions at IBM and Digital Equipment Corporation. In addition, he was visiting scientist at Max-Planck-Institut in Stuttgart, Germany. He has published more than 60 papers on semiconductor materials science and reliability engineering, has been invited to speak to audiences throughout the world, and has taught courses and workshops at Stevens Institute of Technology, New York Polytechnic, MRS, ASM, IBM, Digital Equipment, IRPS and ESREF (Europe). He holds the Ph.D., M.S., and B.S. degrees in materials science and engineering from Stevens Institute of Technology. He can be reached through email at jrlloyd@vinfiz.net.






Jim Lloyd

12/18/2008 2:47 PM EST

I just wanted to see comments to the article I wrote several years ago that you are using. How do I do that? Assuming there are comments




Dave Martsolf

9/23/2010 10:19 AM EDT

Hi Jim,

I was interested in the electromigration possibilities of polysilicon, as we have noted what could be ESD events that seem to grow poly tubes around a poly stringer, turning a weak part (but one passing our tests) into a failed part later. The question of how strongly doped poly must be in order to become responsive to varying degrees of current (both normal and abnormal) interests me. Thanks for your article.



