As described earlier, once burn-in failures have been reduced to a sufficiently low level by progressive manufacturing process improvements, burn-in can be dropped completely. Standard IPC-9592, for example, allows this after one year or 30,000 unit-hours if the maximum failure rate is between zero and 400 ppm, depending on the product type. This can be considered only if the manufacturing process is entirely predictable and the quality of bought-in material is such that it has a very low rate of latent intrinsic defects. In other words, the bought-in components themselves don’t exhibit significant infant mortality and show only their intrinsically low latent defect rate.
Although commodity components approach this quality level and modern manufacturing quality control can minimize process variations, there is still a real risk that a customer may see some early life failures. The cost of this in terms of lost goodwill has to be weighed against the cost of burn-in. Remember that customers will still see the intrinsic failure rate of the product in its service life.
A small extra number of failures attributable to infant mortalities may not be significant. For example, one product from Murata Power Solutions that uses quality components is built using a stable, mature process without burn-in and has an observed field mean time to failure (MTTF) of more than 25 million hours. This figure is derived from 130 failures in the total sales of 4.37 million parts shipped regularly over six years. In this case, it is assumed that the parts are powered for 25 percent of any given period, that only 10 percent of failures are actually reported and that all shipped parts are still in the field. This represents a creditable defect rate of 30 parts per million over all units shipped to date and is a justification for a ‘no burn-in’ model. Note that some manufacturers count defect rate (dppm) as failures on delivery or within a short time of delivery. It makes sense to define such a time window, since the cumulative defect rate for any electronics approaches one million parts per million over a long enough period!
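The field-MTTF arithmetic above can be sketched roughly as follows. The duty cycle (25 percent powered), reporting rate (10 percent) and shipment figures are the article's stated assumptions; averaging the field time to half the shipping period is my own simplification, so the exact MTTF result depends on the assumed shipment profile.

```python
shipped = 4_370_000        # total parts shipped over six years
reported_failures = 130    # failures actually reported from the field
reporting_rate = 0.10      # assumption: only 10% of failures are reported
duty_cycle = 0.25          # assumption: parts powered 25% of the time
years_shipping = 6

# With steady shipments over six years, the average unit has been in the
# field for roughly half that period (a simplifying assumption).
avg_field_hours = (years_shipping / 2) * 8760

total_powered_hours = shipped * avg_field_hours * duty_cycle
estimated_failures = reported_failures / reporting_rate  # scale for reporting

mttf_hours = total_powered_hours / estimated_failures
print(f"Estimated field MTTF: {mttf_hours / 1e6:.1f} million hours")

# The quoted ~30 ppm defect rate counts reported failures over all shipments.
dppm = reported_failures / shipped * 1e6
print(f"Defect rate: {dppm:.0f} ppm of units shipped")
```

Under these assumptions the estimate lands in the low tens of millions of hours, consistent in order of magnitude with the quoted figure.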
Ongoing reliability tests are normally used only when large quantities of units are built on a continuing basis; they can give an estimate of the intrinsic reliability of a product during its service lifetime, that is, its MTBF. The accuracy of this figure depends on the failure-rate acceleration during the test having a known relationship to the real-life failure rate.
The ‘Arrhenius’ equation can give a value for the acceleration factor, assuming a constant failure rate after infant mortalities. This equation has its origins in chemistry, so in theory it requires knowledge of the effective “activation energies” of all failure modes. Historically, the rule of thumb has been that the failure rate doubles for each 10°C rise above the real-life operating temperature. As an example, 50 units running for six months at 70°C with no failures gives 219,000 operational hours. From statistical tables, this represents a failure rate (λ) of 4110 failures in 10⁹ hours of operation (4110 FITs) at a 60 percent confidence level, or 10,502 FITs at 90 percent confidence.
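A minimal sketch of the zero-failure confidence bound behind those figures: with zero observed failures in T unit-hours, the chi-squared upper bound on the failure rate reduces to λ ≤ −ln(1 − CL)/T. Slightly different figures can result depending on the statistical tables and rounding used.

```python
import math

units = 50
hours_each = 0.5 * 8760            # six months of continuous operation
total_hours = units * hours_each   # 219,000 operational hours

# Zero-failure upper bound: lambda <= -ln(1 - CL) / T, expressed in FITs
# (failures per 1e9 hours).
fits_60 = -math.log(1 - 0.60) / total_hours * 1e9
fits_90 = -math.log(1 - 0.90) / total_hours * 1e9
print(f"60% confidence: {fits_60:.0f} FITs, 90% confidence: {fits_90:.0f} FITs")
```

This evaluates to roughly 4180 and 10,510 FITs, in close agreement with the table-derived 4110 and 10,502 FITs quoted above.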
At a lower temperature of, say, 40°C, our rule of thumb gives an acceleration factor of eight relative to 70°C, so the figures reduce to 514 FITs and 1313 FITs. FIT is λ × 10⁹, and MTBF is 1/λ, so these figures represent an expected 1.95 million hours or 760,000 hours MTBF at 60 percent and 90 percent confidence levels respectively. It may seem odd that a test with no failures gives a finite failure rate; this is because it is assumed that the first failure is just about to happen. It should be emphasized that the real field failure rate is the most accurate measure of the reliability of a product.
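The rule-of-thumb acceleration and FIT-to-MTBF conversion above can be written out directly. Note the factor-of-two-per-10°C rule is only a rough heuristic standing in for a full Arrhenius calculation.

```python
def acceleration_factor(t_test_c: float, t_use_c: float) -> float:
    """Rule of thumb: failure rate doubles for every 10 degC of extra stress."""
    return 2 ** ((t_test_c - t_use_c) / 10)

af = acceleration_factor(70, 40)   # 2^3 = 8x acceleration

fits_60 = 4110 / af                # ~514 FITs at 60% confidence
fits_90 = 10502 / af               # ~1313 FITs at 90% confidence

# FIT is failures per 1e9 hours, and MTBF = 1/lambda.
mtbf_60 = 1e9 / fits_60            # ~1.95 million hours
mtbf_90 = 1e9 / fits_90            # ~760,000 hours
print(f"AF = {af:.0f}; MTBF = {mtbf_60 / 1e6:.2f}M hr (60%), {mtbf_90:.0f} hr (90%)")
```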
A calculated MTBF can be compared with the demonstrated figure obtained through life testing to check for consistency. However, the calculations can be misleading depending on the base failure rates used for components and the method of calculation. A survey by Murata Power Solutions found a variation of a factor of more than 100 between MTBF figures for the same circuit calculated by several different power supply manufacturers.
Different standards such as MIL-HDBK-217F and Telcordia SR-332 will also give different answers. In addition, the MIL standard gives two different calculation methods: the ‘parts count’ method, which gives a quick but conservative measure, and the ‘part stress’ method, which requires detailed knowledge of the electrical operating conditions and is the more realistic of the two. As an example of a ‘part stress’ calculation according to MIL-HDBK-217F, a general-purpose diode has a failure rate per million hours, λP, given by:

λP = λB × πT × πS × πC × πQ × πE

where λB is a base failure rate for different types of diodes and the π factors account for temperature, electrical stress, internal construction, manufacturing quality and environment of use respectively. For a Schottky power diode operating at a junction temperature of 80°C, with a voltage stress of 75 percent of its rating, metallurgically bonded construction, plastic commercial packaging and operated in a “ground benign” environment, the part failure rate calculates to be:
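The ‘part stress’ method is just the product of the base rate and the π factors. The sketch below shows the shape of the calculation only: every numeric value is a hypothetical placeholder, not taken from the handbook, whose real values come from MIL-HDBK-217F's tables and formulas for the specific part and conditions.

```python
# All values below are HYPOTHETICAL placeholders for illustration only;
# real values must be looked up in MIL-HDBK-217F for the actual part.
lambda_b = 0.003   # base failure rate, failures per 1e6 hr (hypothetical)
pi_t = 6.0         # temperature factor at Tj = 80 degC (hypothetical)
pi_s = 0.5         # electrical stress factor at 75% rating (hypothetical)
pi_c = 1.0         # metallurgically bonded construction (hypothetical)
pi_q = 8.0         # plastic commercial quality level (hypothetical)
pi_e = 1.0         # 'ground benign' environment (hypothetical)

# Part stress model: the predicted rate is the product of all factors.
lambda_p = lambda_b * pi_t * pi_s * pi_c * pi_q * pi_e
print(f"lambda_p = {lambda_p:.4f} failures per million hours")
```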