Calculating the failure rate
Three methods can be used to calculate failure rates: prediction (during design), assessment (during manufacturing), and observation (during service life).
Prediction uses a standard database of component failure rates and expected life, typically MIL-HDBK-217 for military and commercial applications or Telcordia for telecom applications.
The MIL approach requires many parameters for each component, including voltage and power stresses, while Telcordia requires fewer component parameters and can also take into account lab-test results, burn-in data, and field-test data. Finally, the MIL approach yields MTBF (mean time between failures) data, while Telcordia produces FIT numbers (failures per billion hours).
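Because the two approaches report results in different units, it can help to convert between them. The sketch below assumes a constant failure rate, so that MTBF is simply the reciprocal of the failure rate; the function names are illustrative and not taken from either standard.

```python
# Convert between MTBF (hours) and FIT (failures per billion device-hours).
# Assumption: constant failure rate, so MTBF = 1 / failure_rate.

HOURS_PER_BILLION = 1e9


def mtbf_hours_to_fit(mtbf_hours: float) -> float:
    """Return the FIT rate equivalent to an MTBF given in hours."""
    if mtbf_hours <= 0:
        raise ValueError("MTBF must be positive")
    return HOURS_PER_BILLION / mtbf_hours


def fit_to_mtbf_hours(fit: float) -> float:
    """Return the MTBF in hours equivalent to a FIT rate."""
    if fit <= 0:
        raise ValueError("FIT must be positive")
    return HOURS_PER_BILLION / fit


if __name__ == "__main__":
    # A supply with a 1,000,000-hour MTBF corresponds to 1000 FIT.
    print(mtbf_hours_to_fit(1_000_000))
    # A 500 FIT figure corresponds to a 2,000,000-hour MTBF.
    print(fit_to_mtbf_hours(500))
```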
Using these databases and techniques requires several assumptions that are often incorrect: that the design is perfect, that all stresses are known, that everything is operated within its ratings, that any single failure will cause complete failure, and that the database is current and valid.
Nevertheless, prediction is the least time-consuming method. Applied consistently across different designs, it indicates the relative reliability of topologies and design approaches rather than absolute reliability.
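The assumption that any single failure causes complete failure amounts to a series model, in which the system failure rate is the sum of the part failure rates. A minimal sketch of such a comparison follows; the part counts and FIT values are made-up placeholders for illustration, not figures from MIL-HDBK-217 or Telcordia.

```python
# Series-model comparison of two candidate designs.
# Assumption: any single part failure fails the supply, so part FIT rates add.
# All quantities and FIT values below are hypothetical placeholders.

def system_fit(parts):
    """Sum FIT over (quantity, fit_per_part) pairs."""
    return sum(quantity * fit_per_part for quantity, fit_per_part in parts)


# (quantity, FIT per part) for two hypothetical topologies
design_a = [(4, 20.0), (10, 2.0), (2, 50.0)]
design_b = [(2, 20.0), (12, 2.0), (1, 50.0)]

if __name__ == "__main__":
    fit_a = system_fit(design_a)
    fit_b = system_fit(design_b)
    # The ratio is more trustworthy than either absolute number.
    print(f"A: {fit_a} FIT, B: {fit_b} FIT, A/B ratio: {fit_a / fit_b:.2f}")
```

Only the ratio between the two totals is meaningful here, which mirrors the point that prediction is better at ranking designs than at forecasting absolute reliability.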
Conversely, assessment is the most accurate way of predicting failure rate, but requires greater time and resources. This method subjects a suitable number of final units to an accelerated life test at elevated temperatures, with carefully controlled and increased stress factors.
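The source does not say how elevated-temperature test hours are translated into service hours. One widely used model for this is the Arrhenius equation, sketched below as an assumption rather than as the method intended here. The 0.7 eV activation energy is a placeholder default; the real value depends on the failure mechanism.

```python
import math

# Arrhenius acceleration factor for an elevated-temperature life test.
# Assumptions: a single thermally activated failure mechanism, and an
# activation energy (0.7 eV by default) chosen only for illustration.

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def arrhenius_acceleration(t_use_c: float, t_stress_c: float,
                           activation_energy_ev: float = 0.7) -> float:
    """Return how many times faster failures accrue at t_stress_c than at t_use_c."""
    t_use_k = t_use_c + 273.15
    t_stress_k = t_stress_c + 273.15
    exponent = (activation_energy_ev / BOLTZMANN_EV_PER_K) * (1.0 / t_use_k - 1.0 / t_stress_k)
    return math.exp(exponent)


def equivalent_field_hours(units: int, test_hours: float, acceleration: float) -> float:
    """Return the device-hours at use conditions represented by the test."""
    return units * test_hours * acceleration


if __name__ == "__main__":
    af = arrhenius_acceleration(t_use_c=40.0, t_stress_c=85.0)
    print(f"acceleration factor: {af:.1f}")
    print(f"equivalent device-hours: {equivalent_field_hours(20, 1000.0, af):.0f}")
```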
One method, the HALT (highly accelerated life test) approach, tests a number of prototype units under as many conditions as possible, with cycling of temperature, input voltage, output load, and other impacting factors. HALT testing seeks to fatigue a component, PCB, subassembly, or finished product through either intense stressing for fewer cycles, or through low-level stressing for more cycles.
A second method, HASS (highly accelerated stress screen) testing, is an accelerated reliability screening technique used to reveal latent flaws not detected by environmental stress screening, burn-in, or other test methods. HASS testing uses stresses beyond initial specifications, but still within the capability of the design as determined by HALT.
The stresses in HASS are more rigorous than those delivered by traditional approaches, so HASS testing substantially accelerates early discovery of manufacturing-process issues. Reliability engineers can then correct the variations that would otherwise lead to field failures and greatly reduce shipment of marginal product.
Observation in the field is also possible, but it is impossible to control all of the conditions a supply has been subjected to, which makes reliable root-cause analysis more difficult.
Stresses that affect power supply reliability
Power supply life is affected by three kinds of stress: thermal, mechanical, and electrical. A quality design anticipates each of these and takes necessary steps to minimize both their occurrence and their impact.
Thermal stress takes two forms: static and dynamic. Static thermal stress, where supplies are operated at elevated temperatures, degrades components and their basic materials. Bulk capacitors may begin to dry out, or their seals may be stressed, and even resistor coatings may begin to deteriorate and break down. Interconnection and mating areas can expand and mismatch.
Dynamic thermal stress is associated with heating and cooling cycles; the resulting expansion and contraction lead to micro-cracks.
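To put a rough number on the static case, a common rule of thumb for aluminum electrolytic capacitors is that expected life doubles for every 10 degrees C the part runs below its rated temperature. The sketch below applies that rule; it is an approximation that manufacturers refine with ripple-current and voltage terms, and the rated-life figures in the example are placeholders.

```python
# Rule-of-thumb life estimate for an aluminum electrolytic capacitor.
# Assumption: life doubles for each 10 degrees C below rated temperature.
# Real datasheets add ripple-current and voltage terms not modeled here.

def capacitor_life_hours(rated_life_hours: float, rated_temp_c: float,
                         operating_temp_c: float) -> float:
    """Return the estimated life in hours at operating_temp_c."""
    return rated_life_hours * 2.0 ** ((rated_temp_c - operating_temp_c) / 10.0)


if __name__ == "__main__":
    # Hypothetical part: 2000 hours at 105 C, operated at 85 C and at 65 C.
    for temp_c in (105.0, 85.0, 65.0):
        print(temp_c, capacitor_life_hours(2000.0, 105.0, temp_c))
```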
Mechanical stress severity depends on how and where the supply will be installed and used. This stress can cause both intermittent and hard failures, as cracks develop and circuit connections start to open and, in some cases, reconnect.
Electrical stress is any voltage, current, or similar electrical quantity applied to a device. Over-stress occurs when a component is operated beyond its rated value, through either poor component selection or one-time events. For example, a capacitor rated for 100 VDC may see a 150 VDC spike in operation.
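The capacitor example can be expressed as a stress ratio, the applied value divided by the rated value. The sketch below also accepts an optional derating factor; the 0.8 used in the example is a hypothetical design guideline, not a figure from the text.

```python
# Stress-ratio check for a single component parameter.
# The derating factor is a hypothetical design guideline (1.0 = no derating).

def stress_ratio(applied: float, rated: float) -> float:
    """Return applied / rated; values above 1.0 mean over-stress."""
    if rated <= 0:
        raise ValueError("rated value must be positive")
    return applied / rated


def is_overstressed(applied: float, rated: float, derating: float = 1.0) -> bool:
    """Return True if the applied value exceeds the derated rating."""
    return applied > rated * derating


if __name__ == "__main__":
    # The capacitor from the text: rated 100 VDC, sees a 150 VDC spike.
    print(stress_ratio(150.0, 100.0))           # 1.5
    print(is_overstressed(150.0, 100.0))        # True
    # With an assumed 80% derating rule, 90 V on a 100 V part also fails.
    print(is_overstressed(90.0, 100.0, 0.8))    # True
```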
Improving power supply reliability through design
The paper design and topology should be robust and conservative, taking into account the effects of load and line transients as well as noise. The designer should also carefully determine the required minimum and maximum values of component parameters to ensure reliable operation (a "typical" value is nearly meaningless). The same applies to critical second- and third-tier parameters, including less-publicized factors in the magnetic components, such as the temperature coefficients of some parameters.
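As an illustration of why minimum and maximum values matter more than typical ones, the sketch below computes the worst-case output range of a resistive divider, such as might be used for feedback sensing. The circuit, the component values, and the 1% tolerance are assumptions chosen for the example; in practice the tolerance would also need to include temperature drift and aging.

```python
# Worst-case (min/max) analysis of a resistive voltage divider.
# The circuit and all values are hypothetical; tol is a fraction (0.01 = 1%).

def divider_output(v_in: float, r_top: float, r_bottom: float) -> float:
    """Return the divider output voltage across r_bottom."""
    return v_in * r_bottom / (r_top + r_bottom)


def worst_case_divider(v_in: float, r_top_nom: float, r_bottom_nom: float,
                       tol: float) -> tuple[float, float]:
    """Return (lowest, highest) output with both resistors at tolerance extremes."""
    # Output is lowest with r_top high and r_bottom low, and vice versa.
    lowest = divider_output(v_in, r_top_nom * (1 + tol), r_bottom_nom * (1 - tol))
    highest = divider_output(v_in, r_top_nom * (1 - tol), r_bottom_nom * (1 + tol))
    return lowest, highest


if __name__ == "__main__":
    low, high = worst_case_divider(12.0, 10_000.0, 10_000.0, 0.01)
    print(f"nominal 6.00 V, worst case {low:.2f} V to {high:.2f} V")
```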
We've discussed the need to manage operational temperatures, and a thermal analysis of the design and its physical implementation is therefore critical.
SPICE (simulation program with integrated circuit emphasis) or similar modeling of the design is essential, using realistic, not simplified, models of the components and PC boards and tracks, to verify both static and dynamic performance. Components must be chosen with a conservative bias, with extra margin in both the initial and long-term values of many of their specifications. Furthermore, the layout must accommodate the fact that most supplies handle significant currents, on the order of 10, 20, or more amps.
After design, the next critical step is selection of specific components. Since it's nearly impossible to distinguish a poorly made or counterfeit component from a sound one, vendor credibility is key. Furthermore, components must be compatible with the manufacturing process, with mounting tabs, sufficiently large connection points, and heavy wire leads or screw terminals where appropriate.
On the subject of design for manufacturability, even the basic soldering process used in supply construction is an area for consideration. While the common reflow-soldering temperature profiles are well established, the regulatory mandate for lead-free (Pb-free) components and solder means that a different reflow profile is needed. All components used must also be qualified to perform to specification after the higher reflow temperature and soak time that this profile involves.