Given the relative rarity of the battery problems with Samsung's smartphone, how do you determine the root cause with confidence?
We're all aware of the serious battery problems which occurred with the recently introduced Samsung Galaxy Note 7 smartphone, leading to its recall followed by complete product withdrawal from the market. While there has been some of the usual sniping at engineers by people who have no idea what product design and release actually involve, I'm pleased to see that it has been relatively muted. Perhaps the critics and comedians have been distracted by the US election, which certainly has provided more "theater" and possibilities for humor and smarmy sarcasm than a smartphone?
At this point, we don’t know the cause or causes of the battery fires (or explosions?). It could be a latent defect in the battery cells which constitute the power pack, or the ICs which monitor and measure the cells in their pack, or the embedded firmware which manages charge and discharge based on those readings and control algorithms, or it could be some combination of the three (there's a good early diagnosis from The Wall Street Journal here). Perhaps it’s a strange thermal occurrence due to insufficient dissipation in some situations. It's certainly premature to come to any conclusion. The deep-down root cause may actually be an unfortunate confluence of events, or there may be several different causes which all have the same outward appearance, a very common reality in troubleshooting.
The Samsung Galaxy Note 7 smartphone received favorable reviews, but has been pulled off the market due to documented cases of battery flaming and worse.
But the Samsung issue points to a major engineering challenge: how do you develop confidence in a design when the failure rate is so low? After all, the number of reported phone problems was actually quite low compared to the number of units sold. When you are at the tail end of that failure curve, how much longer do you test, and how?
Nor is this problem unique to a phone. When you have a large number of the same product in use, some failures will occur early in usage, but only in a few units. They could be due to a build-up of tolerances in one direction, some semi-random anomaly, or other causes. Road testing 100 cars for 100,000+ miles is not the same as testing 100,000 cars for 100 miles. Different failure modes and shortcomings will appear for each use case.
The question is: how do you test for the cause of something which is inherently a fairly rare occurrence? There are options such as Highly Accelerated Life Test (HALT), a stress-testing methodology for enhancing product reliability, and Highly Accelerated Stress Screen (HASS), which attempts to uncover weaknesses by applying multiple stress factors (see here and here for just two of many online references). But these tests cannot assure 100% confidence; no test ever can.
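To put rough numbers on why a rare failure is so hard to catch, here is a minimal back-of-the-envelope sketch. It assumes each unit fails independently with the same fixed probability, which real batteries almost certainly do not, and the 1-in-10,000 rate is purely a hypothetical figure for illustration, not Samsung's actual rate. The function names are mine, not from any standard or library.

```python
import math


def chance_of_seeing_failure(failure_rate: float, units: int) -> float:
    """Probability that at least one of `units` test samples fails,
    assuming independent units with identical failure probability."""
    return 1.0 - (1.0 - failure_rate) ** units


def units_needed(failure_rate: float, confidence: float) -> int:
    """Smallest sample size n such that 1 - (1 - p)^n >= confidence."""
    if not 0.0 < failure_rate < 1.0:
        raise ValueError("failure_rate must be between 0 and 1")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be between 0 and 1")
    # log1p(-p) is ln(1 - p), computed accurately for very small p
    return math.ceil(math.log(1.0 - confidence) / math.log1p(-failure_rate))


if __name__ == "__main__":
    p = 1e-4  # hypothetical: 1 failure per 10,000 units
    print(f"Chance a 100-unit test sees a failure: "
          f"{chance_of_seeing_failure(p, 100):.2%}")
    print(f"Units needed for 95% chance of seeing one: "
          f"{units_needed(p, 0.95)}")
```

Under these assumptions, a 100-unit test would show a failure only about 1% of the time, and you would need on the order of 30,000 units to have a 95% chance of seeing even one. Even then, seeing the failure is not the same as understanding its cause.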
The Samsung issue has an interesting twist in how the company tested its batteries prior to release. According to a carefully written article in The Wall Street Journal, "Samsung Self-Tested Batteries in Galaxy Note 7 Phone," there are 28 labs certified by the CTIA (the U.S. wireless industry’s trade group) to test batteries and ensure compliance with standards set by the Institute of Electrical and Electronics Engineers. However, Samsung uses its own internal labs for this testing, and whether that had anything to do with the problem is an interesting question to explore. It may be part of the problem, or perhaps not at all.
It's not just smaller batteries which have these issues, of course. Several years ago there were documented cases of Sony laptop PCs catching fire even when not plugged in, apparently due to internal shorts within the battery pack enabling high-current discharges from the high concentration of energy in the lithium batteries. Even bigger power cells have hard-to-discover fault issues, such as the batteries on the Boeing 787 Dreamliner, which grounded the aircraft for about three months in 2013. Part of the eventual solution involved strengthening the battery case, making it more heat resistant, and adding a vent path to the outside of the fuselage: not a pleasant fix.
Have you ever tested something "completely" only to have it fail, and in an unexpected way, early in its operational life? What was your reaction?