Software Bug Induces Extreme Hardware Test, Maybe
12/19/2016 04:32 PM EST
It's easy to make test and troubleshooting assumptions, and we may not have the time or tools to find out what the actual cause of the problem is.
I've had a mini-laptop PC for several years. Although this Hewlett-Packard Mini 110 (also called a "netbook") is a diminished line on the market's PC-genealogy tree, it served me well until about a year ago, when two problems came up a few months apart. The 2.6-pound/1.2-kg unit has since been replaced, but along the way, it taught me about solid subsystem design and life testing.
What actually happened? First, the battery pack died, so the unit had to be connected to the AC mains at all times. Because laptops are designed to run on their batteries, they have minimal bulk-storage capacitance in their power supply along with near-zero ability to ride through a power-line glitch. Thus, even a small AC transient (and they are surprisingly common), which would otherwise be unnoticed or a non-issue, resulted instead in shutdown—a frustrating situation. A replacement battery was about $100 online (and of questionable quality, as it was a no-name aftermarket unit); that was not a viable option.
Second, some sort of system bug in the Windows XP operating system popped up (at least, I believe that was the source), which caused the unit's hard-disk drive to keep looking for something that apparently didn't exist. I installed some clean-up and tracking diagnostic tools, but all I learned was that there was some sort of registry problem. I suspect, but cannot verify, that this registry problem was related to the ongoing disk access. In any case, there was nothing I could do about it.
The consequence of this presumed OS bug (feature?) was that the disk drive would keep looking and looking and looking and never take a break. I could clearly hear its non-stop disk accesses for hours on end, and the disk-access light on the case stayed lit almost continuously, blinking off only occasionally. As a result, the laptop slowed to a crawl, taking several seconds to change screens or react to commands, and minutes to open a file. It seemed that the drive was fully occupied by its quest and using all the system resources to get there.
But this nearly non-stop, high duty-cycle access activity did tell me one thing: that internal disk drive must have had a solid mechanical, electrical, and even thermal design. My unscientific estimate is that over the past year, the disk drive ran through several lifetimes of normal use, and yet it kept on chugging along. Unintentionally, this presumed software bug allowed me to hear and see a drive in full action and develop major respect for the drive vendor (whom I will identify after I open the netbook, just to see what's inside). It's as if I had been volunteered to do life tests for them.
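To show the kind of back-of-the-envelope arithmetic behind a "several lifetimes" estimate, here is a minimal sketch. Every number in it is an illustrative assumption rather than a measurement from my netbook or a figure from any drive datasheet, and the function name `equivalent_lifetimes` is hypothetical. It simply compares the busy hours a drive accumulates under non-stop access with the busy hours it might see over an assumed design life of light use.

```python
def equivalent_lifetimes(busy_hours_per_day, days_observed,
                         normal_busy_hours_per_day, design_life_years):
    """Ratio of accumulated busy hours to the busy hours expected
    over the whole design life under normal use."""
    accumulated = busy_hours_per_day * days_observed
    expected = normal_busy_hours_per_day * 365 * design_life_years
    return accumulated / expected


# Assumed values, for illustration only:
#   8 h/day of near-continuous seeking for one year, versus
#   0.5 h/day of actual disk activity over a 5-year design life.
ratio = equivalent_lifetimes(
    busy_hours_per_day=8,
    days_observed=365,
    normal_busy_hours_per_day=0.5,
    design_life_years=5,
)
print(f"{ratio:.1f} equivalent lifetimes")
```

With these assumed inputs the ratio comes out to a little over three, which is consistent with "several," but different and equally plausible guesses would move it a lot. That is exactly why I call the estimate unscientific.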
Of course, my speculation about the drive's ruggedness could also be wishful thinking. Perhaps the problem isn't really a software bug that's causing the drive to look for something that isn't there. Instead, it may be that the drive is misaligned or has some other fault that keeps it retrying a track it cannot read successfully.
If there's one thing we know about troubleshooting, it's that it's very easy to make assumptions and then logically follow them to incorrect conclusions, especially when faced with only circumstantial or scant evidence as to what is actually happening. So while I assume that the registry errors are the root cause of the drive's hunting, that could just be a coincidence.
I try to look at these situations as learning experiences, and I'll know more when I do an "autopsy" teardown just to see how the unit is constructed internally. I'm especially interested in any heat sinking, heat pipes, heat spreaders, or other dissipating techniques used because the unit does run hot and the fan works hard. At the same time, I'll look at the disk drive to see what makes it special — if I can figure that out.
Have you ever had a frustrating problem in one area which at least helped you better understand the design or operation of another part of your design? Have you ever seen products that are overdesigned in one area to compensate for design weaknesses in others?
