I am hoping somebody has some suggestions with regard to the question posed in the title of this blog. Let me illustrate the problem...
I once designed a controller to heat a tank of liquid. Depending on the size of the tank, the customer would configure the system to use from one to six natural gas burners. In other words, there were systems with a maximum of two burners, some with four, and so on.
There were also mixers in the tank, where the number of mixers corresponded to the number of burners. There was an analog input that determined the system "demand" and -- based on that demand -- a number of burners (between one and the maximum) were ignited and the mixers (tied to a single control) activated at a particular RPM. As the demand increased, so the number of burners increased. Starting at one burner, the mixer RPM increased until a maximum was reached, at which point the RPM would then drop and the number of burners would be incremented. As the demand decreased, so the reverse happened (with hysteresis, of course).
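To make the staging behavior concrete, here is a minimal sketch in Python. It is not the original controller code. The normalized demand range (0.0 to 1.0), the equal-width demand band per burner, the RPM limits, and the hysteresis width are all assumptions chosen for illustration, and `Stager` is a hypothetical name.

```python
from dataclasses import dataclass


@dataclass
class Stager:
    """Maps a demand value to (burner count, mixer RPM) with hysteresis."""
    max_burners: int           # configured per tank, one to six
    rpm_min: float = 200.0     # assumed minimum mixer speed
    rpm_max: float = 1000.0    # assumed maximum mixer speed
    hysteresis: float = 0.05   # assumed, as a fraction of one burner's band
    burners: int = 1           # burners currently requested

    def update(self, demand: float) -> tuple[int, float]:
        """Take demand in 0.0..1.0 and return (burner count, mixer RPM)."""
        band = 1.0 / self.max_burners  # share of demand covered by each burner
        # Stage up once demand exceeds the top of the current band.
        while self.burners < self.max_burners and demand > self.burners * band:
            self.burners += 1
        # Stage down only after demand falls below the band by the hysteresis margin.
        while (self.burners > 1
               and demand < (self.burners - 1) * band - self.hysteresis * band):
            self.burners -= 1
        # Within a band, mixer RPM ramps from minimum to maximum.
        low = (self.burners - 1) * band
        frac = min(max((demand - low) / band, 0.0), 1.0)
        return self.burners, self.rpm_min + frac * (self.rpm_max - self.rpm_min)
```

Under these assumptions, with four burners a demand rising to 0.30 brings in the second burner, and the count stays at two when demand falls back to 0.24; it returns to one only once demand drops below 0.2375. Even this toy version has state that depends on the history of the input, which is part of what makes exhaustive testing so hard.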
To add complexity to this sequence, igniting a burner was achieved with an external controller for safety certification reasons. Ignition could take up to 15 seconds; if it didn't succeed, a further two attempts would be made before the burner was deemed "inactive." If a burner was extinguished while operational, an attempt would be made to restart it. If a burner failed to ignite, the controller would note this, exclude it from further operations, and initiate the next burner in the sequence.
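The retry-and-exclude rule can be sketched as follows. This is an illustrative Python outline only: `try_ignite` is a hypothetical stand-in for the request to the external ignition controller, and the function and variable names are invented for the example.

```python
from typing import Callable, Optional

MAX_ATTEMPTS = 3            # one initial attempt plus two further attempts
IGNITION_TIMEOUT_S = 15.0   # ignition could take up to 15 seconds


def ignite_next(burners: list[int],
                inactive: set[int],
                lit: set[int],
                try_ignite: Callable[[int, float], bool]) -> Optional[int]:
    """Light the next available burner in sequence.

    Burners that fail every attempt are added to `inactive` and skipped
    from then on. Returns the burner that lit, or None if none could be lit.
    """
    for burner in burners:
        if burner in inactive or burner in lit:
            continue
        for _ in range(MAX_ATTEMPTS):
            # try_ignite (hypothetical) blocks for up to the timeout and
            # reports whether a flame was established.
            if try_ignite(burner, IGNITION_TIMEOUT_S):
                lit.add(burner)
                return burner
        inactive.add(burner)  # deemed "inactive" after three failed attempts
    return None
```

Passing the ignition call in as a parameter means a test can substitute a fake that fails on demand, so the failure paths can be exercised without waiting 15 seconds per attempt on real hardware.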
When attempting ignition, all the mixers had to be reduced to a minimum rotation. However, the mixers had to be run at maximum RPM for one minute before attempting to ignite the first burner and for one minute after all of the burners had been turned off. The system also controlled the level of liquid in the tank, and that level would be used to enable or disable the heating process.
The system had two different LCD character displays, one of which supported four languages. It was possible to display metrics on these displays and use them to modify dozens of parameters.
I cannot imagine how many different operational combinations there were, and I have described only a ~40% subset of the actual requirements.
As I developed the program, I debugged and proved it. It was a mammoth task. Since I knew the software so well, it was possible to extrapolate certain tests and to economize on some time; however, I had to test wherever there was any doubt at all. Once I'd completed everything, the customer went through acceptance tests. Realistically though, how do you try every conceivable setup and failure condition? Of course, the first time a system like this is used, everyone involved will be especially careful, but what happens further down the road?
Many months after I'd completed the original system, the customer came back with a modification request. A Modbus serial link was to be added to the system, not only to report the metrics remotely, but also to allow the parameters to be adjusted and to replace the analog demand with a demand value received over the serial link. This was a big change, but in principle even small changes should be studied seriously for their impact.
How can you validate the changes you make economically? How can the customer be sure that a bug has not been introduced through some forgotten linkage? I once performed a calculation showing that testing every combination with every time delay would take 10 years of continuous testing!
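The shape of that kind of estimate is easy to reproduce. The sketch below is in Python, and every number in it is invented for illustration; these are not the figures from the original calculation, and the factor names are hypothetical.

```python
from math import prod

# Hypothetical count of distinct values for each independent factor.
FACTORS = {
    "configured_max_burners": 6,      # one to six burners
    "burner_fault_patterns": 2 ** 6,  # each of six burners healthy or failed
    "demand_levels": 10,              # assumed sample points across the range
    "tank_level_states": 3,           # assumed: low, normal, high
    "display_languages": 4,
    "demand_source": 2,               # analog input or serial link
}
SECONDS_PER_TEST = 120  # assumed average, dominated by the one-minute mixer runs


def exhaustive_cost(factors: dict[str, int],
                    seconds_per_test: float) -> tuple[int, float]:
    """Return (number of combinations, years of continuous testing)."""
    combos = prod(factors.values())
    years = combos * seconds_per_test / (365 * 24 * 3600)
    return combos, years


combos, years = exhaustive_cost(FACTORS, SECONDS_PER_TEST)
print(f"{combos} combinations, {years:.2f} years of continuous testing")
```

Because the counts multiply, each additional factor (timing variations, parameter settings, the order in which faults occur) scales the total by its own number of values, which is how an estimate quickly grows from months into years.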
Of course, there are defensive techniques like modular programming and test-driven development to aid the development process and prevent errors, but how can you actually prove that your design is error-free? I am not even talking about safety certification -- just being able to ship a bug-free product.
I have come across organizations at events like Design West (now EE Live!) that claim they can design a suite of tests, but aside from the cost (and surely they aren't cheap) they require a set of accurate specifications. If I am lucky, I start out with a requirements document, which subsequently remains static while ongoing changes and clarifications are recorded by means of numerous emails and notes.
How do you address these issues? Any advice will be very gratefully received.