Good simulation of a system before a formal qualification procedure can help identify areas that need work. This effort is often short-changed in the rush to get something to show the customer. Companies that can keep a new product secret long enough to do a good simulation and build a prototype seem to have an advantage over those that cannot. Formal tracing of each requirement can help identify gaps in testing, and it speeds up testing after a change by helping to identify the affected areas.
It is often good to check with experts in a given industry on what other effects something like a simple software change can have, such as an impact on the EMC tests.
Yes, I recall working on an advanced mail sorting machine nearly 30 years ago with a painfully slowly assembled prototype circuit board. The system was being tested with random "junk" mail and about every six hours there would be a bang, a jet of fire would come out of one IC on the circuit board, and the system would stop. After a few hours of work, the circuit was reassembled and the process repeated. We finally realized that the 99.9995th percentile thick mailpiece (1 in 200,000) was flexing the track, bending the circuit board, and causing the power rail to short circuit against the chassis. Hence the fireworks. With the explanation in hand, it was easy to remedy.
Brian Kernighan wrote: "Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it."
I don't agree with this statement. If your cleverness has to do with using obscure side-effects to save a few lines of code, then Dr. Kernighan is right. However, if you use your cleverness to create a simpler way to view the problem, or to structure your code so as to be much clearer, then the time you spend being clever will prevent errors that you would otherwise need to debug later. In this case debugging will be much easier than writing the code. It's much harder to debug something that was thrown together quickly than something that was carefully designed with debugging considered from the beginning.
My favorite quote on this subject is from C.A.R. Hoare:
There are two ways of constructing a software design: One way is to make it so simple that there are obviously no deficiencies, and the other way is to make it so complicated that there are no obvious deficiencies. The first method is far more difficult.
Someone with a holistic view of the project needs to try to break it. Often it takes me less than 5 minutes.
I have the inverse Midas touch as well.
The essential approach is to incorporate known failure modes from past software projects as well as the boundary conditions of the process.
I agree with you that past experience is a great predictor of where problems are likely to occur. However, the overall problem is making a change once the system has already been tested. It is rather like tickling a dog somewhere on his body and his leg starts jiggling. There are so many possible combinations that there is always some unforeseen connection. And I hate handing over the revised software only to get a phone call 3 months later (when I have forgotten everything that I did) saying that it doesn't work when condition A exists subject to limitation B and when it's raining outside.
The Rogues Gallery of software bugs that I've accumulated demonstrates that the current state of software testing is very poor. I believe that while the various "bottom-up" testing protocols may help, it is also necessary to have a top-down approach. Someone with a holistic view of the project needs to try to break it. Often it takes me less than 5 minutes. The essential approach is to incorporate known failure modes from past software projects as well as the boundary conditions of the process. What happens when the load is exceeded, too small, positive, negative? When the expected numerical input is alphabetical? When power is lost, when power is restored, when power blinks? What happens when the container unexpectedly heats up from an external source? What happens when a thermostat fails? While these tests are not a "comprehensive" test protocol, they catch a lot of problems that seem to get missed by the formal test designs that operate with blinders on.
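To make the "try to break it" idea concrete, here is a small Python sketch of a list of hostile inputs thrown at an input handler. The `parse_load` function and the 500 kg limit are assumptions invented for the example, not anything from a real project; the point is only that the list of nasty cases (too big, negative, alphabetical, empty, not finite) is cheap to write down and reuse.

```python
import math

MAX_LOAD_KG = 500.0  # assumed limit, for illustration only


def parse_load(raw):
    """Hypothetical input handler: return a load in kg or raise ValueError."""
    try:
        value = float(raw)
    except (TypeError, ValueError):
        raise ValueError(f"not a number: {raw!r}") from None
    if not math.isfinite(value):
        raise ValueError(f"not finite: {raw!r}")
    if value < 0 or value > MAX_LOAD_KG:
        raise ValueError(f"out of range: {value}")
    return value


# Inputs that must be rejected: over the limit, negative, alphabetical,
# empty, missing, not-a-number, infinite, and numeric overflow.
HOSTILE_INPUTS = ["500.1", "-1", "abc", "", None, "nan", "inf", "1e999"]

# Inputs right at the edges that must still be accepted.
BOUNDARY_OK = ["0", "500", "0.0001"]


def run_break_it_checks():
    """Return the inputs that were handled wrongly (empty list means all good)."""
    failures = []
    for raw in HOSTILE_INPUTS:
        try:
            parse_load(raw)
        except ValueError:
            continue  # correctly rejected
        failures.append(raw)  # accepted something it should not have
    for raw in BOUNDARY_OK:
        try:
            parse_load(raw)
        except ValueError:
            failures.append(raw)  # rejected a valid edge value
    return failures


if __name__ == "__main__":
    print("Wrongly handled inputs:", run_break_it_checks())
```

This covers only the data-entry cases. Power loss, power blinks, and failed thermostats need a hardware fixture or a simulation rather than a list of strings.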
There's a reason I like hard-wired (NO SOFTWARE!) E-Stop or EMO systems.
The safety aspect of this project had external controls, so from a gas safety perspective it was subject to official regulation and actually ran on hardware separate from ours. That doesn't mean that we weren't called to account when the water tank overflowed and flooded a facility. That was a result of a water spill onto the PCB and so was not considered poor design (on our part).
Then the test fixture and its code simulate the real world to drive the system under test. The problem with the code running in the test fixture is that there are too many permutations to code by hand, and the rules for the 'real world' must be entered.
Because we were still in development, we did not have real gas burners, so we created a test fixture that allowed us to cause faults and simulate real action on each individual burner controller. But as you point out, you don't know how the real world is going to interact. How will the failure of one impact the others? You only really have the answer once the machine is built; all the rest is insight and guesswork (and hopefully careful coding).
On another project, I used a digital recording device that was connected to the system's sensors (plus a few I added) to watch and record a week's worth of normal system operation.
I really like this idea. I must remember it for my next project. Because of the elusive nature of some problems, I am not sure it shouldn't run 100% of the time, rather like a security camera.
I have used a painstaking approach before: a microcontroller board to stimulate (or replace) the sensors in the system and to drive actuators or motors (etc.). The test fixture and its code then simulate the real world to drive the system under test. The problem with the code running in the test fixture is that there are too many permutations to code by hand, and the rules for the 'real world' must be entered. A rule-based expert system can at least model the tanks so that it doesn't present out-of-bounds or invalid situations to the system under test, while still letting you see what happens when a sensor or ignitor (etc.) fails. In my situation, after generating the expected behavior for the test fixture, I used a Monte Carlo generator to give test-vector coverage. On another project, I used a digital recording device that was connected to the system's sensors (plus a few I added) to watch and record a week's worth of normal system operation. This was used to 'learn' what stimulus could be presented from the test fixture to the system under test during the validation phase. Also during validation, the 'learned' data was tweaked to present other 'what-ifs' to the system under test.
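For anyone who wants to try the rule-plus-random-stimulus idea described above, here is a very small Python sketch. It is not the fixture from that project: the tank limits, flow rates, and fault probability are assumed numbers, and a single clamping rule stands in for a real rule-based model. It shows a random generator producing test vectors that stay physically valid while occasionally simulating a failed sensor.

```python
import random

TANK_MIN, TANK_MAX = 0.0, 100.0  # assumed level limits, in percent


def next_level(level, fill_rate, drain_rate):
    """Rule: level changes by the net flow but never leaves the physical range."""
    return min(TANK_MAX, max(TANK_MIN, level + fill_rate - drain_rate))


def generate_vectors(steps, seed, fault_prob=0.05):
    """Yield (true_level, sensor_reading) pairs.

    sensor_reading is None when a sensor failure is simulated, so the
    system under test can be checked against missing data.
    """
    rng = random.Random(seed)  # seeded so a failing run can be replayed
    level = 50.0
    for _ in range(steps):
        level = next_level(level, rng.uniform(0, 5), rng.uniform(0, 5))
        failed = rng.random() < fault_prob
        yield level, (None if failed else level)


if __name__ == "__main__":
    for true_level, reading in generate_vectors(steps=10, seed=42):
        print(f"level={true_level:6.2f}  sensor={reading}")
```

Seeding the generator matters more than it looks: when a random vector exposes a bug, the same seed reproduces the exact sequence. The recorded week of real data could replace the uniform random flows as the source of stimulus.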