Our author asks: "but how can you actually prove that your design is error-free?"
The answer is it is difficult but possible. It requires the use of technologies people are generally not familiarized with, such as proof assistants -- prominent examples are Coq, Isabelle/HOL, and ACL2.
There exist, at this point, actual formally verified software artifacts of quite substantial complexity -- the seL4 microkernel, the CompCert C compiler, the Quark browser, and a number of others.
Sadly, very few people outside of academia are familiar with the existence of formal verification tools for software, and the documentation for such systems is (to say the least) non-transparent. The best tool at the moment, Coq, practically requires that its users learn quite a bit about Martin-Löf type theory in order to work effectively in it.
That said, such tools are clearly the way to deal with such things going forward.
It requires the use of technologies people are generally not familiarized with, such as proof assistants -- prominent examples are Coq, Isabelle/HOL, and ACL2.
FWIW I did a google search and came up with this book which looks like one place to start. As you say all the references look highly theoretical. Do you know of any practical courses on the subject (perhaps even at EE Live!).
I'd recommend Coq at this point, rather than Isabelle.
The path to understanding how to use the tool for purposes like this is not, unfortunately, easy. Good examples of projects that have been formally verified using Coq would include CompCert (publications: http://compcert.inria.fr/publi.html ) and Quark ( http://goto.ucsd.edu/quark/ ) -- looking at them may give you some hint of what is possible and how difficult it might be.
Benjamin Pierce's "Software Foundations" ( http://www.cis.upenn.edu/~bcpierce/sf/ ) is probably a good place to get some background while simultaneously learning a bit of Coq. The book Certified Programming with Dependent Types ( http://adam.chlipala.net/cpdt/ ) intends to teach one specifically how to use Coq to write verified programs, but I caution it is not easy going. There is also a book on Coq itself called CoqArt ( http://www.labri.fr/perso/casteran/CoqArt/index.html ) that is not bad.
If this all makes it sound like this is Not An Easy Thing To Learn, that's because it isn't. The state of the art has advanced dramatically in 20 years, from the point where contemplating proving something as big as a compiler correct was impossible to the point where it is feasible for very dedicated PhD students to work on such a thing. However, productizing the work of the innovators in this field is something that has not yet happened. Documentation is sketchy, tools are difficult to use, and the level of complexity remains very high.
That said, it is in fact now feasible for smart people to do this stuff, and the result of formally verified programs is true assurance of correctness.
"Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it." --Brian Kernighan
Now, add a complex physical system interacting with the code under test and things become even worse... My first reaction is that you can "asymptotically" check the code of a system such as the one you describe, but the coverage will never be perfect -- no matter if you do it by hand or using an automated tool.
Brian Kernighan wrote: "Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it."
I don't agree with this statement. If your cleverness has to do with using obscure side-effects to save a few lines of code, then Dr. Kernighan is right. However, if you use your cleverness to create a simpler way to view the problem, or to structure your code so as to be much clearer, then the time you spend being clever will prevent errors that you would otherwise need to debug later. In this case debugging will be much easier than writing the code. It's much harder to debug something that was thrown together quickly than something that was carefully designed with debugging considered from the beginning.
My favorite quote on this subject is from C.A.R. Hoare:
There are two ways of constructing a software design: One way is to make it so simple that there are obviously no deficiencies, and the other way is to make it so complicated that there are no obvious deficiencies. The first method is far more difficult.
Languages like Haskell are very cool things, but they're not per se ways of producing formally verified code. To do that, you need to produce a statement of what your program is supposed to do in formal logic and then demonstrate that your program actually does it with something like a proof assistant -- not something Haskell or Spark ADA are designed for. That said, Haskell was used to produce the executable model against which the C code used in the seL4 project was validated.
"But it still doesn't prove the widget will do what you actually wanted it to do - only that the two models do the same thing under a (hopefully) more exhaustive set of conditions" -- well, sort of.
It is true that you have to be able to explain in logic what you want the system to do, and the need for such specifications has often been held out as a reason that formal methods can never work. However, I think that's just sour grapes from the days when formal methods were *really* impractical twenty years ago. Not that long ago we *couldn't* actually prove significant systems correct, and so we convinced ourselves that even if we could we would never be able to figure out what properties we wanted proven correct anyway -- just like the fox and the grapes. In practice, I think people often overestimate how troublesome it is to produce a logical statement of the desired behavior vs. implementation.
Consider as a trivial example a sorting routine. The output is supposed to be a permutation of the input in which no element is larger than the next (or smaller than the next depending on sort direction). This is very compact to express in logic -- but there are dozens and dozens of algorithms that will do it, and expressing most of them is substantially more complicated than the correctness condition.
That said, doing things like producing the formal semantics for C to permit the statement of the correctness condition for CompCert was certainly a huge effort, and I don't want to claim that it is not a significant effort in many cases to produce a good statement in formal logic. What I'm trying to claim is that this problem is not, in fact, insurmountable.
Indeed, producing statements about correctness constraints in a safety critical system is often quite straightforward. In particular, the control problem described by the original article has safety conditions that are likely quite compactly described.
Even if you can verify your program is a bug-free implementation of your specifications, how do you verify your specifications?
And how do you account for electrical or mechanical "bugs"? Such as sensors not working correctly, cosmic rays (SEUs, etc), mechanical elements failing, and such.
And then what do you do when the needs (specifications) change? Often times, it seems like people assume the specs will never change; one of the things I like best about agile development is that they assume the specs will change. Automated test suites can help (regression testing), but with real world systems that's harder.
There's a reason I like hard-wired (NO SOFTWARE!) E-Stop or EMO systems.
There's a reason I like hard-wired (NO SOFTWARE!) E-Stop or EMO systems.
The safety aspect of this project had external controls, so from a gas safety perspective it was subject to official regulation and actually had separate hardware to ours. That doesn't mean that we weren't call to account when the water tank overflowed and flooded a facility. That was a result of a water spill onto the PCB and so was not considered poor design (on our part).
I have used a painstaking approach before; used a microcontroller board to stimulate (or replace) the sensors in the system, and drive actuators or motors (etc). Then the test fixture and its code, simulate the real world to drive the system-under-test. The code running in the test fixture has the problem that there are too many permutations to manually code for, and the rules for 'real-world' must be entered. An expert system that is rule-based, can at least model the tanks so that it doesn't present out-of-bounds or invalid situations to the system under test. But it does let you see what happens when a sensor or ignitor (etc) fails. In my situation, after generating expected-behavior for the test-fixture, I used a monte-carlo generator to give test-vector coverage. On another project, I used a digital recording device that was connected to the system's sensors (plus a few I added) to watch and record a weeks worth of normal system operation. This was used to 'learn' what stimulus could be presented from the test fixture to the system under test, during the validation phase. Also, during validation, the 'learned' data was tweaked to present other 'what-ifs' to the system under test.
Then the test fixture and its code, simulate the real world to drive the system-under-test. The code running in the test fixture has the problem that there are too many permutations to manually code for, and the rules for 'real-world' must be entered.
Because we were developing we did not have real gas burners and we did create a test fixture that allowed us to cause faults and simulate real action on each individual burner controller. But as you point out- you don't know how the real world is going to interact. How will the failure of one impact on the others? You only really have the answr once the machine is built- all the rest is insight and guesswork (and hopefully carfeul coding).
On another project, I used a digital recording device that was connected to the system's sensors (plus a few I added) to watch and record a weeks worth of normal system operation
I really like this idea. I must remember it for my next project. Because of the elusive nature of some problems, I am not sure it shouldn't run 100% of the time, rather like a security camera.
The Rogues Gallery of software bugs that I've accumulated demonstrates that the current state of software testing is very poor. I believe that while the various "bottom up" testing protocols may help, it is also necessary to have a top down approach. Someone with a holistic view of the project needs to try to break it. Often it takes me less than 5 minutes. The essential approach is to incorporate known failure modes from past software projects as well as the boundary conditions of the process. What happens when the load is exceeded, too small, positive, negative, when the expected numerical input is alphabetical, when the expected numerical input is alphabetical, when power is lost, when power is restored, when power blinks. What happens when the container unexpectedly heats up from an external source? What happens when a thermostat fails? While these tests are not a "comprehensive" test protocol, they catch a lot of problems that seem to get missed by the formal test designs that operate with blinders on.
Someone with a holistic view of the project needs to try to break it. Often it takes me less than 5 minutes
I have the inverse Midas touch as well.
The essential approach is to incorporate known failure modes from past software projects as well as the boundary conditions of the process.
I agree with you that past experience is a great predictor of where problems are likely to occur. However the overall problem is making a change once the system has already been tested. It is rather like tickling a dog somewhere on his body and his leg starts jiggling. There are just so many possible combinations that there is an unforseen connection. And I hate handing over the revised software only to get a phonecall 3 months later (when I have forgotten everything that I did) that it doesn't work when condition A exists subject to limitation B and when it's raining outside.
Yes, I recall working on an advanced mail sorting machine nearly 30 years ago with a painfully slowly assembled prototype circuit board. The system was being tested with random "junk" mail and about every six hours there would be a bang, a jet of fire would come out of one IC on the circuit board, and the system would stop. After a few hours of work, the circuit was reassembled and the process repeated. We finally realized that the 99.9995th percentile thick mailpiece (1 in 200,000) was flexing the track, bending the circuit board, and causing the power rail to short circuit against the chassis. Hence the fireworks. With the explanation in hand, it was easy to remedy.
Good simulation of a system prior to a good formal qualification proceedure can help identify area's that need work. Often this effort is short-changed by being in a rush to get something to show the customer. Companies that can keep a new product a secret, long enough to do a good simulation, and prototype seem to have an advantage over those that do not. Formal tracing of each requirement, can aid in identifiying gaps in testing, as well as speed testing after a change is implemented by allowing one to help identify affected areas.
It is often good to check with experts in a given industry on what other effects doing something like a simple software change can have -- such as impacting the EMC tests