BOSTON - System failures caused by poorly written software code are the result of "benign negligence," according to Jack Ganssle, keynoter at the Embedded Systems Conference here on Tuesday (Sept. 16).
Ganssle blamed the failures on three phenomena: engineers build bad code; software is inherently problematic; and there is a fundamental disconnect between an engineer writing code and management's pressure to ship before it is ready.
Ganssle recited a litany of failures stemming from unlearned lessons, which have caused everything from minor inconveniences to injury and death. He urged engineers to "program defensively."
Many failures cited by Ganssle were caused by faulty embedded software that brought down aircraft. In two space missions, Clementine and NEAR, errors made in the first were repeated in the second. "In both cases, a glitch in a sequence of events caused all the fuel of the thrusters to be dumped, all because of insufficient testing and the need to meet unreasonable schedules," said Ganssle.
In fact, NASA has mapped mission schedules against their complexity. The result: more complex systems tend to fail under tighter schedules. "The recent launch of one of the most complex Mars exploration vehicles went without a glitch [but] I'm holding my breath for its success," said Ganssle.
Ganssle cited a report on growing pacemaker recalls from 1990 to 2000. "The scary part is that for the last half of the decade, the results became worse. And some of these pacers have killed people."
Ganssle is a veteran embedded system designer with about 150 projects under his belt, including a White House security system and other classified government projects. "No system is perfect, but we as an industry need to concentrate on learning from our failures, sharing those failures with others and applying better solutions to the next project."
A level of acceptable risk must also be considered, Ganssle said. The August blackout was not a failure, in Ganssle's eyes. "When you have a mean time between failures (MTBF) of 25 years, that's pretty good. Stressed systems will fail, and you need to design for the expected MTBF. But the failures are not appreciated by the general public."
The Embedded Systems Conference runs through Thursday (Sept. 18). Conference organizers said attendance was up 20 percent over last year.