San Jose, Calif. The Therac 25 was supposed to save lives by zapping tumors with targeted blasts of radiation. Instead, the device delivered massive overdoses that killed three patients and injured several others because of software glitches introduced by a lone programmer whose code was never properly inspected and tested.
The Therac 25 was just one of dozens of examples cited by speakers at last week's Embedded Systems Conference here to drive home a point: People's lives as well as millions of dollars in investments often depend on software engineering, but too many projects fail for lack of good programming discipline and management support.
And the problems may get worse as programmers face the additional challenges of handling multicore devices. Indeed, an annual survey of several thousand embedded engineers, conducted recently by EE Times and Embedded Systems Design magazine, showed that the need for better software debug tools is a major concern, with test and debug taking up more time than any other step in project development.
"This is the only industry left where we can ship products with known defects and not get sued. How long do you think that will last?" asked Jack Ganssle, a consultant and author who presented a class on lessons learned from embedded-software disasters.
"We aren't afraid of software, but we need to be, because one wrong bit out of 100 million can cause people to die," said Ganssle, who said he has worked on more than 100 embedded projects, including the White House security system.
"As embedded systems grow in complexity, the software becomes an ever more important piece. Right now, 50 percent of our DSP spending is on the software side," said Gerald McGuire, general manager of the DSP group at Analog Devices Inc. (Norwood, Mass.), which employs more than 200 software engineers.
As software grows in importance, it is not necessarily becoming more reliable. According to one report, 80 percent of software projects fail because they are over budget, late, missing key features or a combination of factors. Another report suggests that large software systems of more than a million lines of code may have as many as 20,000 errors, 1,800 of them still unresolved after a year.
"We can't get rid of faults," said Lorenzo Fasanelli, a senior embedded-software specialist for Ericsson Labs in Italy. But engineers can speak up about faults, learn from them and rewrite code to proactively find and minimize them, he added.
"We cannot advance the state of the art without studying failure," said Kim Fowler, an author and systems architect who delivered an ESC talk called "Fantastic Failures."
There are plenty of failures from which to learn. Ganssle cited another radiation system that killed 28 people in a series of tests in Panama in May 2001 before the U.S. Food and Drug Administration shut down the company that made it. Inspections of software after the crash of a U.S. Army Chinook helicopter revealed 500 errors, including 50 critical ones, in just the first 17 percent of code tested.