Design Article
Mars ate my spacecraft!
Jack Ganssle
4/25/2011 4:32 PM EDT
To gear heads like me the history of engineering is rich in stories and lore, of failings and successes, and of triumphs and defeats of individual engineers. I remember reading Michener's The Source in high school and being entranced by his description of how the engineers of Megiddo, near Jerusalem, dug a tunnel 210 feet long some 2,900 years ago. The city was under siege and its well was located outside the city walls. With uncanny skill, they bored under the walls, secretly, navigating with only the crudest of instruments, yet somehow targeting the narrow fount perfectly.
Even the humblest of artifacts of technology have fascinating stories. Friends still make fun of my reading Henry Petroski's 400 page book titled The Pencil. Yet even this simplest of all writing devices sports a complex and fascinating history, one of engineers and artisans optimizing materials and designs to give users an efficient writing instrument.
Then there's James Chiles' Inviting Disaster, a page-turner of engineering failures, from bridge collapses, airline crashes, offshore oil platform sinkings, to, horrifyingly, near nuke exchanges. Strangely Chiles doesn't describe the famous loss of the Tacoma Narrows Bridge, which succumbed to wind-induced torsional flutter. The bridge earned the nickname "Galloping Gertie" from its rolling, undulating behavior. Motorists crossing the 2,800-foot center span sometimes felt as though they were traveling on a giant roller coaster, watching the cars ahead disappear completely for a few moments as if they had been dropped into the trough of a large wave.

See Galloping Gertie video.
Failures can be successes. When an aircraft goes down the NTSB sends investigators to determine the cause of the accident. Changes are made to the plane's design, maintenance, or training procedures. This healthy feedback loop constantly improves the safety of air travel, to the point where now it's less dangerous to fly than walk. That's plenty strange when you consider the complexity of such a machine. 400,000 pounds of aluminum traveling at 600 knots 40,000 feet up, in air that's 60 below zero, with turbines rotating at 10,000 RPM. It's astonishing the thing works at all.
Yet the concept of applying feedback, lessons learned, is relatively new. Those behind the Tacoma Narrows Bridge certainly ignored all of the lessons of bridge-building.
Clark Eldridge, the State Highway Department's lead engineer for the project, developed the bridge's original design. But federal authorities footed 45% of the bill and required Washington State to hire an outside, and more prominent, consultant. Leon Moisseiff promised that his design would cut the bridge's estimated cost in half.
Similar structures built around the same time were expensive. At $59 million and $35 million respectively, the George Washington and Golden Gate bridges had spans similar to that of the Tacoma Narrows. Moisseiff's new design cost a bit over $6 million, clearly a huge savings.
Except it fell down four months after opening day.
Moisseiff and others claimed that the wind-induced torsional flutter which led to the collapse was a new phenomenon, one never seen in civil engineering before. They seem to have forgotten the Dryburgh Abbey Bridge in Scotland which collapsed in 1818 for the same reason. Or the 1850 failure of the Basse-Chaine Bridge, a similar loss in 1854 of the Wheeling Suspension Bridge, and many others. All due to torsional flutter.
Then there was the 1939 Bronx-Whitestone Bridge, a sister design to Tacoma Narrows, which suffered the same problem but was stiffened by plate girders before a collapse.
And who designed the Bronx-Whitestone? Leon Moisseiff.
Lessons had been learned, but criminally forgotten. Today the legacy of the Tacoma Narrows failure lives on in regulations which require all federally-funded bridges to pass wind tunnel tests designed to detect torsional flutter.
In the firmware world we, too, have our share of disasters. Most were underreported, and few developers understand the proximate causes or the lessons that need to be learned. The history of embedded failures shows patterns we should (must!) identify and eliminate.
Mars attacks!
Consider the Mars Polar Lander, a 1999 triple failure. The MPL's goal was to deliver a lander to Mars for half the cost of the spectacularly successful Pathfinder mission launched two years earlier. At $265 million, Pathfinder itself was much cheaper than earlier planetary spacecraft.
Shortly before it began its descent, the spacecraft released twin Deep Space 2 probes which were supposed to impact the planet's surface at some 400 MPH and return sub-strata data.
MPL crashed catastrophically. Neither DS2 probe transmitted even a squeak.




cdhmanning
4/27/2011 10:14 PM EDT
The "test like you fly, fly like you test" mantra is problematic for many reasons. It is often too costly or impossible to achieve. A Mars lander will only ever get a full workout when it gets to Mars, and much of the code will never get tested (e.g., the code that helps correct for a puff of wind that didn't come). Nobody will let you crash a car to test the airbag controller every time you tweak a few lines of code.
At best we can test using simulators, then hope like hell the simulators actually match real-world or real-Mars conditions. It is pretty easy to get something wrong (such as a negative sign meaning up instead of down).
Ideally we would have three or more independent groups develop the simulators to ensure that these problems get screened out. In the real world though we are constrained by cost and time to market.
Testing and verification already eat up the lion's share of many development efforts. Building large numbers of simulators and running a full simulated mission could easily double the cost of developing a Mars rover, and we know that software testing already uses up most of the test budget in the design of a new car.
So what if the odd Mars rover crashes? That's why we send robots on these dangerous missions. We can move swiftly and the cost of failure is reduced. No crying widows.
I'd even argue that a patient is better off with a pacemaker with a few quirks than no pacemaker at all.
J-TX
5/5/2011 10:21 AM EDT
All very well and good until it's your pacemaker, or your cancer patient child getting fried by radiation.
cdhmanning
5/8/2011 11:01 PM EDT
J-TX. Perhaps you can explain how having no pacemaker is better than having one with a few quirks.
You miss my point.
If there were a 10% chance of me dying because of broken pacemaker software or a 99% chance of me dying with no pacemaker, I'd take the pacemaker. It gives me better odds.
Even Therac - attributed with 6 overdoses in 2 years - was an overall success. It might have overdosed and killed 6 people (who would have died without it), but it saved thousands of lives during that time.
If the doctors said to me that my cancerous child had a 0.1% chance of being fried by Therac and a 90% chance of being cured by it, that's still better than not having the treatment at all.
My point was that things don't have to be perfect to have value - just be better than what they are replacing.
DrQuine
4/28/2011 9:01 AM EDT
This paper demonstrates the importance of a multi-disciplinary approach to quality testing. While expected failure modes are easy to test for, they have often also been incorporated into the design specifications. It is the "unexpected" failures that can be really catastrophic. This also illustrates the value of rechecking parameters using different approaches - it is much easier to read the same measurement twice than it is to measure two different ways and see if they are equivalent. Every engineer should be well versed in the history of engineering failures - so they won't have to repeat them.
J-TX
5/5/2011 10:23 AM EDT
Are you saying that the biggest obstacle is EGO?
jorgemendez
5/1/2011 12:09 AM EDT
Great article, great lesson
Robotics Developer
5/2/2011 1:43 PM EDT
"Man who doesn't know history (or remember) is doomed to repeat it." A paraphrase of a quote I once heard a long time ago. History does repeat itself especially when people forget it, did not study it, or worse yet ignore it.
willc2010
5/5/2011 4:55 AM EDT
It's always useful to be reminded of these issues, because they seem to keep happening. It's been clear for years and years that problems can be dramatically reduced by logical soundness in the design and coding, reasoning about the code, and reasonably realistic testing of the results. But the fashion for weak fundamentals, highly empirical code-debug cycles and 'tips and tricks' seems to be remarkably tenacious.
Jan.Lindh
5/9/2011 5:57 AM EDT
How often do failures in the field come back to the engineer?
(Well, for the Mars missions I understand there probably was feedback, but what about consumer products?)
kaquino215
6/15/2011 1:14 PM EDT
Some companies I've worked for spend a lot of effort on ESD mitigation, an excessive amount in my opinion. I guessed about one million dollars a year was spent on ESD at one company. I argue that the quality of the product would be improved if, instead of spending the money on ESD mitigation, they hired six or seven full-time software QA people and turned them loose on software testing.