
Toyota Case: Single Bit Flip That Killed

10/25/2013 03:35 PM EDT
104 comments
Single bit flip
DrFPGA   10/25/2013 4:13:48 PM
I hope the analysis did some comparisons to accepted standards for safety. Which standards were followed?

Re: Single bit flip
Frank Eory   10/25/2013 4:36:43 PM
Although the quote about the danger of a "single bit flip" seems to have been in the context of software bugs -- it's hard to tell just from the quotes in this interview -- Barr also mentions single event upset. Memory bit errors (so-called "soft error rate") are more of a hardware & system design issue, at least to the extent that the design includes mirroring, error detection and/or correction or other fail-safe measures.

At modern VLSI geometries, the soft error rate of an SRAM bit cell being bombarded with cosmic radiation at ground level is not as inconsequential as one might think -- especially for critical safety systems.

It makes one wonder how blame can be attributed to software in a system in which the source of the error may have been a random SRAM bit that was flipped by an alpha particle or other natural radiation event. Is the failure being blamed on software, or is it an overall laxity of hardware plus software that failed to prevent all of those 16 million possible ways a software task can die? How much fail-safing & hardware redundancy is enough to adequately protect against these events? In the end, it is a probabilistic issue, and the probability of failure will never be zero.
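
The "mirroring" and "error detection" mentioned in this comment can be illustrated with a small sketch. It assumes a critical 16-bit variable stored together with its bitwise inverse, so that a single flipped bit in either copy is detected on read. The type and function names (`mirrored_u16`, `mirrored_write`, `mirrored_read`) are hypothetical and are not taken from the Toyota code or from Barr's testimony.

```c
#include <stdbool.h>
#include <stdint.h>

/* A critical value stored twice: once as-is, once bit-inverted. */
typedef struct {
    uint16_t value;
    uint16_t inverse;   /* always written as ~value */
} mirrored_u16;

/* Write both copies together. */
void mirrored_write(mirrored_u16 *m, uint16_t v)
{
    m->value = v;
    m->inverse = (uint16_t)~v;
}

/* Read the value only if the two copies still agree.
 * Returns false when a corruption is detected; *out is left untouched. */
bool mirrored_read(const mirrored_u16 *m, uint16_t *out)
{
    if ((uint16_t)(m->value ^ m->inverse) != 0xFFFFu) {
        return false;   /* at least one bit no longer matches its mirror */
    }
    *out = m->value;
    return true;
}
```

Mirroring of this kind only detects the corruption. What the system does next (reset, limp-home mode, fall back to a safe default) is a separate design decision.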

 

Hard to tell what actually happened
Bert22306   10/25/2013 5:00:10 PM
It's certainly the case that tasks can die, and require a system reboot. That's why you have watchdog timers in control system software. In the description of the problem, it appears that several tasks died simultaneously, although we don't know which tasks nor how simultaneous they were.

And it's also not clear whether individual tasks were monitored correctly, and whether it was the simultaneous nature of the failures that created a case where the reboots didn't occur.
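
One common way to monitor individual tasks, sketched here under assumptions, is to have every task check in once per period and to service the hardware watchdog only when all of them have done so. The task count, the names `task_checkin` and `supervisor_should_kick`, and the idea that the caller writes the real watchdog register are illustrative assumptions, not a description of the system discussed in the article.

```c
#include <stdbool.h>
#include <stdint.h>

#define TASK_COUNT     3u                          /* hypothetical number of monitored tasks */
#define ALL_TASKS_MASK ((1u << TASK_COUNT) - 1u)

static uint32_t checkin_flags;   /* one bit per task, set when that task reports in */

/* Called by each task once per control period to prove it is still running. */
void task_checkin(unsigned task_id)
{
    if (task_id < TASK_COUNT) {
        checkin_flags |= 1u << task_id;
    }
}

/* Called once per period by the supervisor.
 * Returns true only if every task checked in since the previous call.
 * The caller services the hardware watchdog only on true, so a single
 * dead task eventually forces a reset. */
bool supervisor_should_kick(void)
{
    bool all_alive = (checkin_flags == ALL_TASKS_MASK);
    checkin_flags = 0u;          /* every task must check in again next period */
    return all_alive;
}
```

The point of the design is that the watchdog is not serviced from a timer interrupt that keeps running after the tasks have died.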

Also, it looks like they found several potential mechanisms, not necessarily THE cause. One way to design around this sort of problem, although nothing will be 100 percent, is to have redundant processes do the same computations, and then compare the control signal at the output. If there's no match, you default to no acceleration.
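
The compare-and-default idea in the paragraph above might look like the following sketch. It assumes two independently computed throttle commands expressed as a percentage, a small allowed disagreement, and a safe default of zero throttle. The function name `select_throttle` and both constants are hypothetical.

```c
#include <stdint.h>

#define THROTTLE_SAFE_PCT    0u   /* "no acceleration" default */
#define MAX_DISAGREEMENT_PCT 2u   /* hypothetical tolerance between channels */

/* Compare two redundantly computed throttle commands (0..100 percent).
 * If either is out of range or they disagree by more than the tolerance,
 * fall back to the safe default. Otherwise use the lower of the two. */
uint8_t select_throttle(uint8_t channel_a_pct, uint8_t channel_b_pct)
{
    uint8_t diff = (channel_a_pct > channel_b_pct)
                       ? (uint8_t)(channel_a_pct - channel_b_pct)
                       : (uint8_t)(channel_b_pct - channel_a_pct);

    if (channel_a_pct > 100u || channel_b_pct > 100u ||
        diff > MAX_DISAGREEMENT_PCT) {
        return THROTTLE_SAFE_PCT;
    }
    return (channel_a_pct < channel_b_pct) ? channel_a_pct : channel_b_pct;
}
```

With these assumed constants, `select_throttle(40, 41)` yields 40, while `select_throttle(40, 90)` yields 0. The comparison helps only if the two channels do not share the fault, which is why the redundant computations are usually kept as independent as practical.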

The last safety measure is of course the driver. If unintended acceleration occurs, certainly in a 2005 car, put the car in neutral and shut off the engine!

Re: Single bit flip
junko.yoshida   10/25/2013 5:02:27 PM
I am not sure which "safety standard" you are referring to here. If you can clarify that, I could ask the expert. Thanks.

Re: Single bit flip
LarryM99   10/25/2013 5:07:09 PM
I've worked around control software for nuclear devices, which obviously operate by a different set of rules than just about any other. One interesting safeguard is testing within the body of critical functions to ensure that the function was entered at the top, rather than as a random jump into the body of the code (potentially the kind of error that could result from cosmic rays). One of the guys on our team was former military, and he told us that they had running bets whether the missiles would actually fire, given a valid control sequence. None of them believed that it would fire by accident.
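
The entry check described above can be approximated in C as follows. Portable C cannot express a jump into the middle of a function, so this sketch models the function body as a separate routine, `release_body`, that a stray jump might reach directly. The token value and all names are hypothetical, and this is only one possible form of the safeguard, not the one used in the systems the comment refers to.

```c
#include <stdbool.h>
#include <stdint.h>

#define ENTRY_TOKEN 0xA5C3u   /* hypothetical marker value */

/* Zero unless a legitimate call is in progress. */
static volatile uint16_t entry_marker;

/* Body of the critical function. Calling it directly models execution
 * that arrived here without passing through the top of release(). */
bool release_body(void)
{
    bool entered_at_top = (entry_marker == ENTRY_TOKEN);
    entry_marker = 0u;            /* the marker is valid for one pass only */
    return entered_at_top;        /* true means the critical action may proceed */
}

/* Legitimate entry point: validates the command, sets the marker, then runs the body. */
bool release(bool command_valid)
{
    if (!command_valid) {
        return false;
    }
    entry_marker = ENTRY_TOKEN;   /* set only here, at the top */
    return release_body();
}
```

A call through `release(true)` succeeds, while reaching `release_body()` on its own is refused because the marker was never set.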

If you look at modern automotive control systems they are beginning to introduce redundant voting controls. This is an effective way of effectively eliminating this type of error, be it from hardware or software.
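
A minimal sketch of a redundant voting control, assuming three channels that each produce a 16-bit result and a simple two-out-of-three exact-match rule. The function name `vote_2oo3` is hypothetical, and real voters often compare within a tolerance rather than for equality.

```c
#include <stdbool.h>
#include <stdint.h>

/* Two-out-of-three majority vote over redundant channel outputs.
 * Returns true and stores the majority value in *out when at least
 * two channels agree; returns false when all three differ. */
bool vote_2oo3(uint16_t a, uint16_t b, uint16_t c, uint16_t *out)
{
    if (a == b || a == c) {
        *out = a;
        return true;
    }
    if (b == c) {
        *out = b;
        return true;
    }
    return false;   /* no majority: the caller should go to a safe state */
}
```

A single corrupted channel is outvoted. Two channels corrupted in the same way would win the vote, so the scheme relies on faults being independent.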

Re: Hard to tell what actually happened
Caleb Kraft   10/25/2013 5:07:36 PM
Yeah, throwing it in neutral sounds like the ultimate solution. However, in that instant of completely unexpected acceleration, much damage can be done before even the most vigilant person can respond.

Re: Single bit flip
Frank Eory   10/25/2013 5:10:07 PM
If I may expand on my above comment a little further:

"Memory corruption as little as one bit flip can cause a task to die. This can happen by hardware single-event upsets -- i.e., bit flip -- or via one of the many software bugs, such as buffer overflows and race conditions, we identified in the code."

So he mentions hardware SEU, but also software bugs like buffer overflows & race conditions, which makes me wonder the following:

Consider a hypothetical safety-critical system that many might consider very well-engineered. Suppose that the software in this system is so well done & well-tested that there are no buffer overflows, no race conditions, no possibility of software-induced memory corruption whatsoever. In this hypothetical near-perfect system, the only way for memory to get corrupted is by SEU, and then only if the SEU goes uncorrected or the fail-safe systems fail to guard against it.

Suppose further that the engineers carefully considered SEU, and included fairly powerful ECC to guard against its ill effects. Perhaps they even considered how much higher the SEU rate might be in a high-altitude city during peak solar flare activity. Is that enough? As I mentioned above, we're still dealing with probabilities that can never be zero.
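
To make the ECC idea concrete, here is a sketch of the textbook Hamming(7,4) code, which corrects any single flipped bit in a 7-bit codeword. It is chosen only because it is small enough to show in full. Memory ECC in real hardware typically uses wider codes, and nothing here is taken from the system under discussion. The function names are hypothetical.

```c
#include <stdint.h>

/* XOR of the 1-based positions of all set bits in a 7-bit word.
 * For a valid codeword this is 0; after a single bit flip it equals
 * the position of the flipped bit. */
static uint8_t syndrome(uint8_t word)
{
    uint8_t s = 0u;
    for (uint8_t pos = 1u; pos <= 7u; pos++) {
        if (word & (1u << (pos - 1u))) {
            s ^= pos;
        }
    }
    return s;
}

/* Encode a 4-bit value: data bits go to positions 3, 5, 6, 7 and
 * parity bits to positions 1, 2, 4. */
uint8_t hamming74_encode(uint8_t nibble)
{
    uint8_t word = 0u;
    if (nibble & 1u) word |= 1u << 2;   /* position 3 */
    if (nibble & 2u) word |= 1u << 4;   /* position 5 */
    if (nibble & 4u) word |= 1u << 5;   /* position 6 */
    if (nibble & 8u) word |= 1u << 6;   /* position 7 */

    uint8_t s = syndrome(word);         /* parity bits needed to cancel the syndrome */
    if (s & 1u) word |= 1u << 0;        /* position 1 */
    if (s & 2u) word |= 1u << 1;        /* position 2 */
    if (s & 4u) word |= 1u << 3;        /* position 4 */
    return word;
}

/* Decode a 7-bit codeword, repairing one flipped bit if present.
 * *corrected is set to 1 when a repair was made, else 0. */
uint8_t hamming74_decode(uint8_t word, int *corrected)
{
    word &= 0x7Fu;
    uint8_t s = syndrome(word);
    *corrected = (s != 0u);
    if (s != 0u) {
        word ^= (uint8_t)(1u << (s - 1u));   /* flip the offending bit back */
    }
    return (uint8_t)(((word >> 2) & 1u) |
                     (((word >> 4) & 1u) << 1) |
                     (((word >> 5) & 1u) << 2) |
                     (((word >> 6) & 1u) << 3));
}
```

This code also shows the limit the comment is pointing at: if two bits flip in the same codeword, the decoder "corrects" the wrong bit and returns bad data without complaint. Stronger codes push that failure probability down but do not make it zero.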

I am in no way trying to defend buggy software or buggy hardware, I'm just asking how far does one have to go, and will it ever be far enough?

Larry: I had already posted the above before I saw your reply.

"If you look at modern automotive control systems they are beginning to introduce redundant voting controls. This is an effective way of effectively eliminating this type of error, be it from hardware or software."

Redundant voting controls, dual CPUs running the same code in lock step, and so on. But the key statement you made is that these are a way of "effectively eliminating this type of error" and I am asking how effective must "effectively" be, in quantitative terms?
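
One way to put the question in quantitative terms, as a sketch only: assume three voting channels that fail independently, each with probability p during some interval, and assume the voter is defeated whenever at least two channels are wrong. The probability that the vote fails is then:

```latex
P_{\text{vote fails}} = \binom{3}{2} p^{2} (1 - p) + p^{3} = 3p^{2} - 2p^{3}
```

For a purely hypothetical p = 10^{-6}, this gives roughly 3 x 10^{-12}: far smaller than p, but not zero. The independence assumption is the weak point, since a software bug present in all three channels fails them together and the formula no longer applies.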

standard OS?
junko.yoshida   10/25/2013 5:18:24 PM
Speaking of standards, though, the expert group did find that Toyota failed to comply with "OSEK," an international standard API specifically designed for use in automotive software. Toyota's Ex-OSEK850 version was not certified as OSEK compliant, according to Barr.

Re: Hard to tell what actually happened
SSDWEM   10/25/2013 5:19:02 PM
"If unintended acceleration occurs, certainly in a 2005 car, put the car in neutral and shut off the engine!"

One thing you need to be aware of - in most modern automobiles with an automatic transmission, shifting gears is really more of a "suggestion" than a command.

Said another way, there is a CPU in between the gear selector switches that are being opened and closed, and the transmission.  If the very CPU which is causing UA is responsible for monitoring those "gear shift suggestions"... oh dear!  So much for shifting into neutral. 

I drive a manual transmission because it's fun, but I'm starting to see the value in the ability to physically disconnect the transmission from the engine.

I don't think we'll see the mechanical connection from pedal to brakes go away any time soon, but I wonder how far away we are from "Steer by Wire."

P.S.  Same thing goes for many of the "push button start" vehicles - there is no key to rip out of the steering column.  Press and hold the ON/OFF switch for a few seconds while hurtling down the road at 130MPH like Rhonda Smith?  (Just find her 10 minute testimony on YouTube and tell me she's not credible!)

Re: Hard to tell what actually happened
Bert22306   10/25/2013 5:19:19 PM
True, Caleb, "mere humans" can be taken by surprise and perform all sorts of erroneous responses.

But in this specific case, where we're talking about the throttle, it's not clear what was involved. For example, it does not appear difficult to compare the throttle command to the fuel intake with the accelerator pedal position, as a reasonableness check. Is it that such a check was not done, or that for some reason, it failed? Or was it associated with a cruise control malfunction?
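
The reasonableness check suggested here could be sketched as below. It assumes both the pedal position and the commanded throttle are available as percentages and that the command may exceed the pedal by a small margin (for idle control, say). The margin, the function name `throttle_plausible`, and the omission of cruise control are all simplifying assumptions.

```c
#include <stdbool.h>
#include <stdint.h>

#define PLAUSIBILITY_MARGIN_PCT 10u   /* hypothetical allowance above pedal position */

/* Returns true when the commanded throttle is consistent with the
 * accelerator pedal position. A command well above what the driver is
 * asking for is treated as implausible. Commanding less than the pedal
 * requests is always accepted, since it errs on the safe side. */
bool throttle_plausible(uint8_t pedal_pct, uint8_t commanded_pct)
{
    unsigned limit = (unsigned)pedal_pct + PLAUSIBILITY_MARGIN_PCT;
    return (unsigned)commanded_pct <= limit;
}
```

To be useful, a check like this has to run somewhere that does not fail together with the code computing the throttle command. A system with cruise control would also need a separate rule for the case where the throttle is legitimately open with the pedal released.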
