
Toyota Case: Vehicle Testing Confirms Fatal Flaws

10/31/2013 06:15 PM EDT
Three-part series based on trial transcript
junko.yoshida (Blogger)   10/31/2013 6:42:30 PM
We certainly don't mean to be "All Toyota All the Time" news; but we wanted to make sure that our readers have the opportunity to see snippets of what went on in the courtroom of Bookout v. Toyota in Oklahoma. We created an exclusive three-part series based on the trial transcript. The story above is the last in the series. The two others include:


Re: Three-part series based on trial transcript
krisi (CEO)   10/31/2013 6:46:34 PM
Thank you Junko for the very comprehensive coverage... this has been an eye-opening story for me... Kris

Re: Three-part series based on trial transcript
junko.yoshida (Blogger)   10/31/2013 6:50:51 PM
Kris, I am glad you feel that way. This case, I think, has legs, since most consumers as of today still believe that the Toyota case is an old story; that it finished with Toyota's recall of millions of vehicles. But another trial, just like this one (building the case on the software flaws), is about to start in Santa Ana, Calif. next week.

Again, the vehicle in the Oklahoma case, a 2005 Camry, by the way, is NOT on Toyota's recall list.

Re: Three-part series based on trial transcript
jhoopy56 (Rookie)   11/4/2013 6:01:45 PM
This is, frankly, nonsense.  As was demonstrated in the Audi debacle years ago, automobiles will come to a stop with the accelerator floored (throttle fully open) in approximately 20% longer distances than normal if the brakes are fully involved (ABS invoked).  Audi's president at the time soberly demonstrated this by planting both feet, and observers recorded the braking performance.  This is because brake torque >> engine torque.  It is a classic mechanical safety override (SW be damned).  The only way you can get around this is either a) simultaneous failure of two major auto control systems, separately enabled (one electronic, the other largely hydraulic/mechanical), or b) if you have a system which can electronically disable the brakes (pathological reverse-ABS??).  The former is ridiculously unlikely and the latter, not demonstrated to be the case.

It is telling that, in the testimony, the vehicle was brought to a virtual stop on the dyno through brake actuation even with the simulated loss of task X.

Re: Three-part series based on trial transcript
Antony Anderson (Rookie)   11/4/2013 7:41:37 PM
From a functional safety point of view, the most effective way of stopping a runaway vehicle is first to remove the source of energy causing the acceleration and then to apply the brakes. It is inappropriate in my view to treat the driver exercising the brakes as the fail-safe for an engine that is out of control. This presumably is why Toyota now fit brake override software.

Here are some of the factors that make it inadvisable to rely on the brakes as a fail-safe:
  • Brakes only have a limited capacity for absorbing heat. If the temperature of the brake cylinders rises too far, the hydraulic fluid will boil and cause vapour locks which greatly reduce braking efficiency. The temperature at which the hydraulic fluid boils depends on the moisture content of the fluid and drops as this rises. Hydraulic fluid readily absorbs moisture, hence the importance of changing it on a regular basis.
  • With a racing engine, there is no vacuum produced, and hence if you pump the brakes you will rapidly lose vacuum brake assist.
  • With a racing engine there may well be sufficient slip in the torque converter to give somewhere between a 2 and 2.5 times torque multiplication factor, which means that you have to press two to two and a half times as hard to get the necessary braking force at the wheels.

I for one think that the three-part series based on the trial transcript has provided an extremely useful and helpful insight into the evidence presented by Dr Barr to the jury. The resultant discussion has been wide ranging, constructive and fruitful. I certainly have learnt a great deal. Many thanks Junko!

Re: Three-part series based on trial transcript
Bert22306 (CEO)   11/4/2013 8:31:23 PM
Sure, it's better to have the engine throttled back in these emergencies. But it's also true that the brakes can overpower the engine, even at full throttle.

I too was somewhat relieved to discover that the brakes did work throughout these instances of task X death (in the third article or so of the series - it was definitely not clear before that). The power from the engine is just not that huge of a concern, if you plant your foot firmly on the brakes, because, as the Audi tests showed in the mid 1980s, the stopping distances do not change by much, power on or power off. That means there is not a big difference in the amount of energy the brakes need to dissipate as heat. It's more important to catch the problem before the car really speeds up.

It is true that the vacuum assist will go away if the engine is at full throttle, but that would only occur if task X died while brakes were being applied. Otherwise, if task X died while the brakes were NOT being applied, the throttle would shut down, and you'd have vacuum assist. And here's the really interesting part, even in the worst-case scenario (task X death while brakes are being applied), if the driver HAD pumped the brakes, as the Toyota is programmed, she would NOT have lost power assist! Because apparently, you need to release the brakes for a couple of tenths of a second and then reapply, in order for the throttle to be shut down, even in this worst case.

Now, if somehow that task X death had affected ABS in such a way that the brakes didn't work, the situation would have been a whole lot more dire. In the early reports, this "small" detail was never brought out.

Re: Three-part series based on trial transcript
junko.yoshida (Blogger)   11/5/2013 1:18:40 AM
@Bert, you wrote:

Now, if somehow that task X death had affected ABS in such a way that the brakes didn't work, the situation would have been a whole lot more dire. In the early reports, this "small" detail was never brought out.


Because we only have a redacted version of the transcript, in which the exact functions of Task X were not disclosed, it's hard to tell. But I would like to call your attention to the following part of the court transcript: http://www.eetimes.com/document.asp?doc_id=1319936&page_number=4

This may give us some clues.

Here are Michael Barr's answers to the questions by the plaintiffs' lawyer:

Q Let me ask about that then. The jury heard testimony about a brake override system. Are you familiar with that?

A Yes.

Q Wherein the accelerator is in certain condition, if you press the brake it will automatically cut the throttle. Are you familiar with that?

A I am. There is not one in the 2005 Camry, to be clear.

Q Right. Do you have an understanding of the system that Toyota has since used?

A Yes. I reviewed the one that they put into the 2010 Camry.

Q Where is the function for that brake override? Where is the task located, as you understand it?

A Yes. So the brake override that is supposed to save the day when there is an unintended acceleration is in task X, of course, because it is the kitchen sink.


Re: Three-part series based on trial transcript
Bert22306 (CEO)   11/5/2013 4:43:36 AM
That's what I was referring to, Junko. That testimony was misleading. The "brake override" he was referring to was only the feature where applying the brakes simultaneously cuts the throttle. The implication was that the brakes didn't work at all, which isn't the case. And the throttle override feature does work, except in cases where task X dies while the driver is braking. So it's not as bad as I thought.

Specifically, this quote here:

"Q Where is the function for that brake override? Where is the task located, as you understand it?

"A Yes. So the brake override that is supposed to save the day when there is an unintended acceleration is in task X, of course, because it is the kitchen sink."

Don't you get the impression from this that the brake override won't work when task X dies? And is it made clear that the brakes do work, even if the throttle isn't cut in worst-case scenarios? The brakes would STILL "save the day," if the driver can overcome his or her moment of astonishment.

My approach would have been to make the whole situation clearer from the start, especially in view of the fact that the attorney doing the questioning did not seem well versed in these matters.

Re: Three-part series based on trial transcript
junko.yoshida (Blogger)   11/5/2013 1:55:37 AM
@Antony Anderson, you have been absolutely critical in our Toyota discussions on this EE Times Forum. Thank you so much for chiming in often, offering pointed guidance and bringing clarity to the issues.

Counter intuitive brake action necessary in a sudden acceleration incident?
Antony Anderson (Rookie)   10/31/2013 6:51:39 PM

"Q. So in other words, if you're driving down the road and you put your foot on the brake to slow down, for whatever reason, during that time period task-x is where it actually dies, the vehicle starts to accelerate.

You've got to actually back off the brake and try and catch it?

A. That's correct. Which is both counter intuitive because your car is zooming away and you have to let go of the brake. And it's also dangerous because as you let off the pressure of the brake, at least you were applying some mechanical pressure, but as you let off the car speeds up. And so that may increase the risk in the short term, at least, before this fail-safe would take effect."

This is absolutely amazing! Counter intuitive - I'll say so!

It is interesting to note, however, that many sudden accelerations seem to happen as the driver is pulling gently into a parking space or pulling out of one. Could it be that with very light braking the brake switch is giving a rather indeterminate signal to the ECU, which is being misinterpreted? This needs teasing out more.


Re: Counter intuitive brake action necessary in a sudden acceleration incident?
krisi (CEO)   10/31/2013 7:09:06 PM
This used to be a driving technique in slippery conditions before ABS was implemented.

Re: Counter intuitive brake action necessary in a sudden acceleration incident?
Bert22306 (CEO)   10/31/2013 7:13:11 PM
Actually, on the contrary, this testimony sounds less damaging to me.

First, the brakes did work throughout task x death.

Second, the problem of power not being cut, when brakes were applied, only occurs if task x death occurs WHILE you are braking. Otherwise, it seems that braking did cut the power. Just that the driver needs to be awake enough to realize that speed is going up and up.

Third, it's not all that unintuitive to pump the brakes if you feel they aren't doing the job. Just like you push again and again on the elevator button, if the elevator doesn't come. This detail had already been explained, actually. But last time around, it was not clarified that death of task x only made the brake fail-safe incomplete if task x died while the brake pedal was pushed in.

Re: Counter intuitive brake action necessary in a sudden acceleration incident?
Antony Anderson (Rookie)   11/1/2013 8:20:24 AM
My understanding  of what Dr Barr is saying is something like this:

IF the driver already has their foot touching the brake

       AND IF a task death happens to occur

       AND IF the task death causes a UA

THEN the UA will continue for 30 seconds

      UNLESS the driver lifts their foot momentarily  off the brake pedal altogether,

             Thereby killing the UA and restoring control to the driver.
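In code form, the sequence Anderson outlines comes down to something like the sketch below. This is purely illustrative -- the names are invented and it is not Toyota's actual code -- but it shows why a fail-safe keyed to a new brake application (a rising edge on the pedal signal) never fires if the pedal was already down when the task died.

#include <stdbool.h>
#include <stdio.h>

/* Stub standing in for whatever actually closes the throttle. */
static void cut_throttle(void) { puts("throttle cut"); }

static bool prev_brake = false;

/* Called periodically with the current pedal state and fault state. */
void brake_echo_check(bool brake_pressed, bool ua_active)
{
    /* A "new" application is a rising edge: pedal was up, now down. */
    bool new_application = brake_pressed && !prev_brake;

    if (ua_active && new_application)
        cut_throttle();

    /* If the pedal was already down when the UA began, new_application
     * stays false until the driver releases and reapplies the brake --
     * the counterintuitive action described in the testimony. */
    prev_brake = brake_pressed;
}

int main(void)
{
    brake_echo_check(true,  false);  /* driver is already braking, no fault yet */
    brake_echo_check(true,  true);   /* UA begins while pedal is held: no cut   */
    brake_echo_check(true,  true);   /* still held: no cut                      */
    brake_echo_check(false, true);   /* driver lets off the brake               */
    brake_echo_check(true,  true);   /* reapplies: rising edge, throttle cut    */
    return 0;
}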


Re: Counter intuitive brake action necessary in a sudden acceleration incident?
MS243 (Manager)   11/1/2013 11:18:44 AM
All ABS-equipped cars' owners manuals specifically state not to pump the brakes -- therein lies the problem -- the automation would not handle a fault condition correctly. Interestingly, TI and now Renesas both make MCUs that will automatically halt or reboot if there is a bit flip. Bit flips can happen due to electrical noise, cosmic radiation (what creates radioactive carbon-14 in the air), and then there are hard failures like flash memory charge bake-out, electromigration, and thermal and mechanical cycling, which in an automotive application can limit the life of parts to less than 10,000 hours. (One of my vehicles has over 4,000 hours on it already.)

Thanks for the article series
davidjohn_in235 (Rookie)   11/1/2013 2:02:45 AM
Thanks Junko for the detailed coverage. The analysis described in the transcript is very useful and educative. As an engineer who has worked on non-critical automotive code, the article series gave me a whole new understanding of the challenges and process required to test and qualify a critical automotive system.

Simulating EDAC failure?
AZskibum (CEO)   11/1/2013 8:20:29 AM
"We can't just wait around for that particular bit to flip, which may take a long time."

The quoted testimony does not reveal much detail about the nature of the forced failure testing. Was this bit flip part of EDAC-protected memory, and if so, were multiple bits flipped? Specifically, was the test designed to overpower the error detection & correction capability of an error correcting code applied to particular variables in memory, or was this an example of memory locations that were unprotected?

If you are testing an error correcting code in an application, you know in advance the power of the code, as allocated to detection vs correction of bit errors, so you know in advance that the code can detect up to X errors in Y bits, and correct up to Z errors in the same Y bits, so part of your testing would be to confirm the behavior when you overwhelm the code with too many errors.
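To make the "overwhelm the code" idea concrete, here is a toy illustration using single-bit even parity over a 32-bit word -- far weaker than the SEC-DED schemes real EDAC memory uses, and purely hypothetical: one flipped bit is caught, while two flips cancel each other out and pass unnoticed.

#include <stdint.h>
#include <stdio.h>

/* Even parity of a 32-bit word: 1 if the number of set bits is odd. */
static uint32_t parity32(uint32_t w)
{
    w ^= w >> 16; w ^= w >> 8; w ^= w >> 4; w ^= w >> 2; w ^= w >> 1;
    return w & 1u;
}

int main(void)
{
    uint32_t data   = 0xCAFE1234u;
    uint32_t stored = parity32(data);                   /* parity kept alongside */

    uint32_t one_flip = data ^ (1u << 7);               /* single upset          */
    uint32_t two_flip = data ^ (1u << 7) ^ (1u << 19);  /* double upset          */

    printf("1-bit flip detected: %s\n", parity32(one_flip) != stored ? "yes" : "no");
    printf("2-bit flip detected: %s\n", parity32(two_flip) != stored ? "yes" : "no");
    return 0;
}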

I'm not suggesting anything, just asking questions about how the testing was conducted, and how it relates to proof of what really happened on the road.

As development or verification engineers know, during the course of product development, design & verification teams can be somewhat adversarial in the sense that verification's job is to find ways to break the design, and there are always ways to break a design -- any design. The question then becomes whether the conditions that break the design are in scope or out of scope with respect to the requirements that must be met. If the only breakage is out of scope and the design meets all of its requirements, it likely moves forward to production.

It is also noteworthy that it appears that the brakes still functioned even under Task X death, if they were pumped. Whether pumping the brakes in an emergency could be deemed expected driver behavior might depend on whether one assumes the driver is a younger person who has no experience with non-ABS brakes, or an older driver who learned the pump the brakes technique in a pre-ABS era.

Re: Simulating EDAC failure?
CharlieCL (Rookie)   11/1/2013 11:47:58 AM
If it is a bit flip, that is not a software flaw. The MCU should have error correction code (ECC) inside. Toyota's software may not be strong, but that is not the reason. A bit flip can crash any software.

Re: Simulating EDAC failure?
MS243 (Manager)   11/1/2013 12:12:16 PM
There are safety-critical OSes that will detect an error like a bit flip in the tasking -- through redundant code -- but they are not ported to every processor. The adding of ECC to MCUs is only something that has come about after about 2009/2010, due to advances in part density for a given price target -- prior to this it had to be handled via the OS (it has existed as a technology since the 1990s).

Re: Simulating EDAC failure?
SSDWEM (Rookie)   11/2/2013 2:21:18 PM
"There are safety-critical OSes that will detect an error like a bit flip in the tasking" -

Can you share any details / links? Would be interesting to see who's doing this, and how.

Thanks.

Re: Simulating EDAC failure?
MS243 (Manager)   11/2/2013 4:54:52 PM
-- Safety and software

If one thinks about the software a bit, both the OS and the OEM's code need to detect bit flips.

One way this can be done is via a checksum or a CRC. A routine or object to write or read each data type in the OS+OEM code needs to be created that adds this element to the type, as an element in a structure or similar. If there is a checksum/CRC error, one must reboot or in some other manner re-test the entire memory to rule out a hard fault.

Another way might be to keep duplicate RAM entries and reboot / retest on a mis-compare of the duplicates.

One also needs to do one of these on code, both in RAM and in FLASH.

Most of the major OS's such as VxWORKS, PSOS, Green Hills, etc should support something like this or better (possibly with an option)  

The FAA has several good papers on reviews of safety-critical systems.

Do a Google search for SEU SOFTWARE FAA.

Also look up the Byzantine Generals Algorithm for software.

See my profile for contact information for further advice.
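A minimal sketch of the checksum-in-the-structure idea described above, with invented names and a deliberately simple check function standing in for a real CRC: every write recomputes the check word, every read verifies it, and a mismatch forces the fail-safe path (here, a reset via abort()).

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Critical state is written only through a helper that appends a check word;
 * every reader re-verifies it. A real design might also keep a duplicate
 * copy in a separate RAM bank, as described above. */
typedef struct {
    uint16_t throttle_target;   /* illustrative fields, not Toyota's */
    uint16_t brake_state;
    uint32_t checksum;
} critical_block;

static uint32_t sum32(const void *p, size_t n)   /* simple rotating sum */
{
    const uint8_t *b = p;
    uint32_t s = 0;
    while (n--) s = (s << 1) + *b++;
    return s;
}

static void block_write(critical_block *blk, uint16_t thr, uint16_t brk)
{
    blk->throttle_target = thr;
    blk->brake_state     = brk;
    blk->checksum        = sum32(blk, offsetof(critical_block, checksum));
}

static bool block_ok(const critical_block *blk)
{
    return blk->checksum == sum32(blk, offsetof(critical_block, checksum));
}

int main(void)
{
    critical_block blk;
    block_write(&blk, 120, 0);
    if (!block_ok(&blk))
        abort();    /* corruption detected: go to the fail-safe state */
    return 0;
}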

Re: Simulating EDAC failure?
SSDWEM (Rookie)   11/2/2013 6:09:31 PM
Thanks for the thoughtful & detailed reply.

"Most of the major OS's such as VxWORKS, PSOS, Green Hills, etc should support something like this or better (possibly with an option) "


This is really the crux of what I was asking about.  Trying to see if there are any RTOS vendors who advertise fault-tolerant countermeasures such as mirroring critical RTOS variables & data structures.  I haven't found one yet.  If I remember Michael Barr's testimony, some of the scheduler's task lists or whatever were right next to the stack, and some of the important application variables weren't mirrored.


Will be interesting to see if this type of functionality starts showing up in some of the more heavyweight RTOSes.  IMO it would be a reaction to this fiasco right here.





Re: Simulating EDAC failure?
MS243 (Manager)   11/3/2013 9:44:13 PM
The FAA papers show some RTOSes that do some SW protection of tasks; for others it is done as part of the certification effort by the more reputable airframe and equipment manufacturers.

For example, Xilinx has some good whitepapers on SEU that detail some of the techniques for its ARM-based processors.

Space Micro (www.spacemicro.com) offers IP/code for hardening un-hardened OSes and using non-EDAC CPUs to self-check, and to check against a redundant channel. These have been flown on space missions where bit flips can happen quite often, even on a small MCU.

I have myself written guidelines for hardening software and firmware in MCUs and FPGAs for companies -- see my profile for contact information.

Re: thank you!
ESUNDERMAN9874 (Rookie)   11/1/2013 10:47:31 AM
I like it too

If this is bad, just wait
MeasurementBlues (Blogger)   11/1/2013 11:48:44 AM

Computer-Controlled Anesthesia Could Be Safer for Patients

Computer-controlled sedation could lighten the load for intensive-care staff and make the process safer for patients.

Published in Technology Review.

Just what I want, buggy code replacing a doctor. But at least these medical devices go through rigorous testing for FDA approval. Cars?

Re: If this is bad, just wait
JeffL_2 (CEO)   11/1/2013 12:42:49 PM
Forget the FDA. I worked a contract for a company that was using computer software to control blood analysis and they hired me to work on the FDA application for an IND (investigational new drug, that's the only application for evaluation the agency has). As I recall there were about 50K lines of code written in a language that I did not at the time know well enough to evaluate nor were there ANY "tools" available to assist in the evaluation. Incredibly they actually asked ME to provide "signature authorization" that the code had been thoroughly tested and evaluated (there had only been some functional tests performed, nothing at all in the way of either unit testing or any level of structural coverage). I of course refused and was summarily let go, after which I went back to working on certification for avionics projects under FAA regulations because I already had worked in that field and knew that such incompetence and laxity just isn't tolerated in that world, and I never looked back. After reading this I don't get the impression that NHTSA is much different than FDA. As far as the latter goes, just don't get sick!

Petrochemical standards in 1980
JIMAshby (Rookie)   11/1/2013 12:31:29 PM
I worked in the petrochemical industry back in the 1980s, and we used redundant CPU modules with an independent hardware switch which would switch away from a defective CPU module when a failure was detected, then sound an alarm.

The matrix of these control systems was interconnected so that any single or multiple failure would be handed over to another control system so quickly the chemical processing equipment never even knew it was changed, but the alarms did.

I do not understand why, in such a critical control system, a redundant MPU control module is not used.

It would not add that much weight to the vehicle to use a redundant control module system.

Let all the engineers in the world join together and demand a common standard in the automotive industry to equal or exceed the medical and industrial chemical standards.

After all it may be your family whose lives are in danger!

Jim Ashby, AET

Re: Petrochemical standards in 1980
Antony Anderson (Rookie)   11/1/2013 2:56:08 PM
Jim Ashby writes:

"I do not understand why in such a critical control system, a redundant MPU control module is not used."

It seems that Toyota were thinking along somewhat similar lines more than two decades ago! Clearly at that time someone was anticipating that there might be problems with electronic throttles.

See US Patent 4,995,364


Abstract: "A throttle control apparatus for engines comprises two throttle actuators for driving two corresponding main and sub throttle valves mounted in series in an intake pipe of an engine. An observer, to which the modern control theory is applied, presumes an opening degree of the main throttle valve in a normal condition, which occurs a predetermined time later, from an accelerator depression amount, which represents a throttle opening command, and an opening degree ( angular position) of the main throttle valve. A failure detector quickly finds, from a deviation from the presumed opening degree of the main throttle valve, that  the main throttle valve has failed. When a failure occurs, the control of the sub throttle valve is started, making it possible to affect the throttle opening control with improved reliability."


Re: Petrochemical standards in 1980
JIMAshby (Rookie)   11/1/2013 3:47:43 PM
Reading through the released court notes, it appears as they are only discussing a single point of control.

Being that the single point of control code is the target of the discussion, I would assume (and you know what that does to all involved) that they have only implemented a single point of control even though a dual point of mechanical control is in the process of control, as you have stated.

My comments are based on a failsafe system which does not rely on a single point of control, rather a duality of control with a monitoring unit, all being separate devices to ensure a failsafe control system.

I have found in the past that just implementing failsafe code on a single MPU/CPU control unit, such as a WDT or rolling codes, does not guarantee a failsafe system, but still creates a single point of failure, as the court disclosures have shown in the articles I read.

They only discuss function X as a single function which is responsible for all failsafe determinations, and only discuss a single MPU/CPU controller (unless I missed something).

I would never design a system such as this in which life or limb were in danger.

Even the system they designed was put through serious certifications and testing, and the error still exposed itself in real world applications.

I would NOT want any of these engineers designing an air/space ship on which I would travel in the future.

I find it odd that the review engineers had to be sequestered to be able to review the code and determine the possible issues.

I also find it odd that they did not set up a known failing system and test it until a failure was seen, to determine without a doubt what the root cause IS, rather than assuming the failure by causing a most probable failure.

??????


Re: Funky Code Access & Error Replication
Some Guy (CEO)   11/1/2013 7:05:57 PM
As far as the funky arrangement they had to access the code, that seems like a pretty common practice when outsiders need access to see critical code (at least from a lawyer's perspective on IP / non-disclosure protection).

As far as the error replication, if you don't know the root cause you are only guessing at it, which makes replication a real bear. I'm not surprised that they were not able to reproduce it. The theory that they examined was that it was a Single-Bit-Error, which can have many causes. And without ECC, it was unmitigated. The system design just propagated the fault to a system failure which was an unsafe end state.

Of course, as any practitioner of Ford's 8D problem-solving can tell you, they really have 3 errors. In addition to the unmitigated single-bit-error, they also have a test / validation process that failed to find it. And, third, they have a design process that failed to prevent it happening in the first place. The lawsuit will really only address the first error; it's incumbent on Toyota to address the second two. (And in my experience, Japanese companies tend to go after all three as a matter of course.)


Re: Petrochemical standards in 1980
junko.yoshida (Blogger)   11/4/2013 11:55:36 PM
@JIMAshby, what the root cause is for a single bit flip is apparently hard to find.

As the expert witness Michael Barr noted, among dozens of tasks, there are 16 million different ways those tasks can die. The expert group was able to demonstrate at least one way for the software to cause unintended acceleration, but there are so many other ways it could have happened.

You may not consider that conclusive evidence. But in a trial like this, it raised enough reasonable doubt to convince a jury to deliver a verdict against Toyota.

Re: Petrochemical standards in 1980
junko.yoshida (Blogger)   11/1/2013 6:43:10 PM
very interesting...

Re: Petrochemical standards in 1980
Wnderer (CEO)   11/1/2013 3:47:45 PM
I worked in medical, and there was always a safety CPU or FPGA or safety analog circuitry. Basically they all worked the same. The input and output states were monitored and if there was some illegal combination, the device was put into a safe mode. I worked on safety analog circuits, which were fairly simple measurement circuits and comparators, with the advantage that analog circuits are conducive to single-point failure analysis. It's hard to see how automotive gets away without any of these safety methods.

Re: Petrochemical standards in 1980
junko.yoshida (Blogger)   11/1/2013 4:20:07 PM
@Wnderer. Agreed. In the Toyota case, what I understood from Michael Barr is:

Toyota's engineers sought to protect numerous variables against software- and hardware-caused corruptions (for example, by "mirroring" their contents in a second location), but they failed to mirror several key critical variables.
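As a rough illustration of what such mirroring can look like in code (the names are invented; the testimony does not show Toyota's actual implementation): the value is written to two locations, one of them bit-inverted, and every read cross-checks the pair.

#include <stdint.h>
#include <stdlib.h>

/* Primary copy and its bit-inverted mirror, ideally placed in different RAM
 * regions so a single stray write or bit flip cannot hit both the same way. */
static volatile uint16_t g_target_rpm;
static volatile uint16_t g_target_rpm_mirror;

static void set_target_rpm(uint16_t v)
{
    g_target_rpm        = v;
    g_target_rpm_mirror = (uint16_t)~v;
}

static uint16_t get_target_rpm(void)
{
    uint16_t v = g_target_rpm;
    if ((uint16_t)~g_target_rpm_mirror != v)
        abort();    /* mirror disagrees: the variable is corrupt, fail safe */
    return v;
}

int main(void)
{
    set_target_rpm(800);
    return get_target_rpm() == 800 ? 0 : 1;
}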

Re: Petrochemical standards in 1980
TonyTib (CEO)   11/1/2013 7:06:19 PM
Some data points from an industrial viewpoint:

I'm not a safety expert, but I've had to deal with some safety issues, especially SEMI S2.

The SEMI S2 safety standard requires an EMO button that turns off all power, except that required for safety and logging systems.  The EMO circuit has to be entirely electrical: NO SOFTWARE!  Even Safety PLC's don't qualify.  There's a lot to like about that approach.

On the other hand, my impression (I could be wrong here) is that the newer European safety standards are going away from this approach (for example allowing STO - safe torque off - and networked safety), and are allowing software into the loop, as long as it meets the appropriate SIL level standards, which are development process oriented.  I'm a process skeptic.


To give an idea of how industrial safety can be done, the Banner Micro-Screen light curtains used dual MCUs with different architectures and software ("diverse redundant").  When you're using a light curtain to guard something like a hydraulic press that can crush somebody, this type of approach is crucial.

You left out the most important error - no independent failsafe
Some Guy (CEO)   11/1/2013 1:48:25 PM
The 1st rule of failsafe is an independent off switch. If the cars were only wired with an on-off switch that actually turned the car off, instead of an input to the processor, that would have also been an option in case of this failure. I'll never buy a car that doesn't have a switch that is electrically in series with the rest of the system.

Here's a suggested poll:

Are you afraid of computers?

[ ] YES

[ ] NOT YET


Re: You left out the most important error - no independent failsafe
SPLatMan (Manager)   11/1/2013 8:24:42 PM
@Some Guy opined:

"I'll never buy a car that doesn't have a switch that is electrically in series with the rest of the system."

I heartily agree with your sentiment, but I fear you will be unable to buy a new car 3 years from now.


Re: You left out the most important error - no independent failsafe
Some Guy (CEO)   11/1/2013 8:38:36 PM
I've got a pair of diagonal cutters. Won't stop me from adding an EMO to the circuit.

Just sayin'

Simple solution
Peter.Ting (Rookie)   11/1/2013 4:08:54 PM
Remove the steering wheel lock when the engine is shut off. And make sure the key switch is not just another input to the MPU.

Re: Simple solution
SPLatMan (Manager)   11/1/2013 8:27:02 PM
Cars don't have key switches any more. They have keyless entry keys that cost hundreds of dollars to replace, and stop/start push buttons on the dash.

Metastability?
C Davis (Rookie)   11/1/2013 6:10:35 PM
One thing that Toyota should also do is check their hardware vendor's design for metastability. This could be the actual root cause of the bad input/bit flip. With so many cars on the road, I would guess they would have a certain chance of this happening. This risk can be modeled very accurately; predicting circuit behavior across all variations of process parameters, supply voltages, operating temperatures and the increasingly important effects of circuit aging is now possible. I think Blendics has the best tool I've seen: http://www.blendics.com/index.php/blendics-products/metaace

Some of the bigger semiconductor companies have ad hoc programs, but nothing like this.
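For a feel of how that modeling works, here is the standard two-flop synchronizer MTBF formula, MTBF = exp(t_r / tau) / (Tw * f_clk * f_data), with made-up parameter values (not Blendics data and not Toyota's numbers). The point is the exponential sensitivity to the settling time allowed before the synchronized signal is used.

#include <math.h>
#include <stdio.h>

/* Standard two-flop synchronizer MTBF model; all numbers are illustrative. */
static double mtbf(double t_r, double tau, double Tw, double f_clk, double f_data)
{
    return exp(t_r / tau) / (Tw * f_clk * f_data);
}

int main(void)
{
    double tau = 50e-12, Tw = 50e-12;    /* flip-flop characteristics (s)   */
    double f_clk = 16e6, f_data = 1e3;   /* clock rate and async event rate */

    /* Generous settling margin: MTBF is astronomically long. */
    printf("t_r = 20 ns: MTBF = %.2g s\n", mtbf(20e-9, tau, Tw, f_clk, f_data));

    /* Shave the margin and MTBF collapses exponentially (roughly 19 years
     * here -- marginal once you multiply by millions of vehicles). */
    printf("t_r =  1 ns: MTBF = %.2g s\n", mtbf(1e-9, tau, Tw, f_clk, f_data));
    return 0;
}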

Some nice write-ups:

http://www.semiwiki.com/forum/content/2454-metastability-fatal-system-errors.html

http://www.semiwiki.com/forum/content/2494-your-synchronizer-doing-its-job-part-1.html

http://www.semiwiki.com/forum/content/2516-your-synchronizer-doing-its-job-part-2.html

http://www.semiwiki.com/forum/content/2620-metastability-starts-standard-cells.html

http://www.semiwiki.com/forum/content/2703-ten-ways-your-synchronizer-mtbf-may-wrong.html

I think that some of the cost likely should be borne by the HW companies, as I've rarely seen much attention paid to this.

It would be good to interview Jerry Cox, the CEO of Blendics. He is a senior professor at WUSTL and also cofounded Growth Networks, which was acquired by Cisco. I would guess he is one of the top asynchronous-design experts in the world.

A Radical Alternative?
DrQuine (CEO)   11/1/2013 9:36:00 PM
I do a lot of business travel and experience many car models in my car rentals every three weeks. I've noticed that the pedal placement varies from model to model.  Has anyone investigated the positioning of the brake and gas pedals with respect to the centerline of the driver's seat? It seems to me that some cars have the gas pedal where other cars would have placed the brake pedal (everything is shifted a little to the left). This could be a contributing factor - especially if the drivers in the accidents often drove other cars with the pedals in a different relative geometry to the driver's seat.

Re: A Radical Alternative?
Bert22306 (CEO)   11/2/2013 6:27:31 PM
"It seems to me that some cars have the gas pedal where other cars would have placed the brake pedal (everything is shifted a little to the left). This could be a contributing factor - especially if the drivers in the accidents often drove other cars with the pedals in a different relative geometry to the driver's seat."

It was many years ago now, mid 1980s, that the car company on the hot seat, for this same unintended acceleration problem, was Audi. That was one conclusion reached, back then. Pedal placement. The driver can swear up and down that he had his foot planted on the brake, when in fact he had the accelerator floored.

Here's a more general discussion of the Audi and other examples of this phenomenon, unrelated to electronic controls.

http://en.wikipedia.org/wiki/Sudden_unintended_acceleration

Vehicle Testing Testimony
DrQuine (CEO)   11/1/2013 9:47:00 PM
The testimony cited here is quite remarkable. It is one thing to speculate about the possibility that a corrupted bit might have serious consequences ... it is quite another to demonstrate in an operating car that such a bit error does indeed have significant effects. Kudos to Junko for tracking down and publishing this very interesting evidence.

Re: Vehicle Testing Testimony
junko.yoshida (Blogger)   11/2/2013 6:49:47 AM
Thank you, DrQuine. The testimony of this court transcript has been truly educational and enlightening to me. But even better are some of the comments I read in this forum. I learn something new every day here. Seriously.

DOT specifications for critical SW?
njamss (Rookie)   11/2/2013 4:07:07 AM
Seems that the DOT (and regulators in other countries) are at fault here. They should have proper code/architecture guidelines!

When SW was introduced in flight control a few decades back, they had quadruplex channels to prevent such failures. See, for example, "safety and redundancy" in http://en.wikipedia.org/wiki/Fly-by-wire

I have not been involved with this for a few decades, so they may be doing different things today, but cars should also adhere to standards similar to those required by the DOD/FAA.


Re: DOT specifications for critical SW?
junko.yoshida (Blogger)   11/2/2013 6:39:29 AM
@njamss, the more I look into this, the more convinced I am that NHTSA (http://www.nhtsa.gov/) is at fault here. They dropped the ball.

software and hardware stability
Wobbly (CEO)   11/4/2013 8:58:24 AM
Memory with ECC correction at the controller can mitigate electrically noisy environments.

ARM's AXI busses support client xPUs (APU/MPU/RPU) to provide task-level access control to the address space based on virtual machine IDs, even in multi-core SoCs.

Properly configured, even threaded-task OSes without full MMU support can have some level of memory protection between threads, and in multi-core solutions, individual cores can be corralled into private sandboxes.

These two techniques have been around for years, they are not new.

Re: software and hardware stability
JeffL_2 (CEO)   11/4/2013 12:02:56 PM
You mention "multithread" in the context of safety-critical code. That's kind of a stretch given that there are only a small number of languages for which it is even possible to write an "informal" tool to determine whether a particular thread or build is threadsafe, let alone one that can demonstrate this in a "formal" manner (show as a matter of mathematical proof that it WILL NOT miss any thread problems) so that a safety agency could allow its use. And those languages themselves generally either aren't suitable for safety-critical applications or very few people write in them in the first place. The truly safety-critical sections are required to run in a totally deterministic manner therefore even object-oriented languages generally aren't even currently tolerated for Level A of DO-178C (the known exception being Ada and I haven't participated in one of those projects yet, so I'm not sure exactly what you are and aren't allowed to do). Some of the IEC safety coding standards are so stringent that even the "routine" use of interrupt service routines is prohibited, try doing precise timing or comms without that! So there's not only a heck of a lot of work that needs to be done on the fundamentals, there's also too many people  without sufficient knowledge of how restrictive the current rules are or how VERY far we need to go before some of their "assumptions" come even CLOSE to reality. I believe it would be a "good first step" if the heads of the various groups who write these safety specifications could get together and publish some references of how all these languages, tools and requirements mesh and that would send the message to the academic world what areas of research need to be highlighted. Please note I don't want to "cast aspersions" on those who get it wrong or simply aren't aware what they are saying, it's hard enough for those of us who spend a good portion of our lives trying to keep current at this, and there's also quite a few "commercial claims" I see being made that need to be taken with a grain of salt because particular products or tools might theoretically have a certain advantage but they still haven't been approved for use because their claims have yet to be proven.

Re: software and hardware stability
krisi (CEO)   11/4/2013 12:14:55 PM
Fascinating case and interesting lessons in product development and potential liability... it brings to mind the question of how much design, verification and validation effort is required and sufficient? ... seems Toyota didn't do enough system testing... but when do you stop? Is 99.99% certainty sufficient, or do you need 99.999% or better? How do you determine that point? Kris

Re: software and hardware stability
MS243 (Manager)   11/4/2013 1:12:25 PM
Another very real issue that can haunt projects is the use of too fine a PCB via for a given environment -- this can lead to via breakage due to shock, vibration, and temperature. (I personally went through a plant closing and a 1,200-mile move due to vias breaking on another project's PCBs.) Even if only redundant ground vias break, the ground bounce can grow, and when combined with humidity the results can be even more significant. Ground bounce can cause logic corruption in MCU's, DSP's, CPU's and FPGA's. There has to be enough built-in self-test of the hardware via software, and safeguards, to detect this issue.

Re: software and hardware stability
junko.yoshida (Blogger)   11/4/2013 11:42:39 PM
@MS243, you wrote:

Ground bounce can cause logic corruption in MCU's, DSP's, CPU's and FPGA's.


You are absolutely right about this; hence, the expert witness was saying that corruptions could happen "on certain road conditions on certain days." That makes it imperative to have built-in self-test of the hardware by software, as you point out.

Re: software and hardware stability
Bert22306 (CEO)   11/4/2013 4:34:38 PM
"Is 99.99% certainty sufficient, or do you need 99.999% or better? How do you determine that point?"

Mean time between failures. That's the only way I know of to put those strings of nines to good use. Like Frank said in another post, in some cases, you can quickly reach the known age of the universe. At that point, surely, you've done a good job.

(Of course, these numbers are only as good as the guy who worked them out.)

Re: software and hardware stability
krisi (CEO)   11/4/2013 4:43:40 PM
Thank you Bert, makes sense... but how do you calculate mean time between failures on a complex software-hardware system? I think these calculations refer to component wearout and reliability; this is fairly standard in the component electronics industry... but they don't really take into account complex interactions between software and hardware, unexpected behaviour under signal interference, noise, etc... Kris

Re: software and hardware stability
Bert22306 (CEO)   11/4/2013 4:51:30 PM
Heh. That's why I think the numbers are only as good as the guy who worked them out. Yes, for sure, you have to consider the availabilities of the different subsystems in the calculation, and you also have to consider those functions that are safety critical, as opposed to the functions that are not.

This is a whole science unto itself, as you might imagine. Books have been written on this subject.

http://www.eventhelix.com/realtimemantra/faulthandling/system_reliability_availability.htm
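For a flavor of the arithmetic involved, here is the textbook steady-state availability calculation with made-up numbers (not figures from the Toyota case): A = MTBF / (MTBF + MTTR) for one unit, units in series multiply, and a redundant pair fails only when both halves are down.

#include <stdio.h>

/* Steady-state availability of a single unit. */
static double avail(double mtbf_hours, double mttr_hours)
{
    return mtbf_hours / (mtbf_hours + mttr_hours);
}

int main(void)
{
    double a = avail(50000.0, 8.0);                    /* one controller      */
    double series   = a * a;                           /* both units required */
    double parallel = 1.0 - (1.0 - a) * (1.0 - a);     /* redundant pair      */

    printf("single unit   : %.6f\n", a);
    printf("two in series : %.6f\n", series);
    printf("redundant pair: %.9f\n", parallel);
    return 0;
}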

Re: software and hardware stability
Wobbly (CEO)   11/5/2013 7:46:09 AM
You immediately addressed the topic of threads in safety-critical products, but you did not address the two points that I raised, and those were ECC on memory and hardware memory-region protection on client devices.

In thirty years of delivering core network equipment in telecom, including sixteen years within a Network Systems associated division of Bell Labs, I have had to deal with high reliability requirements. Not safety-critical systems such as aerospace or medical, but still equipment that was intended to run unattended in locked vaults buried underground in very remote locations, and perform its own diagnostics and fault reporting and mitigation. So I am not completely out of touch on those issues.

Even in single-threaded systems with well defined task definitions, you can gain stability and safety through having well defined hardware access control limiting tasks to only those devices and memory regions that are associated with that particular task.

Having shipped systems that were expected to operate non-stop with five-nines of uptime in deployment, I have had the opportunity to observe things, such as the fact that any significant amount of RAM is going to show correctable single-bit errors through a year of continuous operation, so bit flips do happen. Around mid 2005, we had approximately six hundred router blades in a single distributed network, each blade had 1GB of DDR2 RAM, and over a year of collecting fault data, each blade experienced about six or seven correctable ECC events. We were able to swap out two or three blades before they failed by having ECC event thresholds that flagged the cards for replacement.

Now admittedly, these were blades with 1GB of memory, running 24x7. But if you count the total installed RAM in all the cars on the highway, times the total run hours, there have to be distributed single-bit error events occurring.

So do they use ECC? Or do they not?

Re: software and hardware stability
junko.yoshida (Blogger)   11/5/2013 8:31:43 AM
@Wobbly, according to the expert witness, "Toyota claimed the 2005 Camry's main CPU had error detecting and correcting (EDAC) RAM. It didn't." As you accurately pointed out, the expert witness also agrees that EDAC, or at least parity RAM, is relatively easy and low-cost insurance for safety-critical systems.

Re: software and hardware stability
Wobbly (CEO)   11/5/2013 9:06:01 AM
@Junko, Thank you for that response. I am surprised that they did not employ ECC given the spectacularly noisy electrical environment that is present in a typical automobile. Ignition noise itself has always been a problem in cars, but even current diesel engines, with their Direct Injection systems, are electrically noisy beasts.

Is anyone using hardware-controlled access to device space or memory? This is fairly common in cellular handsets, both for security and runtime stability. It also makes errors readily observable, since out-of-bounds accesses drive immediate hardware faults instead of leaky data errors that may or may not be observed in testing.

Re: software and hardware stability
junko.yoshida (Blogger)   11/5/2013 10:27:04 AM
@Wobbly, exactly.



I think a lot of people are surprised, too. Although the lack of EDAC in Toyota's memory devices used at that time is not the ONLY reason that led to the bit flip, it is one important factor. 

Use software assertions and leave them in the product!
msamek275 (Rookie)   11/6/2013 12:24:14 PM
I am really surprised that nobody so far mentioned the use of simple software assertions.

Most people point out that ECC or MPU were not used. But these layers of protection are really nothing else than hardware-assisted assertions. I mean, what do you do when your ECC detects a parity error or your MPU detects an unauthorized memory access? Well, you execute an exception handler, which puts your system in a fail-safe state (typically a reset).

This is exactly what simple software assertions do too, except that software assertions can easily catch subtle logic errors that no hardware can detect.

So here comes my main point. Too often I see software assertions **disabled** in the production code. Interestingly, this is done by the same people who advocate the use of ECCs or MPUs. Isn't this a bit inconsistent? How many readers of this article ship products with assertions enabled?
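For concreteness, here is a minimal sketch of an assertion that stays enabled in the shipped build (the macro name and the fail-safe hook are invented): instead of being compiled out with NDEBUG, a failed check jumps straight to the system's fail-safe path.

#include <stdio.h>
#include <stdlib.h>

/* Fail-safe hook: in a vehicle ECU this would log the fault and force a
 * watchdog reset; here it simply aborts. */
static void failsafe(const char *file, int line)
{
    fprintf(stderr, "assertion failed at %s:%d -- resetting\n", file, line);
    abort();
}

/* Stays active in production builds, unlike <assert.h> under NDEBUG. */
#define REQUIRE(cond) \
    do { if (!(cond)) failsafe(__FILE__, __LINE__); } while (0)

static int table[8];

static int table_get(int idx)
{
    REQUIRE(idx >= 0 && idx < 8);   /* catches an out-of-bounds index */
    return table[idx];
}

int main(void)
{
    return table_get(3);            /* in-range access passes the check */
}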

Re: Use software assertions and leave them in the product!
Wobbly (CEO)   11/6/2013 1:31:55 PM
Well, there are two ECC possibilities.

1) Correctable error, which is completely allowable. That is why you use ECC, though ECC events should be tracked and thresholded. On a car, for example, ECC events that cross a threshold could trip the ECU lamp for a service error. Note, the threshold would not be a total count, but a count per unit runtime. You need to filter them over time (a small sketch of this follows at the end of this post).

2) Uncorrectable errors. On typical ECC controllers, this throws a hardware exception.

The ECC implementations that I have dealt with actually drove a bus fault on the read cycle; they could not be bypassed once they were enabled.

On client-side MPUs, those also throw hardware exceptions. That is why I specifically asked about client-side MPUs, as opposed to traditional MMU protection at the host side.

Assertions in code are one thing, but MPUs that actually throw back a physical bus fault into the core, that is another thing altogether.

As far as software assertions, we never turn them off. They are in the production code.
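A small sketch of the per-unit-runtime thresholding idea from point 1 above (the limit, names, and time base are all invented): corrected events are counted per hour of runtime, and only a sustained rate, not a lifetime total, trips the service indicator.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Invented limit: more than this many corrected ECC events within one hour
 * of runtime flags the module for service. */
#define ECC_EVENTS_PER_HOUR_LIMIT 3u

static uint32_t window_events;
static uint32_t window_start_min;
static bool     service_lamp;

/* Called from the ECC correctable-error interrupt or polling loop. */
void ecc_corrected_event(uint32_t runtime_minutes)
{
    if (runtime_minutes - window_start_min >= 60u) {
        window_start_min = runtime_minutes;   /* new one-hour window */
        window_events = 0;
    }
    if (++window_events > ECC_EVENTS_PER_HOUR_LIMIT)
        service_lamp = true;                  /* rate, not total count */
}

int main(void)
{
    for (uint32_t i = 0; i < 4; ++i)          /* four events in one hour */
        ecc_corrected_event(10u + i);
    printf("service lamp: %s\n", service_lamp ? "on" : "off");
    return 0;
}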

Re: Use software assertions and leave them in the product!
msamek275 (Rookie)   11/6/2013 5:26:52 PM
@Wobbly: I still fail to see why an MPU-detected failure is "another thing altogether" than a failing software assertion. For example, an assertion might check for an array index out of bounds. Why is such a failure so fundamentally different than an attempt to de-reference a NULL pointer, which might trip the MPU?

Re: Use software assertions and leave them in the product!
Wobbly (CEO)   11/7/2013 7:57:40 AM
Client-side MPUs actually prevent resource access, read, write, or both, at chip-select, address, or even register-level granularity. The access permission is granted based on VMID characteristics that are driven as part of the bus cycle. The VMID characteristics are steered at the bus master by various attributes of the access, including (possibly) the task ID running on the core.

If a carved-out RAM region, or a set of device registers, is reserved for a particular VMID that is associated with a particular TASK, then other tasks are prevented from accessing those resources even if the processor would otherwise be taking a legitimate action.

It protects against software defects, and it protects against directed attacks on the system.

It is particularly useful in multicore systems with shared resources.

This is very different from the behavior of the CPU-tied MMU.

In one of our current SOCs, which contains eight 32-bit CPUs and eight 32-bit DSPs, there are roughly sixty client-port MPUs to provide protection domains for the individual device and memory space that is shared between all sixteen cores.

So even if your assert becomes corrupted because of a bit flip or some other data failure that occurs outside the domain of the assert, the end point will block the access.

Each of these capabilities form a layered protection scheme. MPUs alone are not sufficient, MMUs alone are not sufficient, ASSERTS alone, are not sufficient, ECC alone is not sufficient. Together they provide a layered protection that provide defense in depth.

Re: Use software assertions and leave them in the product!
msamek275 (Rookie)   11/7/2013 12:04:19 PM
Naturally, software assertions use a different mechanism than MPUs, ECCs, WDTs, and other such hardware. But, still I think it is very beneficial to view all these mechanisms as complementary aspects of the **same** basic method.

This basic method is to intentionally introduce redundancy checks (either software-based or hardware-based) to ensure that the system operates as intended.

The problem with viewing software assertions as "another thing altogether" than MPUs, ECCs, WDTs, etc. is that redundancy checks that are very easy to perform in software, but difficult in hardware, are not being done.

Too often this mindset leads to gaping security holes and sub-optimal designs. I believe that it is exactly what could have saved the day in the Toyota UA case. Please note that even if ECC was used, it would not detect memory corruption due to the alleged stack overflow or an array index out of bounds. Simple software assertions, on the other hand, would have easily detected such things.

So I repeat the main point of my original post. Software assertions are no less important than MPUs, ECCs, WDTs, etc. Unfortunately, they are routinely under-utilized or disabled in the production code. I just hope that we can use the Toyota case to change this perception.

brakes?
selinz (CEO)   11/6/2013 1:28:50 PM
So I'm still not clear whether this task x would disable the brakes. Is that what they are saying?

multiple stability/security checks
Wobbly (CEO)   11/7/2013 1:54:37 PM
If you go back to my original post, we always use asserts on critical data on function entry and always use asserts on returned data, and those asserts stay in the delivered code.

Asserts are fine within a single task flow, but they do not protect adjacent tasks that can be corrupted by bad behavior between asserts. Hardware protection protects against cross infection, and ECC would have helped avoid the root cause (if the root cause was a bit flip).

It comes down to having layered defenses, both for stability, but also for intrusion and modification protection.

We haven't even discussed hardware assisted stack canaries or pseudo random cache line replacement.

This is unbelievable
Simon7382 (Freelancer)   11/11/2013 6:05:39 AM
Running the brake override routine on the same main processor, as part of the "kitchen sink" firmware, is either incredibly irresponsible or shows total ignorance regarding the basics of real-time software. Not even a rookie SW engineer would do this in the US. And this is the firmware of the best-selling car in the US, probably one of the best-selling cars in the world. It will be many, many years before I would consider buying a Toyota, even though I had two of them in the past 30 years and was reasonably satisfied with both.

Redundancy and fault-proof design
asta4vista (Rookie)   11/16/2013 12:29:51 PM
It seems Toyota engineers are not aware of fault-proof design basics. Well developed in the '60s and '70s, redundancy and fault-proof reliability is standard in high-fault-cost areas like avionics or nuclear station control, but is almost forgotten in gadget-oriented mainstream electronics. Some comments below illustrate it even more: with the general principles forgotten, companies and engineers create home-brew, "common sense"-based recipes.
