Comments
<<   <   Page 3 / 7   >   >>
jhoopy56
User Rank
Rookie
Re: Three-part series based on trial transcript
jhoopy56   11/4/2013 6:01:45 PM
NO RATINGS
This is, frankly, nonsense.  As was demonstrated in the Audi debacle years ago, an automobile will come to a stop with the accelerator floored (throttle fully open) in approximately 20% more distance than normal if the brakes are fully applied (ABS invoked).  Audi's president at the time soberly demonstrated this by planting both feet while observers recorded the brake performance.  This is because brake torque >> engine torque.  It is a classic mechanical safety override (SW be damned).  The only ways around this are either a) simultaneous failure of two major, separately enabled auto control systems (one electronic, the other largely hydraulic/mechanical), or b) a system that can electronically disable the brakes (pathological reverse-ABS??).  The former is ridiculously unlikely, and the latter has not been demonstrated to be the case.

It is telling that, in the testimony, the vehicle was brought to a virtual stop on the dyno through brake actuation even with the simulated loss of task X.
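
As a rough check on what the "approximately 20% longer" figure implies, here is a back-of-the-envelope sketch. It assumes constant deceleration (so stopping distance is v^2 / (2a)) and that the engine's drive force simply subtracts from the brake force; the function name is made up for illustration, and drag, brake fade, and vacuum-assist loss are ignored.

```python
def implied_deceleration_ratio(distance_ratio):
    """Constant deceleration: d = v**2 / (2 * a), so a scales as 1 / d."""
    return 1.0 / distance_ratio

# Stopping distance 20% longer than normal (the figure quoted above).
decel_ratio = implied_deceleration_ratio(1.20)

# If net force = brake force - engine drive force, the engine is
# cancelling this fraction of the available brake force.
engine_share = 1.0 - decel_ratio

print(f"net deceleration is {decel_ratio:.1%} of normal")
print(f"engine offsets roughly {engine_share:.1%} of brake force")
```

Under those assumptions the engine cancels only about a sixth of the braking force, which is consistent with the "brake torque >> engine torque" argument.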

Bert22306
User Rank
CEO
Re: software and hardware stability
Bert22306   11/4/2013 4:51:30 PM
NO RATINGS
heh. That's why I think the numbers are only as good as the guy who worked them out. Yes, for sure, you have to consider the availabilities of the different subsystems in the calculation, and you also have to consider those functions that are safety critical, as opposed to the functions that are not.

This is a whole science unto itself, as you might imagine. Books have been written on this subject.

http://www.eventhelix.com/realtimemantra/faulthandling/system_reliability_availability.htm
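
To make the subsystem point concrete, here is a minimal sketch of the standard series/parallel availability formulas. The subsystem names and availability figures are invented for illustration, and the formulas assume independent failures, which is exactly the assumption that tends to break down in practice.

```python
from math import prod

def series(availabilities):
    """All parts must work: multiply the availabilities."""
    return prod(availabilities)

def parallel(availabilities):
    """Any one part suffices: 1 minus the chance that all fail."""
    return 1.0 - prod(1.0 - a for a in availabilities)

# Hypothetical subsystem availabilities (not from any real vehicle).
sensor, controller, actuator = 0.999, 0.9999, 0.999

single_chain = series([sensor, controller, actuator])
redundant_sensor = series([parallel([sensor, sensor]), controller, actuator])

print(f"single chain:     {single_chain:.6f}")
print(f"redundant sensor: {redundant_sensor:.6f}")
```

A chain is always worse than its weakest link, and redundancy only helps the element that is actually duplicated.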

krisi
User Rank
CEO
Re: software and hardware stability
krisi   11/4/2013 4:43:40 PM
NO RATINGS
Thank you Bert, makes sense...but how do you calculate mean time between failures on a complex software-hardware system? I think these calculations refer to component wearout and reliability, which is fairly standard in the electronic components industry...but they don't really take into account complex interactions between software and hardware, unexpected behaviour under signal interference, noise, etc...Kris

Bert22306
User Rank
CEO
Re: software and hardware stability
Bert22306   11/4/2013 4:34:38 PM
NO RATINGS
"is 99.99% certainty sufficient, or do you need 99.999% or better? how do you determine that point?"

Mean time between failures. That's the only way I know of to put those strings of nines to good use. Like Frank said in another post, in some cases, you can quickly reach the known age of the universe. At that point, surely, you've done a good job.

(Of course, these numbers are only as good as the guy who worked them out.)
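
For a feel of how quickly the numbers run away, here is a small sketch that converts a failure rate into an MTBF in years. It assumes a constant failure rate (MTBF = 1 / rate); the rates are arbitrary examples and the age of the universe is an approximate figure.

```python
HOURS_PER_YEAR = 8766             # 365.25 days
AGE_OF_UNIVERSE_YEARS = 1.38e10   # approximate

def mtbf_years(failures_per_hour):
    """Constant-failure-rate model: MTBF is the reciprocal of the rate."""
    return 1.0 / failures_per_hour / HOURS_PER_YEAR

# Example rates only; none of these come from a real system.
for rate in (1e-5, 1e-9, 1e-15):
    years = mtbf_years(rate)
    note = " (longer than the universe has existed)" if years > AGE_OF_UNIVERSE_YEARS else ""
    print(f"{rate:.0e} failures/hour -> MTBF of {years:.3g} years{note}")
```

The arithmetic is trivial; the hard part is justifying the failure rate that goes in.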

MS243
User Rank
Manager
Re: software and hardware stability
MS243   11/4/2013 1:12:25 PM
NO RATINGS
Another very real issue that can haunt projects is the use of PCB vias that are too fine for a given environment. This can lead to via breakage due to shock, vibration, and temperature. (I personally went through a plant closing and a 1200-mile move due to vias breaking on another project's PCBs.)  Even if only redundant ground vias break, the ground bounce can grow, and when combined with humidity the results can be even more significant.  Ground bounce can cause logic corruption in MCUs, DSPs, CPUs and FPGAs.   There has to be enough built-in self-test of the hardware via software, plus safeguards, to detect this issue.

krisi
User Rank
CEO
Re: software and hardware stability
krisi   11/4/2013 12:14:55 PM
NO RATINGS
Fascinating case and interesting lessons in product development and potential liability...it brings to mind the question of how much design, verification and validation effort is required and sufficient? ...seems Toyota didn't do enough system testing...but when do you stop? is 99.99% certainty sufficient, or do you need 99.999% or better? how do you determine that point? Kris

JeffL_2
User Rank
CEO
Re: software and hardware stability
JeffL_2   11/4/2013 12:02:56 PM
NO RATINGS
You mention "multithread" in the context of safety-critical code. That's kind of a stretch given that there are only a small number of languages for which it is even possible to write an "informal" tool to determine whether a particular thread or build is threadsafe, let alone one that can demonstrate this in a "formal" manner (show as a matter of mathematical proof that it WILL NOT miss any thread problems) so that a safety agency could allow its use. And those languages themselves generally either aren't suitable for safety-critical applications or very few people write in them in the first place. The truly safety-critical sections are required to run in a totally deterministic manner; therefore even object-oriented languages generally aren't currently tolerated for Level A of DO-178C (the known exception being Ada, and I haven't participated in one of those projects yet, so I'm not sure exactly what you are and aren't allowed to do). Some of the IEC safety coding standards are so stringent that even the "routine" use of interrupt service routines is prohibited; try doing precise timing or comms without that! So there's not only a heck of a lot of work that needs to be done on the fundamentals, there are also too many people without sufficient knowledge of how restrictive the current rules are or how VERY far we need to go before some of their "assumptions" come even CLOSE to reality. I believe it would be a "good first step" if the heads of the various groups who write these safety specifications could get together and publish some references on how all these languages, tools and requirements mesh; that would send the message to the academic world about which areas of research need to be highlighted.
Please note I don't want to "cast aspersions" on those who get it wrong or simply aren't aware what they are saying, it's hard enough for those of us who spend a good portion of our lives trying to keep current at this, and there's also quite a few "commercial claims" I see being made that need to be taken with a grain of salt because particular products or tools might theoretically have a certain advantage but they still haven't been approved for use because their claims have yet to be proven.

Wobbly
User Rank
CEO
software and hardware stability
Wobbly   11/4/2013 8:58:24 AM
NO RATINGS
Memory with ECC correction at the controller can mitigate electrically noisy environments.

ARM's AXI busses support client xPUs (APU/MPU/RPU) to provide task-level access control to address space based on virtual machine IDs, even in multi-core SOCs.

Properly configured, even threaded task OSes without full MMU support can have some level of memory protection between threads, and in multi-core solutions, individual cores can be corralled into private sandboxes.

These two techniques have been around for years; they are not new.
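
To illustrate the ECC idea in miniature, here is a toy Hamming(7,4) sketch that corrects any single flipped bit in a 7-bit codeword. Real ECC memory controllers use much wider SECDED codes implemented in hardware; this is only meant to show the principle, and the function names are invented for the example.

```python
def encode(data):
    """Encode 4 data bits as a 7-bit codeword (parity at positions 1, 2, 4)."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4   # covers positions 3, 5, 7
    p2 = d1 ^ d3 ^ d4   # covers positions 3, 6, 7
    p4 = d2 ^ d3 ^ d4   # covers positions 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]

def decode(codeword):
    """Return (data bits, error position); position 0 means no error seen."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s4   # 1-based position of the bad bit
    if syndrome:
        c[syndrome - 1] ^= 1          # flip it back
    return [c[2], c[4], c[5], c[6]], syndrome

word = encode([1, 0, 1, 1])
word[4] ^= 1                          # simulate a single bit flip
recovered, position = decode(word)
print(recovered, "error corrected at position", position)
```

This toy code cannot tell a double-bit error from a single one; practical SECDED schemes add an overall parity bit for that purpose.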

MS243
User Rank
Manager
Re: Simulating EDAC failure?
MS243   11/3/2013 9:44:13 PM
NO RATINGS
The FAA papers show some RTOSes that do some SW protection of tasks; for others it is done as part of the certification effort by the more reputable airframe and equipment manufacturers.

 

For example, Xilinx has some good whitepapers on SEU that detail some of the techniques for its ARM-based processors.

 

Space Micro (www.spacemicro.com) offers IP/code for hardening un-hardened OSes and using non-EDAC CPUs to self-check, and check vs a redundant channel.   These have been flown on space missions where bit-flips can happen quite often even on a small MCU.

 

I have myself written guidelines for hardening software and firmware in MCU's and FPGA's for companies -- see my profile for contact information.

SSDWEM
User Rank
Rookie
Re: Simulating EDAC failure?
SSDWEM   11/2/2013 8:26:17 PM
NO RATINGS
Thanks for the thoughtful & detailed reply.

"Most of the major OS's such as VxWORKS, PSOS, Green Hills, etc should support something like this or better (possibly with an option) "


This is really the crux of what I was asking about.  Trying to see if there are any RTOS vendors who advertise fault-tolerant countermeasures such as mirroring critical RTOS variables & data structures.  I haven't found one yet.  If I remember Michael Barr's testimony correctly, some of the scheduler's task lists or whatever were right next to the stack, and some of the important application variables weren't mirrored.


Will be interesting to see if this type of functionality starts showing up in some of the more heavyweight RTOSes.  IMO it would be a reaction to this fiasco right here.
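
For anyone unfamiliar with the mirroring idea, here is a minimal sketch of one common form: keep a bitwise-inverted copy next to each critical value and check the pair on every read. The class name and the 32-bit width are made up for illustration; a real RTOS would do this in C on its own data structures and would also place the two copies in separate memory regions.

```python
MASK32 = 0xFFFFFFFF

class MirroredU32:
    """A 32-bit value stored together with its bitwise complement."""

    def __init__(self, value):
        self.write(value)

    def write(self, value):
        self.primary = value & MASK32
        self.mirror = ~value & MASK32   # inverted copy

    def read(self):
        # An intact pair XORs to all ones; anything else means corruption.
        if (self.primary ^ self.mirror) != MASK32:
            raise RuntimeError("mirror mismatch: possible memory corruption")
        return self.primary

throttle_cmd = MirroredU32(1234)        # hypothetical critical variable
print(throttle_cmd.read())

throttle_cmd.primary ^= 1 << 7          # simulate a single bit flip
try:
    throttle_cmd.read()
except RuntimeError as err:
    print("detected:", err)
```

The check only detects corruption; what the system does next (reset, fall back to a safe default, hand over to a redundant channel) is a separate design decision.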




