Design Article
Fault-robust microcontrollers allow automotive technology convergence: Part 1, the nature of faults
Riccardo Mariani, YOGITECH SpA
9/11/2006 2:00 AM EDT
On the other hand, as a consequence of this increased complexity, the population of faults is growing as well. These include:
In particular, hardware faults (systematic or random) are worsened by increasing soft-error failure rates (e.g., caused by cosmic rays), by coupling effects and disturbances that are becoming more and more important, and by the intrinsic uncertainty, due to model inaccuracy, that affects new technologies.
Moreover, system complexity and the use of third-party IP widen verification gaps and increase software faults. If we define "robustness" as the ability to continue the mission reliably despite the existence of systematic, random, or malicious faults, how do we design fault-robust MCUs?
Measuring robustness
The first step is to agree on a common way to measure robustness. IEC 61508 is the international standard for the functional safety of electrical/electronic/programmable electronic safety-related systems, and it introduces a quasi-deterministic approach to evaluating robustness. IEC 61508 classifies systems by Safety Integrity Level (SIL), which is mainly determined by the safe failure fraction (SFF): the ratio of the rate of safe failures plus detected dangerous failures to the total failure rate (safe failures plus all dangerous failures, detected and undetected).
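Written with the failure rates IEC 61508 uses (λ_S for safe failures, λ_DD for dangerous detected failures, λ_DU for dangerous undetected failures), the definition becomes SFF = (λ_S + λ_DD) / (λ_S + λ_DD + λ_DU). The Python sketch below simply evaluates that ratio; the function name and the example failure rates are hypothetical and chosen only for illustration.

```python
def safe_failure_fraction(lambda_s: float, lambda_dd: float, lambda_du: float) -> float:
    """SFF = (lambda_S + lambda_DD) / (lambda_S + lambda_DD + lambda_DU).

    All rates must use the same unit, e.g. FIT (failures per 1e9 device-hours).
    """
    total = lambda_s + lambda_dd + lambda_du
    if total <= 0:
        raise ValueError("total failure rate must be positive")
    return (lambda_s + lambda_dd) / total


# Hypothetical failure rates in FIT, for illustration only.
sff = safe_failure_fraction(lambda_s=500.0, lambda_dd=450.0, lambda_du=50.0)
print(f"SFF = {sff:.1%}")  # SFF = 95.0%
```

Only the undetected dangerous failures lower the SFF, which is why diagnostics that move failures from λ_DU to λ_DD raise it directly.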
For automotive systems with a hardware fault tolerance of zero (a hardware fault tolerance of N means that N+1 faults could cause a loss of the safety function), SIL2 is the minimum level, and it is achieved if the SFF is at least 90%. An example of a system requiring SIL2 is the anti-lock braking system (ABS). SIL3 (an SFF greater than or equal to 99%) is required for active safety systems such as X-by-wire, active braking, and stability control.
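The thresholds above can be expressed as a small lookup. This sketch assumes the IEC 61508-2 architectural-constraints table for type B subsystems (complex components such as microcontrollers) with a hardware fault tolerance of zero, where the SFF bands start at 60%, 90% and 99% and each lower bound is inclusive; the function name is hypothetical, and the sketch ignores the other requirements (such as failure probability targets) that a real SIL claim also has to meet.

```python
def max_sil_hft0_type_b(sff: float) -> int:
    """Highest SIL permitted by the SFF alone for a type B subsystem with
    hardware fault tolerance 0; returns 0 when no SIL is allowed (SFF < 60%)."""
    if not 0.0 <= sff <= 1.0:
        raise ValueError("sff must be a fraction between 0 and 1")
    # (lower SFF bound, SIL) pairs, checked from the most demanding band down.
    for bound, sil in ((0.99, 3), (0.90, 2), (0.60, 1)):
        if sff >= bound:
            return sil
    return 0


for sff in (0.85, 0.95, 0.995):
    print(f"SFF {sff:.1%} -> SIL{max_sil_hft0_type_b(sff)}")
```

With these bands, an ABS-class SIL2 target leaves at most 10% of the failure rate as dangerous undetected, while SIL3 leaves at most 1%.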
The annexes of IEC 61508-2 provide guidelines on the faults and failure modes to be considered for each component and list the recommended diagnostic techniques, graded by their effectiveness with respect to the target SIL. An important role is played by the "beta factor", i.e. the fraction of failures that have a common cause. It can become a limiting factor, especially when multiple functionally identical channels are implemented on the same silicon.
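Why the beta factor can be limiting is easy to see with a deliberately simplified model (an assumption for illustration, not the calculation procedure of the standard): two identical channels, each with dangerous failure rate λ and no repair, where a fraction β of failures is common cause and takes out both channels at once, while the remaining (1 - β)λ strikes each channel independently. The function name and numbers are hypothetical.

```python
import math


def prob_both_channels_fail(lam_per_hour: float, beta: float, hours: float) -> float:
    """Probability that a duplicated (1oo2) function loses both channels within
    `hours` under a simple beta-factor model with no repair."""
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must be in [0, 1]")
    # Independent part: each channel fails on its own with rate (1 - beta) * lambda.
    p_single_indep = 1.0 - math.exp(-(1.0 - beta) * lam_per_hour * hours)
    p_both_indep = p_single_indep ** 2
    # Common-cause part: one event with rate beta * lambda hits both channels.
    p_ccf = 1.0 - math.exp(-beta * lam_per_hour * hours)
    # The function is lost if a common-cause event OR two independent failures occur.
    return 1.0 - (1.0 - p_ccf) * (1.0 - p_both_indep)


lam = 1e-6       # hypothetical dangerous failure rate per channel, per hour
mission = 1e4    # hypothetical mission time in hours
for beta in (0.0, 0.01, 0.1):
    print(beta, prob_both_channels_fail(lam, beta, mission))
```

With these values (λT = 0.01), the independent double-failure term is on the order of 1e-4, so β = 0.1 adds a common-cause term about an order of magnitude larger: duplicating channels on one die pays off only if β is kept small.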
It's better to look inside
One of the most crucial steps in enabling the extensive use of MCUs is to extend IEC 61508 to System-on-Chip (SoC) design. This means applying system-level methods such as Failure Mode and Effects Analysis (FMEA) at the SoC level as well, and using the information this analysis provides to take a structured approach to increasing the robustness of the SoC.
To meet this demand, YOGITECH has developed faultRobust, a patented platform-based technology (see the microcontroller representation below) consisting of:
Each faultRobust IP (fRIP) can be used stand-alone or combined with other fRIPs.

The design and validation methodology is one of the key points of the technique. A tool suite (fRFMEA) extracts information from the Safety Requirements Specification (SRS, a document required by IEC 61508 to define the safety goals) and from the design database (RTL, gate-level, and back-end netlists) using scripts based on commercially available EDA tools. These data are entered into a highly detailed FMEA worksheet. Fault models and failure modes are then considered in accordance with the IEC 61508 guidelines. Finally, the safe failure fraction and diagnostic coverage are computed automatically by statistical formulas embedded in the FMEA worksheet.
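The final aggregation step of such a worksheet can be sketched as follows. The row layout, block names and rates are hypothetical, and the actual fRFMEA worksheet is certainly more detailed; the sketch only shows how per-failure-mode rates, safe fractions and diagnostic coverages roll up into λ_S, λ_DD, λ_DU, the SFF and the overall diagnostic coverage (DC = λ_DD / (λ_DD + λ_DU)).

```python
from dataclasses import dataclass


@dataclass
class FmeaRow:
    """One line of a simplified FMEA worksheet (hypothetical layout)."""
    block: str            # SoC block, e.g. "RAM"
    failure_mode: str     # e.g. "stuck-at", "soft error"
    rate_fit: float       # failure rate of this mode, in FIT
    safe_fraction: float  # share of this mode's failures judged safe
    diag_coverage: float  # share of its dangerous failures detected by diagnostics


def summarize(rows: list[FmeaRow]) -> dict[str, float]:
    """Roll the worksheet up into failure-rate totals, SFF and DC."""
    lam_s = lam_dd = lam_du = 0.0
    for r in rows:
        safe = r.rate_fit * r.safe_fraction
        dangerous = r.rate_fit - safe
        lam_s += safe
        lam_dd += dangerous * r.diag_coverage
        lam_du += dangerous * (1.0 - r.diag_coverage)
    lam_d = lam_dd + lam_du
    return {
        "lambda_S": lam_s,
        "lambda_DD": lam_dd,
        "lambda_DU": lam_du,
        "SFF": (lam_s + lam_dd) / (lam_s + lam_d),
        "DC": lam_dd / lam_d if lam_d else 1.0,
    }


# Hypothetical worksheet rows, for illustration only.
rows = [
    FmeaRow("CPU register file", "soft error", 200.0, 0.5, 0.99),
    FmeaRow("RAM", "soft error", 600.0, 0.4, 0.99),
    FmeaRow("Bus interconnect", "stuck-at", 200.0, 0.5, 0.90),
]
s = summarize(rows)
print(f"SFF = {s['SFF']:.2%}, DC = {s['DC']:.2%}")  # SFF = 98.54%, DC = 97.39%
```

Keeping the rows per block and per failure mode makes it obvious which entry dominates λ_DU (here the bus, with the weakest coverage) and therefore where an extra diagnostic would pay off most.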
IEC 61508 highly recommends the intensive use of fault modeling and fault injection throughout the design, verification, and validation flow. In line with that intent, the fRFI tool suite performs design-level fault injection at both the lowest (transistor, gate) and the highest (block) levels. The suite combines proprietary tools with Cadence's Specman.
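To make gate-level fault injection concrete, here is a minimal Python sketch; it is not fRFI and uses no EDA tool. A hypothetical four-gate netlist is simulated with and without a permanent stuck-at fault on each gate output, and each outcome is classified against a hypothetical parity checker, assumed to compare the output parity with a value predicted by an independent, fault-free circuit. The names NETLIST, simulate and run_campaign are illustrative only.

```python
from itertools import product

# Hypothetical gate-level netlist in topological order: (output net, gate, input nets).
NETLIST = [
    ("n_and", "AND", ("a", "b")),
    ("y0", "XOR", ("a", "b")),
    ("y1", "OR", ("n_and", "c")),
    ("y2", "BUF", ("n_and",)),
]
INPUTS = ("a", "b", "c")
OUTPUTS = ("y0", "y1", "y2")
GATES = {
    "AND": lambda x, y: x & y,
    "OR": lambda x, y: x | y,
    "XOR": lambda x, y: x ^ y,
    "BUF": lambda x: x,
}


def simulate(vector, fault=None):
    """Evaluate the netlist; `fault` = (net, value) forces a gate output (stuck-at)."""
    nets = dict(zip(INPUTS, vector))
    for out, gate, ins in NETLIST:
        nets[out] = GATES[gate](*(nets[i] for i in ins))
        if fault is not None and fault[0] == out:
            nets[out] = fault[1]
    return tuple(nets[o] for o in OUTPUTS)


def parity(bits):
    """XOR of all bits."""
    result = 0
    for bit in bits:
        result ^= bit
    return result


def run_campaign():
    """Inject every single stuck-at fault on every gate output under every input vector."""
    counts = {"no_effect": 0, "detected": 0, "undetected": 0}
    fault_sites = [out for out, _, _ in NETLIST]
    for net, value in product(fault_sites, (0, 1)):
        for vector in product((0, 1), repeat=len(INPUTS)):
            golden = simulate(vector)
            faulty = simulate(vector, (net, value))
            if faulty == golden:
                counts["no_effect"] += 1
            # Predicted parity comes from the fault-free (golden) run.
            elif parity(faulty) != parity(golden):
                counts["detected"] += 1
            else:
                counts["undetected"] += 1
    return counts


counts = run_campaign()
dc = counts["detected"] / (counts["detected"] + counts["undetected"])
print(counts, f"DC = {dc:.1%}")  # {'no_effect': 32, 'detected': 28, 'undetected': 4} DC = 87.5%
```

The undetected cases all come from the fan-out net n_and: when it corrupts both y1 and y2, the two flips cancel in the parity. This is the kind of coverage gap that exhaustive injection exposes and that the FMEA figures must account for.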
The tools are automatically linked to fRFMEA and use an operational-profile-based algorithm that speeds up the fault injection campaign. The fRFI suite is not tied to specific fault models: it can handle transient faults, permanent faults, combinations of the two, and custom fault models (modeled using the IEEE 1647 "e" language).
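The article does not describe the fRFI algorithm, so the following Python sketch is only one plausible reading, labeled as an assumption: injection cycles are sampled according to an operational profile (how often the system is in each operating mode), and faults are applied to a register trace as a transient bit flip, a permanent stuck-at, or a combination of both. The Fault class, the helper functions, the mode names and the profile weights are all hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Fault:
    """Single-bit fault on a register trace (hypothetical, simplified fault models)."""
    bit: int
    cycle: int
    kind: str             # "transient" (one-cycle bit flip) or "permanent" (stuck-at)
    stuck_value: int = 0  # only used by "permanent"


def apply_faults(trace, faults):
    """Return a faulty copy of `trace`; passing several faults models a combination."""
    out = list(trace)
    for f in faults:
        mask = 1 << f.bit
        if f.kind == "transient":
            out[f.cycle] ^= mask  # bit flip in a single cycle only
        elif f.kind == "permanent":
            for t in range(f.cycle, len(out)):  # stuck from f.cycle onwards
                out[t] = (out[t] | mask) if f.stuck_value else (out[t] & ~mask)
        else:
            raise ValueError(f"unknown fault kind: {f.kind}")
    return out


def sample_injection_cycles(modes, profile, k, rng):
    """Pick k injection cycles: choose an operating mode with the probabilities of
    the operational profile, then a cycle uniformly among that mode's cycles."""
    cycles_by_mode = {}
    for cycle, mode in enumerate(modes):
        cycles_by_mode.setdefault(mode, []).append(cycle)
    names = [m for m in profile if m in cycles_by_mode]
    chosen = rng.choices(names, weights=[profile[m] for m in names], k=k)
    return [rng.choice(cycles_by_mode[m]) for m in chosen]


# Hypothetical workload: per-cycle mode labels, profile weights and register values.
modes = ["idle"] * 6 + ["braking"] * 4
profile = {"idle": 0.2, "braking": 0.8}
trace = [0b1010] * len(modes)
rng = random.Random(1)
for c in sample_injection_cycles(modes, profile, k=3, rng=rng):
    combined = [Fault(bit=0, cycle=c, kind="transient"),
                Fault(bit=3, cycle=c, kind="permanent", stuck_value=0)]
    print(c, apply_faults(trace, combined))
```

Weighting the injection points by the operational profile concentrates the campaign on the states the system actually spends its time in, which is one way such an approach can reach a given confidence with fewer injections than uniform sampling.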