Additional checks for activity and errors
Extra checks to detect the absence of input activity (failures)
On critical interfaces, such as certain communication or sensor interfaces, we need a mechanism to detect whether the interface has fallen idle. The simplest way is a state machine that detects the idle condition, that is, the absence of the sensor or interface inputs. The state machine should also be intelligent enough to detect and filter out glitches (noise) on the inputs. Once the system detects the idle condition or noise, it should send an interrupt to the core, which can then take the necessary corrective action.
A few ways of detecting the idle condition:
- Checking that the data does not change for a long time
- Checking the current flow at the inputs
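The first method above, together with glitch filtering, can be sketched in C as a small state machine that is evaluated once per sampling tick. This is a minimal sketch. It assumes a digital input that hardware or a timer routine samples periodically, and it does not model the analog current-flow method. The names `monitor_sample`, `FILTER_SAMPLES`, and `IDLE_TICKS`, along with their values, are hypothetical. The returned event stands in for the interrupt sent to the core.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical tuning values; real values depend on the interface. */
#define FILTER_SAMPLES 3u   /* consecutive samples needed to accept a change */
#define IDLE_TICKS     100u /* ticks without an accepted change => idle */

typedef enum { EVT_NONE, EVT_GLITCH, EVT_IDLE } activity_event_t;

typedef struct {
    bool     stable_level;  /* filtered (accepted) input level */
    uint32_t pending_cnt;   /* consecutive samples differing from stable_level */
    uint32_t idle_cnt;      /* ticks since the last accepted change (saturates) */
    bool     idle_reported; /* raise the idle event only once per idle period */
} activity_monitor_t;

void monitor_init(activity_monitor_t *m, bool initial_level)
{
    m->stable_level = initial_level;
    m->pending_cnt = 0;
    m->idle_cnt = 0;
    m->idle_reported = false;
}

/* Called once per sampling tick with the raw input level. The return value
 * stands in for the interrupt that would be sent to the core. */
activity_event_t monitor_sample(activity_monitor_t *m, bool raw)
{
    activity_event_t evt = EVT_NONE;

    if (raw != m->stable_level) {
        /* Candidate transition: accept it only after FILTER_SAMPLES samples. */
        if (++m->pending_cnt >= FILTER_SAMPLES) {
            m->stable_level = raw;
            m->pending_cnt = 0;
            m->idle_cnt = 0;          /* real activity: restart idle timing */
            m->idle_reported = false;
            return EVT_NONE;
        }
    } else if (m->pending_cnt > 0) {
        /* Input fell back before the change qualified: treat it as a glitch. */
        m->pending_cnt = 0;
        evt = EVT_GLITCH;
    }

    if (m->idle_cnt < IDLE_TICKS)
        m->idle_cnt++;
    if (evt == EVT_NONE && m->idle_cnt >= IDLE_TICKS && !m->idle_reported) {
        m->idle_reported = true;
        evt = EVT_IDLE;
    }
    return evt;
}
```

A change counts as activity only if it persists for `FILTER_SAMPLES` consecutive samples. A one-sample pulse is therefore reported as a glitch and does not reset the idle timer.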
In a similar fashion, we should check for overflow conditions, for example at the ADC input. The ADC input should always be range-checked, and if the ADC value is not within the recommended range, an interrupt to the core should be generated. The core can then take corrective action.
A voltage monitor can be used to detect non-operating voltage conditions. As soon as the power supply moves outside the specified range, the monitor can interrupt the CPU so that the device can be put into a safe mode.
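The ADC range check and the voltage monitor are both window comparisons, and the sketch below shows them side by side. It assumes readings are already available as integers: raw codes from a 12-bit ADC, and millivolts for the supply. The window values used in the example, the names `check_adc` and `check_supply`, and the action codes are all hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint16_t low;   /* lowest acceptable reading (inclusive) */
    uint16_t high;  /* highest acceptable reading (inclusive) */
} window_t;

typedef enum { ACT_NONE, ACT_RAISE_IRQ, ACT_ENTER_SAFE_MODE } monitor_action_t;

/* True when the reading lies inside the recommended window. */
bool in_window(uint16_t value, window_t w)
{
    return value >= w.low && value <= w.high;
}

/* ADC range check: a reading outside the window (including a saturated,
 * full-scale code) is reported to the core via an interrupt. */
monitor_action_t check_adc(uint16_t code, window_t w)
{
    return in_window(code, w) ? ACT_NONE : ACT_RAISE_IRQ;
}

/* Voltage monitor: a supply outside its specified range moves the
 * device into a safe mode. */
monitor_action_t check_supply(uint16_t millivolts, window_t spec)
{
    return in_window(millivolts, spec) ? ACT_NONE : ACT_ENTER_SAFE_MODE;
}
```

For example, with an assumed ADC window of {16, 4079}, a full-scale code of 4095 raises an interrupt. With an assumed supply window of {2970, 3630} mV, which is 3.3 V plus or minus 10%, a reading of 2900 mV selects safe mode.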
Built-in CRC checks in communication interfaces can help ensure the correct functioning of the interface.
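As an illustration of an interface-level CRC check, the sketch below computes a bitwise CRC-8 and has the receiver verify a frame whose last byte carries the sender's CRC. The parameter set is an assumption: polynomial 0x07, initial value 0x00, no reflection and no final XOR, known as CRC-8/SMBUS. Real automotive interfaces define their own polynomials and frame layouts. `frame_is_valid` is a hypothetical helper.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* CRC-8, polynomial x^8 + x^2 + x + 1 (0x07), init 0x00, MSB first,
 * no final XOR. Processes one bit per inner-loop iteration. */
uint8_t crc8(const uint8_t *data, size_t len)
{
    uint8_t crc = 0x00;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc & 0x80u) ? (uint8_t)((crc << 1) ^ 0x07u)
                                : (uint8_t)(crc << 1);
    }
    return crc;
}

/* Receiver side: payload followed by one CRC byte. */
bool frame_is_valid(const uint8_t *frame, size_t len)
{
    if (len < 2)
        return false;                    /* need payload + CRC byte */
    return crc8(frame, len - 1) == frame[len - 1];
}
```

A CRC detects every single-bit error in the frame, so any single flipped bit causes `frame_is_valid` to reject the frame.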
Checking for errors before code execution or at regular intervals
Just after power-up, the silicon should run LBIST (logic built-in self-test) and MBIST (memory built-in self-test) to check whether the circuit has any issues. The actual application should start only after the circuit has been checked. LBIST helps catch most latent faults. These routines can also be run periodically for enhanced robustness.
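MBIST is normally a hardware engine. The algorithm such engines commonly run, however, can be shown in software. The sketch below uses March C-, which is our choice of example because the article does not name an algorithm. It runs over a simulated word-wide RAM so that it can be exercised on a host. The `sim_ram_t` type and its stuck-at fault fields are hypothetical test hooks, not part of any real MBIST interface.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simulated memory; stuck_* fields model a hypothetical stuck-at fault. */
typedef struct {
    uint32_t *cells;
    size_t    n;
    size_t    stuck_addr;  /* index of the faulty word (unused if mask == 0) */
    uint32_t  stuck_mask;  /* bits that are stuck */
    uint32_t  stuck_val;   /* value those bits are stuck at */
} sim_ram_t;

static void ram_write(sim_ram_t *r, size_t a, uint32_t v)
{
    if (r->stuck_mask && a == r->stuck_addr)
        v = (v & ~r->stuck_mask) | (r->stuck_val & r->stuck_mask);
    r->cells[a] = v;
}

static uint32_t ram_read(const sim_ram_t *r, size_t a) { return r->cells[a]; }

/* Read, verify against the expected value, then write the next value. */
static bool rw(sim_ram_t *r, size_t a, uint32_t expect, uint32_t next)
{
    if (ram_read(r, a) != expect)
        return false;
    ram_write(r, a, next);
    return true;
}

/* March C- with all-zeros / all-ones data backgrounds:
 *   up(w0); up(r0,w1); up(r1,w0); down(r0,w1); down(r1,w0); any(r0) */
bool march_c_minus(sim_ram_t *r)
{
    const uint32_t ZEROS = 0x00000000u, ONES = 0xFFFFFFFFu;
    size_t a;
    for (a = 0; a < r->n; a++) ram_write(r, a, ZEROS);
    for (a = 0; a < r->n; a++) if (!rw(r, a, ZEROS, ONES)) return false;
    for (a = 0; a < r->n; a++) if (!rw(r, a, ONES, ZEROS)) return false;
    for (a = r->n; a-- > 0;)   if (!rw(r, a, ZEROS, ONES)) return false;
    for (a = r->n; a-- > 0;)   if (!rw(r, a, ONES, ZEROS)) return false;
    for (a = 0; a < r->n; a++) if (ram_read(r, a) != ZEROS) return false;
    return true;
}
```

The ascending and descending passes are what allow a March test to catch coupling faults between neighbouring cells, not only stuck-at bits.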
Software should regularly scan critical configuration space and interfaces (for example, with CRC checks) to ensure fault-tolerant communication and operation of the SoC.
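A periodic configuration scan can be sketched as a signature check. A CRC is captured once the registers have been configured, and the software recomputes it at intervals and compares it against that golden value. This sketch assumes CRC-16/CCITT-FALSE (polynomial 0x1021, initial value 0xFFFF), which is our choice, not the article's. It also assumes a block of 32-bit registers that must not change at run time. `config_guard_arm` and `config_guard_check` are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One byte of CRC-16/CCITT-FALSE: poly 0x1021, MSB first. */
static uint16_t crc16_update(uint16_t crc, uint8_t byte)
{
    crc ^= (uint16_t)(byte << 8);
    for (int bit = 0; bit < 8; bit++)
        crc = (crc & 0x8000u) ? (uint16_t)((crc << 1) ^ 0x1021u)
                              : (uint16_t)(crc << 1);
    return crc;
}

uint16_t crc16_ccitt(const uint8_t *data, size_t len)
{
    uint16_t crc = 0xFFFFu;
    for (size_t i = 0; i < len; i++)
        crc = crc16_update(crc, data[i]);
    return crc;
}

/* Signature over 32-bit registers, fed MSB first so that the result
 * does not depend on host endianness. Each register is read once. */
uint16_t config_signature(const volatile uint32_t *regs, size_t count)
{
    uint16_t crc = 0xFFFFu;
    for (size_t i = 0; i < count; i++) {
        uint32_t v = regs[i];
        for (int shift = 24; shift >= 0; shift -= 8)
            crc = crc16_update(crc, (uint8_t)(v >> shift));
    }
    return crc;
}

typedef struct {
    const volatile uint32_t *regs;
    size_t   count;
    uint16_t golden;  /* signature captured right after configuration */
} config_guard_t;

void config_guard_arm(config_guard_t *g, const volatile uint32_t *regs, size_t count)
{
    g->regs = regs;
    g->count = count;
    g->golden = config_signature(regs, count);
}

/* Called periodically; false means the configuration has been corrupted. */
bool config_guard_check(const config_guard_t *g)
{
    return config_signature(g->regs, g->count) == g->golden;
}
```

On real hardware, `regs` would point at memory-mapped configuration registers. Any status or self-clearing bits would have to be masked out or excluded, or they would cause false alarms.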
Put the system into safe mode if any error is detected during execution
Once a fault is detected, the system should reset itself in the case of critical faults, or enter safe mode in the case of non-critical faults. If the system continues to produce a critical fault, the affected part of the system should no longer be used by the application.
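The reaction policy above can be captured as a small fault manager. It is a minimal sketch, assuming a fixed list of fault sources, a two-level severity, and a hypothetical limit of three repeated critical faults before a module is retired. On a real SoC, the counters would need to survive the reset, for example by living in reset-safe RAM or non-volatile memory.

```c
#include <stdint.h>

/* Hypothetical fault identifiers; a real SoC would map these to modules. */
typedef enum { FAULT_ADC, FAULT_CAN, FAULT_FLASH_ECC, FAULT_COUNT } fault_id_t;
typedef enum { SEV_NONCRITICAL, SEV_CRITICAL } severity_t;
typedef enum { REACT_SAFE_MODE, REACT_RESET, REACT_DISABLE_MODULE } reaction_t;

/* Assumed limit: a module that keeps producing critical faults is retired. */
#define MAX_CRITICAL_REPEATS 3u

typedef struct {
    uint8_t critical_count[FAULT_COUNT]; /* must persist across resets */
    uint8_t disabled[FAULT_COUNT];       /* 1 = application must not use it */
} fault_manager_t;

reaction_t fault_report(fault_manager_t *fm, fault_id_t id, severity_t sev)
{
    if (sev == SEV_NONCRITICAL)
        return REACT_SAFE_MODE;          /* keep running in a degraded mode */

    if (fm->critical_count[id] < UINT8_MAX)
        fm->critical_count[id]++;

    if (fm->critical_count[id] >= MAX_CRITICAL_REPEATS) {
        fm->disabled[id] = 1;            /* stop using the faulty part */
        return REACT_DISABLE_MODULE;
    }
    return REACT_RESET;                  /* critical: reset and retry */
}
```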
A watchdog timer is used to detect software malfunctions, which can themselves be caused by random hardware failures. Software is expected to service the watchdog timer regularly. If the software fails to service the watchdog within the required interval, the watchdog should reset the silicon or put it into a safe state.
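The watchdog behaviour can be modelled in software as follows. This is a sketch, not any specific device's watchdog. The timeout and the two-write service sequence (`WDT_KEY1` followed by `WDT_KEY2`) are hypothetical. They illustrate a common design in which code that has gone astray is unlikely to service the watchdog by accident.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical timeout and service keys. */
#define WDT_TIMEOUT_TICKS 50u
#define WDT_KEY1 0x5555u
#define WDT_KEY2 0xAAAAu

typedef struct {
    uint32_t remaining;  /* ticks left before expiry */
    bool     key1_seen;  /* first half of the service sequence received */
} watchdog_t;

void wdt_start(watchdog_t *w)
{
    w->remaining = WDT_TIMEOUT_TICKS;
    w->key1_seen = false;
}

/* Software services the watchdog by writing KEY1 then KEY2.
 * Any other write breaks the sequence. */
void wdt_write(watchdog_t *w, uint16_t value)
{
    if (value == WDT_KEY1) {
        w->key1_seen = true;
    } else if (value == WDT_KEY2 && w->key1_seen) {
        w->remaining = WDT_TIMEOUT_TICKS;   /* serviced in time: reload */
        w->key1_seen = false;
    } else {
        w->key1_seen = false;
    }
}

/* One watchdog clock tick. Returns true once the watchdog has expired,
 * at which point the SoC would be reset or forced into a safe state. */
bool wdt_tick(watchdog_t *w)
{
    if (w->remaining == 0)
        return true;
    return --w->remaining == 0;
}
```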
Runtime checking of the critical signals
A lot of logic is put into the design to ease software debugging. In the actual application, this logic should be idle or in a static, non-functional state. Such signals can be monitored, and the system can move into a safe state if any of them becomes active. Even the static configuration of the SoC can be monitored.
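Supervising debug and test signals amounts to comparing a sampled status word against its required static value under a mask. The sketch assumes these signals can be read as bits of one status word. The bit names and positions are hypothetical.

```c
#include <stdint.h>

/* Hypothetical bit positions of debug/test-related status signals. */
#define SIG_DEBUG_MODE  (1u << 0)
#define SIG_TEST_MODE   (1u << 1)
#define SIG_SCAN_ENABLE (1u << 2)
#define SIG_BKPT_HIT    (1u << 3)

typedef struct {
    uint32_t mask;      /* which signals are supervised */
    uint32_t expected;  /* their required static values in the application */
} static_signal_check_t;

/* Returns the supervised bits that deviate from the expected state.
 * A nonzero result would move the system into a safe state. */
uint32_t static_signals_deviation(const static_signal_check_t *c, uint32_t sampled)
{
    return (sampled ^ c->expected) & c->mask;
}
```

Returning the deviating bits rather than a plain flag lets the safe-state handler record which signal became active.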
Critical clocks in the SoC, such as the system clock, the input and output clocks of the PLL, and peripheral protocol clocks, should be monitored. The SoC should include a clock monitoring unit that can detect clocks that have gone out of range or become inactive. If a clock is out of range or inactive, the SoC can automatically switch to another, more stable clock source, or it can raise an interrupt so that the core can take the necessary action.
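One common way to build such a clock monitor is to count edges of the monitored clock during a window timed by an independent reference clock. The count is then compared against the expected value. The sketch assumes that approach, a percentage tolerance, and a hypothetical backup oscillator as the fallback source. For example, an 80 MHz PLL output measured over a 100 µs window gives an expected count of 8000.

```c
#include <stdint.h>

typedef enum { CLK_OK, CLK_OUT_OF_RANGE, CLK_LOST } clock_status_t;
typedef enum { SRC_PLL, SRC_BACKUP_OSC } clock_source_t;

typedef struct {
    uint32_t       expected_count; /* monitored-clock edges per reference window */
    uint32_t       tolerance_pct;  /* allowed deviation in percent (assumed) */
    clock_source_t active;         /* currently selected system clock source */
} clock_monitor_t;

/* Classify the number of edges counted during one reference window. */
clock_status_t clock_classify(const clock_monitor_t *cm, uint32_t counted)
{
    if (counted == 0)
        return CLK_LOST;                         /* clock has stopped */
    uint64_t nominal = cm->expected_count;
    uint64_t margin = nominal * cm->tolerance_pct / 100u;
    uint64_t lo = (margin > nominal) ? 0 : nominal - margin;
    uint64_t hi = nominal + margin;
    return (counted < lo || counted > hi) ? CLK_OUT_OF_RANGE : CLK_OK;
}

/* On a fault, fall back to the more stable backup oscillator. The status
 * is returned so the caller can also raise an interrupt to the core. */
clock_status_t clock_monitor_window(clock_monitor_t *cm, uint32_t counted)
{
    clock_status_t st = clock_classify(cm, counted);
    if (st != CLK_OK)
        cm->active = SRC_BACKUP_OSC;
    return st;
}
```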
Because all code resides in flash, it is essential to ensure clean flash operation. One way is to perform an array integrity check using ECC on all or part of the flash array. Once ECC indicates a correction, a readback of the flash should be performed (e.g. a CRC of the block) to verify that the correction was due to a single-bit error and not a multi-bit error. The readback patterns can either be checked against their expected values, or the software can force reads of enough patterns to trigger further ECC corrections/detections and so reveal the actual nature of the fault.
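To show why a correction alone is not conclusive, the sketch below implements single-error-correct, double-error-detect (SEC-DED) ECC. It uses an extended Hamming (8,4) code on a 4-bit value, which is much smaller than the wide words real flash ECC protects and is chosen only for illustration. The names `secded_encode` and `secded_decode` are hypothetical.

```c
#include <stdint.h>

typedef enum { ECC_OK, ECC_CORRECTED, ECC_UNCORRECTABLE } ecc_result_t;

static unsigned bit(unsigned v, unsigned n) { return (v >> n) & 1u; }

/* Encode a nibble into an extended Hamming (8,4) codeword. Bit i of the
 * result is code position i; position 0 holds the overall parity. */
uint8_t secded_encode(uint8_t nibble)
{
    unsigned d0 = bit(nibble, 0), d1 = bit(nibble, 1);
    unsigned d2 = bit(nibble, 2), d3 = bit(nibble, 3);
    unsigned cw = (d0 << 3) | (d1 << 5) | (d2 << 6) | (d3 << 7);
    cw |= (d0 ^ d1 ^ d3) << 1;   /* p1 covers positions 3, 5, 7 */
    cw |= (d0 ^ d2 ^ d3) << 2;   /* p2 covers positions 3, 6, 7 */
    cw |= (d1 ^ d2 ^ d3) << 4;   /* p4 covers positions 5, 6, 7 */
    unsigned parity = 0;
    for (unsigned i = 1; i < 8; i++)
        parity ^= bit(cw, i);
    return (uint8_t)(cw | parity);
}

/* Single-bit errors are corrected; double-bit errors are detected. */
ecc_result_t secded_decode(uint8_t cw, uint8_t *nibble)
{
    unsigned syndrome = 0, parity = 0;
    for (unsigned i = 0; i < 8; i++) {
        if (bit(cw, i)) {
            syndrome ^= i;       /* XOR of positions of set bits */
            parity ^= 1u;
        }
    }
    ecc_result_t res = ECC_OK;
    if (parity) {                /* odd number of flips: assume exactly one */
        cw ^= (uint8_t)(1u << syndrome);   /* syndrome 0 = parity bit itself */
        res = ECC_CORRECTED;
    } else if (syndrome) {       /* even flips, nonzero syndrome: two errors */
        return ECC_UNCORRECTABLE;
    }
    *nibble = (uint8_t)(bit(cw, 3) | (bit(cw, 5) << 1) |
                        (bit(cw, 6) << 2) | (bit(cw, 7) << 3));
    return res;
}
```

A three-bit error also produces odd parity, so the decoder reports it as `ECC_CORRECTED` while returning wrong data. This is why a correction should be followed by a readback check such as a block CRC before the data is trusted.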
Because the IRQ (interrupt request) generation logic is not usually replicated, the system IRQ handlers must be able to detect false or missed interrupts. This can be achieved by having the ISR (interrupt service routine) perform checks to ensure that it was called correctly and with the proper priority. For example, the ISR can check that the interrupt was actually enabled in the registers and that its flag was set. For periodic interrupts, a timer can be used to verify that the interrupts are generated correctly. The final implementation depends heavily on the needs of the SoC, but it is imperative that incorrect and spurious interrupts are identified to ensure smooth operation.
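The ISR entry check and the timer-based check for periodic interrupts can be sketched as follows. The sketch assumes the ISR can read the enable and pending flags for its source (`src` below 32) and has access to an independent free-running timer. The register layout, names, and tolerance are hypothetical. An interrupt that stops completely never enters its ISR, so `periodic_irq_missing` is meant to be called from a background task.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical snapshot of interrupt controller registers. */
typedef struct {
    uint32_t enable;   /* interrupt enable bits */
    uint32_t pending;  /* interrupt status/flag bits */
} irq_regs_t;

typedef enum { IRQ_VALID, IRQ_SPURIOUS, IRQ_PERIOD_ERROR } irq_check_t;

/* ISR entry check for source `src` (< 32): the source must be enabled
 * and its flag set, otherwise the call is treated as spurious. */
irq_check_t isr_entry_check(const irq_regs_t *r, unsigned src)
{
    uint32_t m = 1u << src;
    return ((r->enable & m) && (r->pending & m)) ? IRQ_VALID : IRQ_SPURIOUS;
}

typedef struct {
    uint32_t last_ts;    /* timer value at the previous interrupt */
    uint32_t period;     /* expected ticks between interrupts */
    uint32_t tolerance;  /* allowed jitter in ticks (assumed) */
    bool     started;
} periodic_irq_mon_t;

/* Called from the periodic ISR with the current timer value. */
irq_check_t periodic_irq_check(periodic_irq_mon_t *m, uint32_t now)
{
    if (!m->started) {
        m->started = true;
        m->last_ts = now;
        return IRQ_VALID;
    }
    uint32_t elapsed = now - m->last_ts;  /* unsigned math handles wrap */
    m->last_ts = now;
    uint32_t lo = m->period - m->tolerance, hi = m->period + m->tolerance;
    return (elapsed < lo || elapsed > hi) ? IRQ_PERIOD_ERROR : IRQ_VALID;
}

/* Called from a background task: true if the interrupt is overdue. */
bool periodic_irq_missing(const periodic_irq_mon_t *m, uint32_t now)
{
    return m->started && (now - m->last_ts) > m->period + m->tolerance;
}
```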
Another technique checks the working of a module by using feedback from the module together with a monitor that checks this feedback. For example, a write access to a RAM is latched inside the RAM. The latched address, data, and controls can be fed back to a monitor that checks that the latched values match the original access. Another example is checking the rotation and angle of rotation of a motor against the desired, programmed controls. Any mismatch triggers corrective action.
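For the RAM example, the monitor side reduces to comparing the access that was issued with the values the RAM reports having latched. This sketch assumes such a feedback path exists and can be read as a structure. The `ram_feedback_t` type and the function names are hypothetical, and in silicon the monitor would be hardware rather than C code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Access as issued by the bus master. */
typedef struct {
    uint32_t addr;
    uint32_t data;
    bool     write;
} bus_access_t;

/* Hypothetical feedback path: the address, data and control values the
 * RAM actually latched for the last access. */
typedef struct {
    bus_access_t latched;
} ram_feedback_t;

typedef enum { FB_MATCH, FB_MISMATCH } feedback_result_t;

/* Monitor: compare what was requested with what the module reports back.
 * Data is only compared for writes, since reads latch no write data. */
feedback_result_t feedback_monitor(const bus_access_t *issued, const ram_feedback_t *fb)
{
    const bus_access_t *l = &fb->latched;
    if (l->addr != issued->addr || l->write != issued->write)
        return FB_MISMATCH;
    if (issued->write && l->data != issued->data)
        return FB_MISMATCH;
    return FB_MATCH;
}
```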
There are a number of design practices, used consciously or unconsciously, that inherently provide additional robustness and ensure that a device keeps operating under harsh conditions without failing. These techniques become all the more important for critical applications such as automotive, medical, and aviation. Specific standards now exist to ensure that these design practices are followed, resulting in increased safety for end users.
ISO 26262 is one such standard for automotive applications. With automotive OEMs directly adopting and enforcing these standards, there is increasing pressure on all suppliers to follow such safety standards for their deliverables. After all, the ultimate aim is to support and facilitate the development of safe products in the automotive industry; in other words, "Enhanced safety for the end user."
1. ISO 26262 specifications (http://www.iso.org/)
Ashish Goel is a verification lead, Prashant Bhargava is a senior systems engineer, and Sachin Jain is a design manager with the Automotive and Industrial Solutions Group of Freescale Semiconductor.
If you liked this article, go to the Automotive Designline home page for the latest in automotive electronics design, technology, trends, products, and news.