Many things can affect the reliability of a high-reliability microcontroller (MCU) design: electromigration and other thermally induced and stress-induced hardware faults; soft errors, or SEUs (single-event upsets); timing errors in the logic caused by a poorly specified or poorly constructed system clock; HIRF (high-intensity radiated fields); radiated and conducted susceptibility; power-integrity issues; and software issues, to name but a few.
One of the larger sources of failure in the integrated circuits (ICs) used in embedded environments, such as those found in automotive applications, is electromigration and other thermally induced effects. Subsystems like on-board flash memory can fail when exposed to high temperatures, which can cause the charge to "leak" out of their floating gates. Sophisticated MCUs with features like hardware floating-point and DSP functions can be adversely affected by high temperatures when running at speeds over 100 MHz. The large number of logic gates, coupled with the high clock speed, draws more power, and that power heats the die internally. IC manufacturers can use several techniques to combat these effects, but high-temperature IC processes are each chip maker's "secret sauce" and are unique to each foundry.
Sad to relate, one can invest $5 billion in a one-of-a-kind wafer fab and still not really know how well a chip will do without building and selling millions of the devices and seeing how temperature and aging affect them. This leaves the architecture of the MCU as another way to reduce this risk.
Another area of concern is soft errors. There are whole cities located in areas where naturally radioactive dust can cause soft errors. Even exposing a board to air during service can coat it with radioactive particles. This can also happen in hospitals, where radiology departments, along with a host of other diagnostic tests and therapies, can potentially put some radioactive content in the air. Aircraft can suffer from the effects of cosmic rays, as can automotive systems at high northern latitudes and high altitudes. Even equipment used in underground mining can be exposed to radioactive particles.
Electrical noise upsetting logic can also be an issue. No matter how good a decoupling network one uses, every capacitor effectively has a series inductance and a series resistance built in, which makes it non-ideal (see also Become a Decoupling Capacitor Network Guru). HIRF, susceptibility, and power-integrity effects can all cause problems as well. No enclosure, power supply, or circuit board is completely immune to them. Timing errors caused by a poor choice of timing source are another hazard. Some implementations turn out much better than others, but at some level all of them have potential failure modes.
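To make the point about non-ideal decoupling capacitors concrete, here is a minimal sketch in C that models a capacitor as an ideal capacitance in series with its equivalent series resistance (ESR) and equivalent series inductance (ESL). The function names and the simple series model are assumptions made for illustration; real parts should be characterized from their datasheets.

```c
#include <assert.h>

#define TWO_PI 6.283185307179586

/* Net series reactance, in ohms, of a capacitor modeled as C in series
 * with ESL. Negative: the part still looks capacitive. Positive: the
 * parasitic inductance dominates and the part looks inductive. */
double cap_reactance_ohms(double c_farads, double esl_henries, double f_hz)
{
    assert(c_farads > 0.0 && esl_henries >= 0.0 && f_hz > 0.0);
    double w = TWO_PI * f_hz;                 /* angular frequency, rad/s */
    return w * esl_henries - 1.0 / (w * c_farads);
}

/* Squared impedance magnitude |Z|^2 = ESR^2 + X^2, in ohms squared.
 * The squared form keeps this sketch free of the math library. */
double cap_impedance_sq(double c_farads, double esr_ohms,
                        double esl_henries, double f_hz)
{
    double x = cap_reactance_ohms(c_farads, esl_henries, f_hz);
    return esr_ohms * esr_ohms + x * x;
}
```

With hypothetical values of 100 nF, 20 milliohms ESR, and 1 nH ESL, the reactance is negative at 1 MHz and positive at 100 MHz. Above its self-resonant frequency the part no longer behaves as a capacitor, and its impedance never drops below the ESR.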
Software is yet another area where problems can arise. Some systems even implement the software and hardware by dissimilar methods and compare the results to detect issues in development, thereby reducing the risk of design-related errors impacting operation.
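The idea of comparing dissimilar implementations can be sketched in a few lines of C. The example below is a toy, not a description of any particular product: it computes an integer square root with two unrelated algorithms and accepts the result only when they agree. All function names are hypothetical, and a real system would typically use separate teams, tools, or hardware for the two versions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Implementation A: digit-by-digit (bitwise) integer square root. */
static uint32_t isqrt_bitwise(uint32_t x)
{
    uint32_t res = 0;
    uint32_t bit = 1u << 30;          /* highest power of four in 32 bits */
    while (bit > x) bit >>= 2;
    while (bit != 0) {
        if (x >= res + bit) {
            x -= res + bit;
            res = (res >> 1) + bit;
        } else {
            res >>= 1;
        }
        bit >>= 2;
    }
    return res;
}

/* Implementation B: binary search on the answer, using 64-bit products. */
static uint32_t isqrt_search(uint32_t x)
{
    uint32_t lo = 0, hi = 65535u;     /* floor(sqrt(UINT32_MAX)) */
    while (lo < hi) {
        uint32_t mid = lo + (hi - lo + 1u) / 2u;
        if ((uint64_t)mid * mid <= x) lo = mid; else hi = mid - 1u;
    }
    return lo;
}

/* Runs both versions; writes *out and returns true only if they agree.
 * A false return means one implementation (or the hardware) misbehaved. */
bool checked_isqrt(uint32_t x, uint32_t *out)
{
    assert(out != NULL);
    uint32_t a = isqrt_bitwise(x);
    uint32_t b = isqrt_search(x);
    if (a != b) return false;
    *out = a;
    return true;
}
```

A caller treats a false return as a detected fault and moves to a safe state instead of using either value.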
With all of the above having the potential to cause problems in a design, and with something invariably going wrong in one form or another, MCU vendors like Texas Instruments (TI) are looking to other methods to further reduce risk in safety-critical systems. To that end, MCUs like the TMS570 series are coming into use.
The TMS570 design features dual lock-step CPUs, which means that a hard or soft fault in one of the CPUs is almost assured to cause a miscompare between the two CPUs. In turn, this will either halt or reboot the system. Additionally, these MCUs feature ECC/EDAC (Error Correcting Code/Error Detection and Correction) on all RAM and Flash memory. This allows multiple soft or hard errors in different words in the Flash and RAM to be corrected on the fly.
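To illustrate how an error-correcting code can repair a flipped bit on the fly, here is a toy single-error-correct, double-error-detect (SECDED) code in C that protects 4 data bits with 4 check bits. It is a generic extended Hamming sketch under assumed names, not a description of the TMS570's actual ECC logic, which operates in hardware on wider memory words.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { ECC_OK, ECC_CORRECTED, ECC_UNCORRECTABLE } ecc_status_t;

/* Returns 1 if v has an odd number of set bits, else 0. */
static unsigned parity8(uint8_t v)
{
    v ^= (uint8_t)(v >> 4); v ^= (uint8_t)(v >> 2); v ^= (uint8_t)(v >> 1);
    return v & 1u;
}

/* Codeword layout by bit position: 0 = overall parity, 1/2/4 = Hamming
 * check bits, 3/5/6/7 = data bits d0..d3. */
uint8_t secded_encode(uint8_t data)
{
    assert(data < 16u);
    uint8_t cw = 0;
    cw |= (uint8_t)(((data >> 0) & 1u) << 3);
    cw |= (uint8_t)(((data >> 1) & 1u) << 5);
    cw |= (uint8_t)(((data >> 2) & 1u) << 6);
    cw |= (uint8_t)(((data >> 3) & 1u) << 7);
    cw |= (uint8_t)(parity8(cw & 0xAAu) << 1);  /* covers positions 1,3,5,7 */
    cw |= (uint8_t)(parity8(cw & 0xCCu) << 2);  /* covers positions 2,3,6,7 */
    cw |= (uint8_t)(parity8(cw & 0xF0u) << 4);  /* covers positions 4,5,6,7 */
    cw |= (uint8_t)parity8(cw);                 /* make overall parity even */
    return cw;
}

ecc_status_t secded_decode(uint8_t cw, uint8_t *data)
{
    assert(data != NULL);
    unsigned syndrome = parity8(cw & 0xAAu) | (parity8(cw & 0xCCu) << 1)
                      | (parity8(cw & 0xF0u) << 2);
    unsigned overall = parity8(cw);
    ecc_status_t st = ECC_OK;
    if (syndrome != 0u && overall == 0u)
        return ECC_UNCORRECTABLE;               /* two bits flipped */
    if (overall != 0u) {                        /* exactly one bit flipped */
        cw ^= (uint8_t)(1u << syndrome);        /* syndrome is its position */
        st = ECC_CORRECTED;
    }
    *data = (uint8_t)(((cw >> 3) & 1u) | (((cw >> 5) & 1u) << 1)
                    | (((cw >> 6) & 1u) << 2) | (((cw >> 7) & 1u) << 3));
    return st;
}
```

Because each stored word carries its own check bits, a single flipped bit in one word can be repaired independently of errors in other words, while a double-bit flip within the same word is reported and not silently miscorrected.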
If you are interested in learning more, the TMS570 Microcontroller USB Kit can be used to quickly evaluate code development and performance of the TMS570 MCU.