In light of the needs of functional safety systems, a new class of microcontrollers (MCUs) is emerging with an extensive offering of safety measures targeting the avoidance and control of systematic failures (i.e. failures potentially introduced during the design, development or manufacturing process) as well as the detection and control of random hardware failures (i.e. failures that can occur unpredictably and that follow a probability distribution).
• From a procedural perspective, these MCUs have been designed with special consideration of the requirements set forth in the ISO 26262 standard for the avoidance of systematic HW failures. They are offered with related collateral, such as an FMEDA, dependent failure avoidance measures, and other relevant information documented in dedicated safety manuals.
• From a functional perspective, these MCUs offer integrated safety mechanisms for the computational infrastructure, such as dual-core lock-step execution of code, clock monitoring, power monitoring, ECC protection of RAM, ROM and interconnection structures, special considerations of peripheral I/O interfaces, etc.
• From a structural perspective, these MCUs provide special architectural considerations, such as partitioning of the die into separate lakes and column multiplexing of memory structures to improve the effectiveness of ECC.
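The benefit of column multiplexing for ECC can be illustrated with a small model. This is only a sketch under simplifying assumptions: the function name, the multiplexing factor and the burst model are hypothetical and are not taken from any specific MCU. The idea is that physically adjacent memory columns are assigned to different logical words, so a single event that upsets several neighbouring cells flips at most one bit per word, which a single-error-correcting code can still handle.

```python
def max_flips_in_one_word(first_col: int, burst_len: int, mux: int) -> int:
    """Return the worst-case number of flipped bits in any one logical word.

    Hypothetical model: physical column c stores a bit of logical word
    (c % mux), and a burst upsets burst_len adjacent physical columns
    starting at first_col. mux=1 models adjacent columns that all belong
    to the same word (no multiplexing).
    """
    flips_per_word: dict[int, int] = {}
    for col in range(first_col, first_col + burst_len):
        word = col % mux
        flips_per_word[word] = flips_per_word.get(word, 0) + 1
    return max(flips_per_word.values())


# Assumed example: a burst that upsets 4 adjacent cells.
without_mux = max_flips_in_one_word(first_col=5, burst_len=4, mux=1)
with_mux = max_flips_in_one_word(first_col=5, burst_len=4, mux=4)
```

In this model the burst produces a multi-bit error in one word without multiplexing, but only single-bit errors (one per word) with a multiplexing factor at least as large as the burst length.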
With these safety measures in place, the dual-core lock-step MCUs typically follow the applicable ASIL D requirements of ISO 26262 and are therefore often referred to as “ASIL-D MCUs”. (Strictly speaking, this is incorrect, as the ASIL D notation characterizes the specific safety-related ability of the system, but not how it is achieved. Nevertheless, the industry has adopted this usage of ASIL, and those savvy in the field of functional safety understand how it is meant.)
Dual-core lock-step MCUs do not eliminate the need to implement safety measures at SW level and at system level, such as sufficiently independent monitoring of output values calculated by the SW path. However, among other aspects, such as higher integration, these MCUs do offer a separation of concerns for validation. In solutions based on multiple single-core MCUs, the ability to detect and control random hardware failures depends largely on the SW. Hence, the ability of the complete system to meet the targets set forth by ISO 26262 requires at least knowledge of this SW, if not its complete development and integration.
For a dual-core lock-step MCU, it is possible to verify and validate key functional safety-related properties of the computational infrastructure at the hardware level independently from the SW, since the computational infrastructure is offered in an integrated form and represents an integrated safety mechanism. This is a significant benefit within the HW/SW co-design process. Furthermore, the separation of concerns makes it faster to locate issues. If the safety mechanisms monitoring the dual-core lock-step trigger, the cause can most likely be attributed to random hardware failures at the HW level. If the SW monitoring triggers, the cause is most likely a fault at system level or a systematic fault within the SW.
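The fault-location reasoning above can be written down as a simple decision rule. The following is a minimal sketch, not a diagnostic procedure from any safety manual: the names `LikelyCause` and `likely_cause` are hypothetical, and the choice to report a hardware cause when both monitors trigger is an assumption made here, since the text does not cover that case.

```python
from enum import Enum


class LikelyCause(Enum):
    RANDOM_HW_FAILURE = "random hardware failure at the HW level"
    SYSTEM_OR_SW_FAULT = "fault at system level or systematic fault in the SW"
    NONE_INDICATED = "no monitor has triggered"


def likely_cause(lockstep_triggered: bool, sw_monitor_triggered: bool) -> LikelyCause:
    """Map which monitor triggered to the most likely class of cause."""
    if lockstep_triggered:
        # Assumption: if both monitors trigger, the hardware cause is
        # reported first, because the SW result may be a consequence of it.
        return LikelyCause.RANDOM_HW_FAILURE
    if sw_monitor_triggered:
        return LikelyCause.SYSTEM_OR_SW_FAULT
    return LikelyCause.NONE_INDICATED
```

The result is a starting hypothesis for debugging ("most likely"), not a proof of the root cause.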
The dual-core lock-step MCU approach offers a potential availability advantage. In modern MCUs, the core area is shrinking to well below 5 percent of the overall MCU area, while the MCU as a whole is typically allocated a budget of approximately 1 percent of the Probabilistic Metric for random Hardware Failures (PMHF). Hence, the contribution of the core is, to a first approximation, in the region of 0.05 percent. However, certainty about the correct operation of the cores is key for any forward recovery technique implemented in SW to address the remaining 99.95 percent of contributions to the PMHF in order to maintain availability of the system. Additionally, the dual-core lock-step MCU provides an appropriate infrastructure to implement multiple sufficiently independent channels. The need for such channels typically arises from ASIL decomposition (as laid out in ISO 26262 Part 9, clause 5) and coexistence (as laid out in ISO 26262 Part 9, clause 6).
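The budget arithmetic in this paragraph can be checked with a few lines of code. This is a sketch under one stated assumption, implied by "first approximation": the core's share of the MCU's failure contribution is taken to be proportional to its share of the die area. The function name is hypothetical.

```python
def core_share_of_pmhf(core_area_fraction: float, mcu_budget_fraction: float) -> float:
    """Estimate the core's share of the system PMHF.

    Assumption: failure contribution scales with die area, so the core
    takes core_area_fraction of the MCU's PMHF budget.
    """
    return core_area_fraction * mcu_budget_fraction


# Figures from the text: core area of about 5 percent (upper bound),
# MCU budget of approximately 1 percent of the PMHF.
core_share = core_share_of_pmhf(core_area_fraction=0.05, mcu_budget_fraction=0.01)
remaining_share = 1.0 - core_share

print(f"core share of PMHF: {core_share:.2%}")
print(f"remaining share:    {remaining_share:.2%}")
```

Because the core area is stated as well below 5 percent, the 0.05 percent figure is an upper estimate under this proportionality assumption.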