Data protection and redundancy features implemented across entire SoC designs will help teams implement functional safety faster and at a higher quality level
As the electronic processing capabilities of automobiles increase, more and more semiconductor companies are trying to enter the market.
Those who want to transition from the mobility or PC markets have a long journey ahead if they plan on entering the new segment with a design that meets all the required safety standards.
Fortunately, SoC developers can start this long journey with intermediate trips along the way.
By incorporating fault-tolerant features within the SoC on-chip communications infrastructure, design teams do not have to bite off more than they can chew. They can implement measures that protect the data path first, which puts them in a better position to reach the finish line and get their projects qualified.
Most design teams see the final destination as the ISO 26262 standard and then work backwards to try to meet its requirements, but that could make the journey fraught with frequent and unpredictable pit stops.
It is much better to design the product from the beginning with the intent of meeting functional safety requirements.
The ISO 26262 Specification
The title of the ISO 26262 standard is “Road vehicles – Functional safety.”
Its purpose is to define processes to quantify the risk of hazardous operational situations in electronic and electrical safety-related systems.
An initial goal is to define safety measures and development processes that reduce the probability of systematic failures.
However, designers must also detect and control random hardware failures to mitigate the effects of those failures on human safety. Terms like “fault tolerance” and “resilience” are often used to describe the desired system response to safety-related system faults.
Designing for fault tolerance is the most important first step toward ISO 26262 compliance.
It’s also a proven way to reduce both design schedules and the time it takes to meet safety standards. Although safety is the ultimate goal, the initial focus should be to make the project more reliable and more resilient to withstand the harsh operating conditions that are seldom considered in consumer mobile devices.
Fault tolerance is well understood in the CPU portion of most SoCs, and the features listed here are commonly used within it:
- Unit protection by duplication and redundancy – such as Dual-Core Lock Step (DCLS);
- Duplicate unit checkers and fault safety controller;
- Built-in Self-Test (BIST) for resilience functions;
- Data protection by monitoring;
- Data packet integrity;
- Partitioning for resilient and non-resilient domains.
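To make the duplication-and-checker idea concrete, here is a minimal software sketch of the comparison a lock-step checker performs. It is only an illustration: real DCLS is built in hardware, and the names run_lock_step, alu, and faulty_alu are hypothetical stand-ins invented for this example.

```python
from typing import Callable


class LockStepError(Exception):
    """Raised when the two redundant units disagree."""


def run_lock_step(primary: Callable[[int], int],
                  shadow: Callable[[int], int],
                  operand: int) -> int:
    """Run the same operation on two redundant units and compare the results."""
    result_a = primary(operand)
    result_b = shadow(operand)
    if result_a != result_b:
        # In silicon, the checker would report this to a safety controller.
        raise LockStepError(f"mismatch: {result_a:#x} != {result_b:#x}")
    return result_a


def alu(x: int) -> int:
    """Hypothetical 32-bit operation standing in for a CPU pipeline."""
    return (x * 3 + 1) & 0xFFFFFFFF


def faulty_alu(x: int) -> int:
    """Same operation with a simulated single-bit upset on bit 5."""
    return alu(x) ^ (1 << 5)
```

Calling run_lock_step(alu, alu, 7) returns the agreed result, while run_lock_step(alu, faulty_alu, 7) raises LockStepError. Note that the comparison detects a fault but cannot tell which unit was wrong; deciding how to react is left to the safety controller.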
Protecting the Whole SoC, not just the CPU
Unfortunately, CPU-only safety is not enough because SoCs have become more complex, with multiple subsystems and interfaces.
Designers need end-to-end resilience across the entire SoC.
Some might fall into the trap of designing these measures by themselves and incorporating them late in the development cycle. This is fraught with risk and could wreak havoc on schedules.
Fortunately, there are some solutions available that enable greater SoC-wide resilience and fault tolerance out of the box.
For example, the Synopsys DesignWare ARC EM SEP and ARM Cortex-R5 and Cortex-R7 processors offer enhanced error management for safety-compliant systems such as those in automotive, military, or medical applications.
To illustrate, building a complete system-on-chip around an ARM Cortex-R5 processor and a network-on-chip (NoC) interconnect fabric is made possible by implementing the following interconnect features:
- All command redundancy is generated/terminated;
- 32/64-bit AXI socket support;
- ODD/EVEN parity support;
- Cortex ECC terminated/generated in the initiator network interface unit (NIU);
- Optional byte-level ECC generation inside NoC;
- Option to do NIU duplication.
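Odd/even parity is the simplest of the protections in this list, and a short sketch shows what it does and does not catch. The function names parity_bit and check_parity are hypothetical; an actual NoC would generate and check parity in hardware at each end of a link.

```python
def parity_bit(word: int, odd: bool = False) -> int:
    """Return the bit that makes the total count of 1s even (or odd if odd=True)."""
    ones = bin(word).count("1")
    even_bit = ones & 1  # 1 when the data already holds an odd number of 1s
    return even_bit ^ 1 if odd else even_bit


def check_parity(word: int, bit: int, odd: bool = False) -> bool:
    """Recompute parity at the receiver and compare with the transmitted bit."""
    return parity_bit(word, odd) == bit


sent = 0b1011
p = parity_bit(sent)                     # even parity generated at the sender
single_flip_ok = check_parity(sent ^ 0b0001, p)  # one corrupted bit
double_flip_ok = check_parity(sent ^ 0b0011, p)  # two corrupted bits
```

Here single_flip_ok is False, so the single-bit error is detected, but double_flip_ok is True: parity misses any even number of flipped bits and cannot correct anything. That limitation is why the list pairs parity with ECC options.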
From here, designers can discover easier routes to:
- User-Defined Payload ECC – to protect data traffic between non-ECC-generating IPs
- Packet Transport Protection: Checks, Parity, Timeout
- Service safety – to protect register programming and other system control: Unit duplication, Checkers, Safety controller
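For readers new to payload ECC, the following sketch shows the principle using the textbook Hamming(7,4) code, which corrects any single flipped bit in a 4-bit value. This is an assumption chosen for brevity, not the code used by any particular interconnect; production ECC typically protects much wider words with SECDED codes (for example, 8 check bits over 64 data bits). The function names are hypothetical.

```python
from typing import List, Tuple


def hamming74_encode(nibble: int) -> List[int]:
    """Encode 4 data bits into a 7-bit codeword (list index 0 is position 1)."""
    d = [(nibble >> i) & 1 for i in range(4)]  # d[0] is the least significant bit
    p1 = d[0] ^ d[1] ^ d[3]  # covers positions 3, 5, 7
    p2 = d[0] ^ d[2] ^ d[3]  # covers positions 3, 6, 7
    p4 = d[1] ^ d[2] ^ d[3]  # covers positions 5, 6, 7
    return [p1, p2, d[0], p4, d[1], d[2], d[3]]


def hamming74_decode(code: List[int]) -> Tuple[int, int]:
    """Return (corrected nibble, error position); position 0 means no error seen."""
    c = [0] + list(code)  # shift to 1-based positions
    s1 = c[1] ^ c[3] ^ c[5] ^ c[7]
    s2 = c[2] ^ c[3] ^ c[6] ^ c[7]
    s4 = c[4] ^ c[5] ^ c[6] ^ c[7]
    error_pos = s1 | (s2 << 1) | (s4 << 2)  # syndrome names the flipped position
    if error_pos:
        c[error_pos] ^= 1  # correct the single-bit error
    nibble = c[3] | (c[5] << 1) | (c[6] << 2) | (c[7] << 3)
    return nibble, error_pos
```

Encoding a value, flipping any one of the seven bits, and decoding returns the original value along with the position that was repaired. Two simultaneous flips would be miscorrected by this small code, which is one reason wider SECDED codes add an extra overall parity bit to detect double errors.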
Several SoC design teams have successfully transferred their mobile SoC architectures into the automotive market by expanding reliability features beyond the CPU and into the NoC. While other design teams become overburdened with meeting safety standards, these teams have already introduced silicon.
Safety-critical portions of the entire data path must be protected to meet specific ISO 26262 Automotive Safety Integrity Levels (ASIL), in particular ASIL C and ASIL D.
Implementing the right feature set to accelerate and improve design development is critical to success, and choosing the appropriate semiconductor IP is essential to implementing resilience across the entire project.
Do-it-yourself (DIY) is not an easy option
Many teams that are making the transition will not have the experience or engineering prowess in-house to enhance the fault tolerance of their designs. They will face the decision to either acquire this experience or try to configure these features themselves.
Both cases present risk that translates into longer time-to-market.
Data protection and redundancy features incorporated into SoC interconnect products will help teams implement fault-tolerant SoCs faster and at a higher quality level than was previously possible.
As teams become more adept at fault tolerance, they will place themselves in a better position to address safety standards required for each particular industry segment.
It’s important to note that products offering fault tolerance should be based on real industry requirements and use cases so they can meet the quickly evolving needs of mission-critical SoCs, whether for automotive, industrial, aerospace, or medical applications.
-- Kurt Shuler is vice president of marketing at Arteris and has extensive IP, semiconductor, and software marketing experience in the mobile, consumer and enterprise segments working for Intel and Texas Instruments. He is a member of the U.S. Technical Advisory Group (TAG) to the ISO 26262/TC22/SC3/WG16 working group, helping create safety standards for semiconductors and semiconductor IP. Prior to his entry into technology, he served in the U.S. Air Force Special Operations Forces.