Challenges of safety-critical multi-core systems
Chris Ault, Wind River
4/23/2011 10:59 PM EDT
The benefits of multi-core processors (consolidation, performance, migration) are also enticing to projects that build embedded systems specifically for the safety-critical market. However, these systems face their own challenges with regard to safety certification. Ideally, safety-critical projects would reap the same benefits while keeping certification costs as low as possible.
One particularly attractive scenario for safety-critical systems is to combine a certified subsystem, such as a robot-control application, with a non-certified subsystem, perhaps a Linux- or Microsoft Windows-based human-machine interface (HMI). The challenge in this scenario is certifying the complete product.
The challenges of multi-core CPUs include interrupt handling, bus contention, and increased coding and debugging complexity. In addition, some hardware devices on the CPU cannot be shared between safety-certified and general-purpose applications.
If devices can be partitioned and specific devices presented only to certain cores and applications, these challenges can be mitigated and the benefits of multi-core can be realized. Complex custom software can perform this partitioning and isolation, but embedded virtualization offers a configurable means of partitioning devices and presenting them to specific cores, operating systems, and applications.
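To make the partitioning idea concrete, the C sketch below models a static partition table of the kind a virtualization layer might be configured with, and checks that no core or device is assigned both to a safety-critical partition and to a general-purpose one. The types, field names, and function are hypothetical and do not reflect any particular hypervisor's configuration format; cores and devices are assumed to be numbered and represented as bits in a 32-bit mask.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical criticality classes for a mixed-criticality system. */
typedef enum { CRIT_SAFETY, CRIT_GENERAL } criticality_t;

/* Hypothetical static partition description: bit i of core_mask means
 * "core i belongs to this partition"; bit i of device_mask means
 * "device i is presented to this partition". */
typedef struct {
    const char   *name;
    criticality_t crit;
    uint32_t      core_mask;
    uint32_t      device_mask;
} partition_t;

/* Returns true if no core and no device is assigned both to a
 * safety-critical partition and to a general-purpose partition. */
static bool config_is_isolated(const partition_t *parts, size_t n)
{
    uint32_t safety_cores = 0, safety_devs = 0;
    uint32_t general_cores = 0, general_devs = 0;

    /* Collect the cores and devices used by each criticality class. */
    for (size_t i = 0; i < n; i++) {
        if (parts[i].crit == CRIT_SAFETY) {
            safety_cores |= parts[i].core_mask;
            safety_devs  |= parts[i].device_mask;
        } else {
            general_cores |= parts[i].core_mask;
            general_devs  |= parts[i].device_mask;
        }
    }

    /* Any bit common to both classes is a shared core or device. */
    return (safety_cores & general_cores) == 0 &&
           (safety_devs & general_devs) == 0;
}
```

For example, a safety RTOS given core 0 and device 0, alongside a Linux guest given core 1 and devices 1 and 2, passes the check; also presenting device 0 to the Linux guest fails it. A real virtualization layer would typically enforce such an assignment at run time through the MMU and I/O MMU; this sketch only validates the table.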
Code footprint directly affects certification costs. Choosing an embedded virtualization solution with a minimal code footprint minimizes recertification costs and preserves the real-time responsiveness of the device. Choosing a safety-certified virtualization solution makes it possible to safety-certify the complete application stack.
This paper explores the benefits of virtualization for safety-critical systems, examines some of the challenges, and describes how to mitigate the associated risks.
Migration to Multi-core
Safety-critical systems are embedded systems that, in cases of errors or failures, could cause injury or loss of human life, loss of or severe damage to equipment, or environmental harm.
Flight control, automotive drive-by-wire, and nuclear reactor management systems are examples. There is no room for software error in these systems. To ensure the most reliable, bug-free operation possible, they must undergo industry-standard certification at a level that depends on the nature of the device.
Safety-related components require temporal and spatial separation from other system components with different levels of criticality. Today’s separation concepts mostly use completely independent subsystems for each function: for example, one single board computer (SBC) for the safety-related function and a separate SBC for human-machine interaction.
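Temporal separation is commonly achieved with a fixed, repeating schedule (a major frame) in which each partition owns exclusive time windows. Assuming that approach, the C sketch below checks a hypothetical window table: every window must fit inside the major frame and no two windows may overlap. The names, the microsecond units, and the requirement that windows be sorted by start time are illustrative assumptions, not part of any specific product or standard.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical time window: partition `owner` runs exclusively from
 * start_us to start_us + length_us in every major frame. */
typedef struct {
    int      owner;
    uint32_t start_us;
    uint32_t length_us;
} window_t;

/* Checks a schedule whose windows are sorted by start time: each window
 * must lie inside the major frame and must not overlap the one before it. */
static bool schedule_is_separated(const window_t *w, size_t n, uint32_t frame_us)
{
    uint32_t prev_end = 0;

    for (size_t i = 0; i < n; i++) {
        if (w[i].start_us < prev_end)
            return false; /* overlaps the previous window */
        if (w[i].start_us > frame_us ||
            w[i].length_us > frame_us - w[i].start_us)
            return false; /* extends past the end of the major frame */
        prev_end = w[i].start_us + w[i].length_us;
    }
    return true;
}

/* Total time per major frame granted to one partition. */
static uint32_t partition_budget_us(const window_t *w, size_t n, int owner)
{
    uint32_t total = 0;
    for (size_t i = 0; i < n; i++)
        if (w[i].owner == owner)
            total += w[i].length_us;
    return total;
}
```

Because the table is fixed at configuration time, the time a safety partition receives in each frame can be verified before deployment instead of depending on the behavior of the other partitions.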
This approach is not hardware-efficient: it increases cost and limits product functionality and evolution. The introduction of multi-core CPUs in embedded devices offers unique opportunities for safety-critical equipment; however, many challenges remain to be resolved.
Multi-core processors allow a device to be partitioned so that specific functions run on dedicated cores. This provides performance isolation while allowing functions to be segregated. For example, the safety-related functions of a device can be isolated from its general-purpose functions, such as standards-based communication stacks or rich graphics for human-machine interfaces.
This segregation significantly reduces the amount of code that must be certified, which lowers product cost and shortens time-to-market. Segregation also makes it possible to update or enhance the general-purpose partitions without modifying the safety-critical applications.
Certifying a two-SBC system, consisting of a small embedded RTOS with a safety function alongside a Linux system with an HMI, requires certification of only the SBC running the RTOS and the safety function.
The Linux system is out of scope for certification. Consolidating everything onto a single SBC running Linux would extend the certification effort to the entire Linux system, which is not commercially feasible at high certification levels under standards such as IEC 61508 or DO-178B.
What is really needed is a way to consolidate the small RTOS that hosts the safety function with the Linux system without significantly raising the certification cost.
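The certification argument for the two-SBC and single-SBC designs can be expressed as a small model. Under the simplifying assumption that every component sharing a board or partition with the safety function falls into certification scope, together with any separation layer the design relies on, the C sketch below computes that scope as a bitmask. The component names and the rule itself are illustrative assumptions, not requirements quoted from IEC 61508 or DO-178B.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical software components, one bit each. */
enum {
    C_SAFETY_FN  = 1u << 0, /* the safety function */
    C_RTOS       = 1u << 1, /* small embedded RTOS hosting it */
    C_LINUX      = 1u << 2, /* general-purpose OS */
    C_HMI        = 1u << 3, /* human-machine interface */
    C_HYPERVISOR = 1u << 4  /* separation layer, if the design uses one */
};

/* A board or partition, described by the components running in it. */
typedef struct {
    uint32_t components;
} unit_t;

/* Simplified rule (assumption): a unit that hosts the safety function
 * brings all of its components into certification scope; the separation
 * layer, if any, is always in scope. */
static uint32_t certification_scope(const unit_t *units, size_t n,
                                    uint32_t separation_layer)
{
    uint32_t scope = separation_layer;
    for (size_t i = 0; i < n; i++)
        if (units[i].components & C_SAFETY_FN)
            scope |= units[i].components;
    return scope;
}
```

With two boards, the scope is the RTOS and the safety function. With everything on one Linux board, Linux and the HMI are pulled into scope. With a hypervisor separating an RTOS partition from a Linux partition, the scope is the RTOS, the safety function, and the hypervisor, which is why a small, certifiable separation layer matters.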

Figure 1: Segregation of safety-critical and general-purpose functionality
A device designed around such segregation can have mixed levels of safety and certification: only a subset of the device needs to be safety-certified.


