An increasing number of systems require highly dependable operation, often in safety-critical jobs. One of the latest places where that requirement has grown significantly is the automobile. In fact, automotive antilock braking systems have included such technology for many years; it is continually being enhanced and extended to areas like electronic steering and suspension control. As electronics becomes a larger part of the design of a new vehicle, systems are performing many more functions under the control of microprocessors, and consequently mechanical and hydraulic systems are being replaced by "by-wire" configurations of sensors, microprocessors and actuators. The goal is the so-called automated highway: vehicles that drive autonomously on infrastructure that supports higher traffic volume with fewer accidents.
Such systems require fail-safe operation, a function that has been facilitated traditionally by using two microprocessors. There are two common configurations: When two identical microprocessors are used, the configuration is known as symmetric redundancy; with a smaller, less powerful watchdog microprocessor the configuration is known as asymmetric redundancy.
Although each configuration has its own advantages and disadvantages, the symmetric approach is becoming more popular, particularly when both of the CPUs are integrated onto a monolithic silicon chip. The symmetric single-chip solution is popular because of decreased size, increased reliability (fewer system interconnections) and better electromagnetic compatibility performance. The last attribute results from having fewer high-speed signals connected to the outside world. Having fewer communication links among several chips also avoids the difficulty of synchronizing the operations of the two CPUs.
An advantage of using two separate microcontrollers (rather than two CPUs on a single die) is the wider choice among the many devices that are available off the shelf. It is also possible to introduce redundancy into the software by using two different routines.
There is an important difference between fail-safe systems and fault-tolerant systems. Today's antilock braking systems are fail-safe: If an electrical system error is detected, the ECU (electronic control unit) switches to a safe off mode, allowing the foundation hydraulic brakes to operate without interference from the faulty ABS. A fault-tolerant system must not only recognize that an electrical fault has occurred, but must continue to operate safely with the existing known fault. Antilock braking systems use redundancy to facilitate a fail-safe system.
Typically, the CPU at the heart of the system supervises the continual testing of all the major system components, but it can validate these components only if it is known to be "sane." So a second, redundant CPU is used to validate the sanity of the first one. A redundant central processor can be implemented either as a second standalone microcontroller or as an error-detect CPU with comparison logic on the same microcontroller. If the two CPUs disagree over the result of an instruction execution, the fault signal will be enabled and an interrupt will initiate a sequence of events to switch the control unit into the safe off mode. Dual-CPU microcontrollers will become increasingly popular in automotive safety-critical applications such as steering, airbags and ABS.
Both central processors are fed with the same inputs, although only the main one is used to control the chip functions. The complementary CPU is used only as a check to ensure that the output is exactly the same as that of the main CPU. In the event that a fault is detected, an interrupt can be generated that will indicate that the system should go into its fail-safe state.
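The comparison scheme described above can be sketched in a few lines of Python. This is only a toy model under stated assumptions, not the logic of any real device: `control_step`, `LockstepPair` and the `checker_fault` injection hook are hypothetical names introduced for illustration.

```python
from typing import Callable, Optional


def control_step(sensor_value: int) -> int:
    """Hypothetical control computation executed identically by both CPUs."""
    return (sensor_value * 3 + 7) & 0xFFFF


class LockstepPair:
    """Toy model of a main CPU plus a checker CPU with comparison logic."""

    def __init__(self, checker_fault: Optional[Callable[[int], int]] = None):
        self.safe_off = False                # True once the fail-safe state is entered
        self._checker_fault = checker_fault  # optional fault injection (demo only)

    def step(self, sensor_value: int) -> Optional[int]:
        if self.safe_off:
            return None                      # outputs stay disabled after a fault
        main_out = control_step(sensor_value)   # both CPUs get the same input
        check_out = control_step(sensor_value)
        if self._checker_fault is not None:
            check_out = self._checker_fault(check_out)
        if main_out != check_out:            # comparison logic raises the fault signal
            self.safe_off = True             # stands in for the interrupt handler
            return None
        return main_out                      # only the main CPU drives the outputs
```

Once `safe_off` is set, every later call to `step` returns `None`, which models the control unit staying in its fail-safe state rather than resuming control.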
Memory validation
There are other on-chip systems that may be required to satisfy dependability requirements; one is the ability to verify the contents of on-chip memory. This may be a requirement if a dual-CPU system is used with only one memory array, in which case something else is needed to provide a check on the memory contents. A memory bit may be flipped; it is, after all, impossible to discount the possibility of an alpha particle hitting a memory cell.
A typical module that is connected to the internal bus structure of the device will look for free cycles on the address-data bus; when the bus is not in use, a word will be fetched from memory. This word will be fed into a signature analyzer. The process repeats until the entire memory contents have been fetched sequentially and a signature representing the memory contents has been generated. This signature is then compared to a previously generated value that is representative of the known correct memory contents.
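A signature analyzer of this kind can be sketched as a linear-feedback shift register that absorbs one memory word at a time. The sketch below is an illustration under assumptions: the article names no polynomial, word width or seed, so the CRC-16-CCITT polynomial, 16-bit words and the `0xFFFF` seed are choices made here, and the function names are hypothetical.

```python
from typing import Iterable

POLY = 0x1021  # CRC-16-CCITT polynomial (assumed; the article names none)


def update_signature(signature: int, word: int) -> int:
    """Shift one 16-bit memory word into a 16-bit signature register."""
    signature ^= word & 0xFFFF
    for _ in range(16):
        if signature & 0x8000:
            signature = ((signature << 1) ^ POLY) & 0xFFFF
        else:
            signature = (signature << 1) & 0xFFFF
    return signature


def memory_signature(memory: Iterable[int], seed: int = 0xFFFF) -> int:
    """Fetch every word in order and return the final signature."""
    signature = seed
    for word in memory:
        signature = update_signature(signature, word)
    return signature


def memory_is_valid(memory: Iterable[int], reference: int) -> bool:
    """Compare the computed signature with the previously generated value."""
    return memory_signature(memory) == reference
```

In use, the reference value would be generated once from the known-correct contents, for example `reference = memory_signature(rom_image)`, and the check `memory_is_valid(rom_image, reference)` would then be repeated in the background.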
On CISC machines, there are usually many free cycles (often at least one every five cycles). On RISC machines, there are fewer free cycles, although it is possible to set up the memory validation so that it will "steal" a cycle if no free cycle is available. This allows a deterministic validation time. If the signature matches, there is no need to generate any interrupts, so the validation process is essentially invisible and is performed while the CPU executes a control algorithm at full speed.
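The effect of cycle stealing on validation time can be illustrated with a small simulation. It is a sketch under assumptions: the `max_wait` threshold (how many busy cycles the validator tolerates before stealing one) is a hypothetical parameter, and the bus patterns are invented for the example.

```python
from itertools import cycle, repeat
from typing import Iterable, Tuple

# Invented bus patterns: True means the CPU uses the bus in that cycle.
ONE_FREE_IN_FIVE = [True, True, True, True, False]


def scan_cycles(bus_busy: Iterable[bool], words: int, max_wait: int) -> Tuple[int, int]:
    """Return (cycles_elapsed, cycles_stolen) needed to fetch `words` words."""
    fetched = stolen = waited = cycles = 0
    for busy in bus_busy:
        if fetched == words:
            break
        cycles += 1
        if not busy:                 # free cycle: fetch without disturbing the CPU
            fetched += 1
            waited = 0
        elif waited >= max_wait:     # waited long enough: steal this cycle
            fetched += 1
            stolen += 1
            waited = 0
        else:
            waited += 1
    if fetched < words:
        raise ValueError("bus trace ended before the scan completed")
    return cycles, stolen


def worst_case_cycles(words: int, max_wait: int) -> int:
    """Upper bound on scan time: each fetch happens within max_wait + 1 cycles."""
    return words * (max_wait + 1)
```

With one free cycle in five, `scan_cycles(cycle(ONE_FREE_IN_FIVE), 4, 10)` never needs to steal; on a permanently busy bus, `scan_cycles(repeat(True), 4, 3)` steals every fetch but still finishes within `worst_case_cycles(4, 3)` cycles, which is what makes the validation time deterministic.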
The memory system in future automotive microcontrollers may be enhanced in certain cases to allow simplified memory verification. In a safety-critical system it may be desirable to check that the memory contents are stable and have not been corrupted during operation. Several techniques can be implemented to validate memory contents, each with its own merits and problems. The most straightforward way is to implement parity across the entire memory array. Whenever a byte is written to memory, a parity generator adds an extra bit, and whenever a byte is read from memory a parity checker confirms that a parity error has not occurred.
One shortfall of the signature-based validation approach is that it is very difficult to apply to the contents of RAM, which change often. Unless the CPU can stop executing the control algorithm for a RAM check exercise (very unlikely in highly dependable real-time embedded control systems), another method is required to ensure dependability of this array. Two obvious approaches can be employed.
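The parity generator and checker can be sketched as follows. Even parity is assumed here (the article does not say which sense is used), and `ParityError`, `write_byte` and `read_byte` are hypothetical names for illustration.

```python
from typing import Tuple


class ParityError(Exception):
    """Raised when a stored byte no longer matches its parity bit."""


def parity_bit(byte: int) -> int:
    """Even-parity bit: 1 when the byte contains an odd number of 1 bits."""
    return bin(byte & 0xFF).count("1") & 1


def write_byte(byte: int) -> Tuple[int, int]:
    """Parity generator: return the (data, parity) pair to store."""
    data = byte & 0xFF
    return data, parity_bit(data)


def read_byte(stored: Tuple[int, int]) -> int:
    """Parity checker: return the data, or raise if the parity is wrong."""
    data, parity = stored
    if parity_bit(data) != parity:
        raise ParityError(f"parity mismatch on byte 0x{data:02X}")
    return data
```

A single parity bit detects any odd number of flipped bits in a byte; an even number of flips in the same byte leaves the parity unchanged and goes unnoticed.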
Two approaches
The first is to add a redundant RAM array. This can be expensive, though, as RAM requirements may be large. The other approach is to add a parity scheme to the RAM so that each byte of RAM has an associated parity bit. A RAM parity generator is required to add these bits when the data is written to memory. When the data is fetched from memory, a parity checker is required to check the parity and confirm that the retrieved data matches what was written. Unless the RAM array is very small, the parity scheme is likely to be less expensive than a redundant RAM array.
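The cost comparison can be made concrete with a rough storage count. This is a back-of-the-envelope sketch, not a silicon-area model: it assumes byte-wide RAM, and `logic_cost_bits` is a hypothetical figure that expresses the fixed cost of the parity generator and checker as an equivalent number of storage bits.

```python
def redundant_overhead_bits(ram_bytes: int) -> int:
    """Extra storage for a full duplicate array: 8 bits per byte."""
    return 8 * ram_bytes


def parity_overhead_bits(ram_bytes: int, logic_cost_bits: int) -> int:
    """Extra storage for parity: 1 bit per byte plus an assumed fixed logic cost."""
    return ram_bytes + logic_cost_bits


def parity_is_cheaper(ram_bytes: int, logic_cost_bits: int) -> bool:
    """True when parity adds less overhead than duplicating the array."""
    return parity_overhead_bits(ram_bytes, logic_cost_bits) < redundant_overhead_bits(ram_bytes)
```

Under these assumptions parity wins whenever the array holds more than `logic_cost_bits / 7` bytes, which is consistent with redundancy being competitive only for very small arrays.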
In addition to these schemes for validating that the CPU and memory arrays are operating correctly, there are several features that are common on most families of microprocessors for fault detection. Bus errors are usually generated if a non-implemented or reserved address is specified. Clock monitors are used to detect slow or irregular system clocks, and illegal op-code traps are useful for flagging faults that may occur in memory, software faults and dynamic or static bus faults.
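As one illustration of these features, a clock monitor can be modeled as a check that every clock period falls inside an allowed window. The function name, the nanosecond units and the window limits below are assumptions made for the sketch; a real monitor is a hardware circuit, not software.

```python
from typing import Sequence


def clock_fault(edge_times_ns: Sequence[int], min_period_ns: int, max_period_ns: int) -> bool:
    """Return True if any interval between successive clock edges is out of range."""
    periods = [later - earlier for earlier, later in zip(edge_times_ns, edge_times_ns[1:])]
    return any(p < min_period_ns or p > max_period_ns for p in periods)
```

A period above `max_period_ns` corresponds to a slow clock, and a mix of short and long periods corresponds to an irregular one.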
Serial dependability
Future automotive systems will continue to see significant growth in multiplexed serial communications. One difference will be that the number and type of multiplexed networks in the vehicle will increase. Unfortunately, a single communications protocol that can cost-effectively address every automotive application does not exist.
Instead, a number of different communications subsystems will be integrated in a modern vehicle. For example, highly dependable time-triggered Class C communication networks will be implemented to handle applications such as steering, braking and drive-by-wire. And very robust event-triggered communications networks will be used for multibag supplemental restraint systems.
Fault-tolerant systems are poised to become a growth area in the not too distant future, particularly in the automotive world, as brake- and steer-by-wire systems become a reality in the next few years.
Such systems must be "fail operational" as they are deemed safety critical; if the system develops a fault, it could have catastrophic consequences.
By-wire systems transfer electrical signals down a wire instead of using a medium such as hydraulic fluid to transfer muscular energy. A conventional antilock braking system is considered "fail silent": If a fault in the electronic control system is detected, the control system is switched off, leaving the manual hydraulic backup still operational. If no such backup is available (as in the case of a by-wire system), the system must continue to function when a fault occurs.
Existing communications protocols are unsuitable mainly because they are event-triggered: The precise moment at which a message will be received is not specified. A communications protocol can be predictable only if worst-case transmission time and jitter are known at design time and meet the requirements of the application. The time delay between presenting a message to be transmitted at the sender's interface and receiving the message at the receiver's interface is known as the transmission time. Jitter is defined as the variability of this transmission time (maximum transmission time minus minimum transmission time). Maximum jitter depends on the longest message that can be transmitted. Real-time control applications are very sensitive to jitter, which makes it an important parameter in developing real-time distributed systems.
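The definitions of transmission time and jitter translate directly into code. In this sketch the function names, the microsecond units and the sample values are illustrative assumptions; in a real design the bounds would come from worst-case analysis rather than from a handful of samples.

```python
from typing import Sequence


def jitter(transmission_times_us: Sequence[int]) -> int:
    """Jitter = maximum transmission time minus minimum transmission time."""
    return max(transmission_times_us) - min(transmission_times_us)


def meets_requirements(transmission_times_us: Sequence[int],
                       max_latency_us: int, max_jitter_us: int) -> bool:
    """Check worst-case transmission time and jitter against application limits."""
    return (max(transmission_times_us) <= max_latency_us
            and jitter(transmission_times_us) <= max_jitter_us)
```

For transmission times of 120, 135, 128 and 150 microseconds, the jitter is 30 microseconds; an application that tolerates 200 microseconds of latency but only 20 microseconds of jitter would reject this network.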
Time-triggered and event-triggered systems operate very differently. In a time-triggered system, control signals are derived from the progression of time; in an event-triggered one, control signals are derived from the occurrence of an event (such as an interrupt). Time-triggered systems use state information that is obtained from a condition that exists for a defined period, such as the value of a sensor reading.
This differs from event information in that an event is an occurrence (such as a sensor reading crossing a defined threshold). Usually, state information is transmitted to several recipients that do not consume the message upon reception. Event messages are usually deposited at a recipient that will queue the information or will activate an interrupt service routine to take the action merited by the event.
In summary, time-triggered communication offers high predictability and easier testing for timeliness but is inflexible: Nodes cannot be added after the system has been defined unless they were anticipated in the design.
Event-triggered communication systems are much more flexible with regard to adding nodes but are less predictable and require thorough testing to ensure that an overload condition does not arise when many events occur and the bus bandwidth is saturated.
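The difference in how the two kinds of message are handled at the recipient can be sketched with two small classes. The class and method names are hypothetical; the point is only that a state value is overwritten and can be read repeatedly, while an event is queued and consumed exactly once.

```python
from collections import deque
from typing import Any, Deque, Optional


class StateMessage:
    """Holds the latest state value; reading does not consume it."""

    def __init__(self, initial: Any) -> None:
        self._value = initial

    def update(self, value: Any) -> None:
        self._value = value          # a newer state simply replaces the older one

    def read(self) -> Any:
        return self._value           # any number of recipients may read it


class EventQueue:
    """Queues event messages; each event is consumed exactly once."""

    def __init__(self) -> None:
        self._events: Deque[Any] = deque()

    def post(self, event: Any) -> None:
        self._events.append(event)

    def consume(self) -> Optional[Any]:
        return self._events.popleft() if self._events else None
```

Losing one state update is usually harmless because the next one replaces it, whereas losing one event message changes what the recipient believes has happened.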
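The trade-off can be illustrated with a simple model of each scheme. This is a sketch under assumptions, not a description of any particular protocol: the fixed round-robin slot table, the node names and the overload test are all invented for the example.

```python
from typing import Sequence


def slot_owner(time_us: int, slot_us: int, nodes: Sequence[str]) -> str:
    """Time-triggered: the sender at any instant is fixed by the slot table."""
    return nodes[(time_us // slot_us) % len(nodes)]


def next_send_time(node: str, now_us: int, slot_us: int, nodes: Sequence[str]) -> int:
    """Start of the first slot owned by `node` at or after `now_us`."""
    round_us = slot_us * len(nodes)
    offset = nodes.index(node) * slot_us
    rounds = max(0, -(-(now_us - offset) // round_us))  # ceiling division
    return offset + rounds * round_us


def event_bus_overloaded(pending_messages: int, message_time_us: int, window_us: int) -> bool:
    """Event-triggered: overload occurs when queued traffic exceeds the window."""
    return pending_messages * message_time_us > window_us
```

With `nodes = ["brake", "steer", "throttle"]` and 100-microsecond slots, every send time is known at design time, but adding a fourth node changes the slot table for everyone. The event-triggered model accepts new senders freely, at the price that `event_bus_overloaded` can become true during a burst of events.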
ROSS BANNATYNE IS MARKETING MANAGER AT THE CHASSIS SYSTEMS OPERATION OF MOTOROLA'S TRANSPORTATION SYSTEMS GROUP (AUSTIN, TEXAS).