Recent advances in dependable embedded-system technology, together with continuing demand for improved handling and for passive and active safety, have led vehicle manufacturers and suppliers to develop computer-controlled, by-wire subsystems with no mechanical link to the driver. These include steer-by-wire and brake-by-wire, and they are composed of mechanically decoupled sets of actuators and controllers connected through multiplexed in-vehicle computer networks.
Recently, much attention has been devoted to the time-triggered architecture as a basis for by-wire systems. Although there has been considerable reinterpretation of the protocol requirements, there is general agreement that the overall strategy is correct. Various fault-tolerance strategies are also possible, leading to many possible configurations even with the "standard" architecture; moreover, additional fault-tolerance strategies can be layered on top of the basic one.
The basic time-triggered architecture has several characteristics:
- Multiple controller modules, or nodes, communicate via a Time-Triggered Protocol (TTP) over multiple channels for fault tolerance.
- Messages are primarily state-oriented rather than event-oriented: each persists until it is changed, and it can change only at defined intervals. To the nodes, the network appears as a global memory.
- Nodes are internally self-testing and drop out of participation in the network when value or timing errors are detected. Communication errors (timing and value) are detected by the network rather than by application programs.
- The primary fault-tolerance strategy is replication of fail-silent components: when one task or node fails, another is already running and the system switches to the alternative. Many variations on this strategy are possible, and the goal is typically to exploit natural redundancy where it exists.
- Signal definition, timing and scheduling are done off line and verified before the system is assembled, which makes it easy to compose systems from independently designed and tested subsystems.
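Two of these characteristics, state-oriented messages and the offline, fixed communication schedule, can be illustrated with a minimal sketch. All names and values here are hypothetical; this is an illustration of the idea, not of any actual TTP implementation:

```python
# Offline schedule: slot index -> (sending node, signal name), fixed and
# verified before the system is assembled (hypothetical names).
SCHEDULE = [("wheel_fl", "speed_fl"), ("wheel_fr", "speed_fr"), ("brake_ecu", "pressure")]

def run_round(global_memory, produced):
    """One TDMA round: each node writes its state message in its own slot.

    global_memory plays the role of the shared memory the network presents
    to nodes; a value persists until the owning node overwrites it.
    """
    for node, signal in SCHEDULE:
        if node in produced:                 # node is alive and sent a value
            global_memory[signal] = produced[node]
        # else: the previous state persists (state-oriented, not event-oriented)
    return global_memory

mem = {"speed_fl": 0.0, "speed_fr": 0.0, "pressure": 0.0}
run_round(mem, {"wheel_fl": 21.4, "wheel_fr": 21.6, "brake_ecu": 3.1})
run_round(mem, {"wheel_fl": 21.5})   # other nodes silent: their old state persists
```

Because values can change only at the defined slot boundaries, every node sees the same, predictable snapshot of the system state each round.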
The characteristics translate into five key features:
- Predictability: Computation latency is predictable because of deterministic time-triggered scheduling.
- Testability: Checking the proper arrival time of computations is automatic with the time-triggered protocol and scheduling.
- Integration: Systems can be easily composed from independently designed and tested components or subsystems.
- Replica determinism: The behavior of replicated components is consistent between components; each is doing the same thing at the same time or at some predetermined offset time.
- Membership: Fault status is automatically broadcast in a time-triggered architecture by means of "membership" on the communications network.
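The interplay of the last two features, replica determinism and membership, can be sketched briefly. The names below are hypothetical; the point is only that a node's presence in the round doubles as its fault status, so a consumer can switch to a hot replica without any extra fault-notification protocol:

```python
# Fail-silent replication sketch: frames maps node name -> value for nodes
# heard this round ("membership"). A missing entry means the node detected an
# internal error and fell silent (hypothetical node names).

def select_output(frames, primary, replica):
    if primary in frames:
        return frames[primary]
    if replica in frames:       # replica is already running the same task
        return frames[replica]  # replica determinism: same value, same time
    raise RuntimeError("fault-tolerant unit lost: no member produced the value")

# Both members present: the primary's value is used.
assert select_output({"steer_a": 12.0, "steer_b": 12.0}, "steer_a", "steer_b") == 12.0
# Primary fails silent: its absence from membership is the fault notification.
assert select_output({"steer_b": 12.0}, "steer_a", "steer_b") == 12.0
```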
The first four features primarily address design correctness and complexity; the fifth supports fault-tolerant operation, a continuous safe state and multilayer diagnostics. The goal is to provide fault tolerance without excessive complexity.
Currently, TTP is the main protocol supporting time-triggered architectures, but others are possible, and the industry must reach a consensus for the approach to move beyond the development stage. TTP was developed specifically to support the time-triggered architecture; both originated at the Vienna University of Technology, and TTP is available commercially.
Several aspects of the time-triggered architecture are illustrated below, including fault-tolerant units with redundant network nodes, fail-silent nodes that cease communicating when an error is detected and dual communication paths for network fault tolerance.
An essential aspect of time-triggered architecture design is fail-silent components. Fail-silence depends on the quality, or coverage, of self-testing within the components, and this applies to all aspects of the system: software, controller hardware, sensors and actuators. Additional testing and diagnosis at the subsystem and vehicle levels may result in actions other than failing silent. Software components can be made self-testing by a variety of techniques, such as acceptance or "sanity" checks on computed results; diverse, replicated components; and redundant computation with diverse data.
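Two of these software self-test techniques can be sketched together: an acceptance check on a computed result, and redundant computation with diverse data. The function, gain and range below are hypothetical, chosen only so the algebraic cross-check works out:

```python
# Hypothetical linear brake-torque computation: torque = 450 * pedal position.
def brake_torque(pedal):
    return 450.0 * pedal

def self_checked_torque(pedal):
    out = brake_torque(pedal)
    # Acceptance ("sanity") check: result must lie in a plausible range.
    if not (0.0 <= out <= 500.0):
        return None                  # fail silent: report no value
    # Redundant computation with diverse data: recompute from a transformed
    # input. Since f(p) + f(1 - p) = 450 for this linear f, the two results
    # must sum to the gain; a stuck-at fault rarely fools both paths.
    check = brake_torque(1.0 - pedal)
    if abs((out + check) - 450.0) > 1e-6:
        return None                  # fail silent on disagreement
    return out
```

In a real node, returning `None` would correspond to the node dropping out of network participation, as described above.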
For controller hardware, Delphi has developed a bootstrap approach beginning with complete self-testing of the core, that is, the CPU, memory and main data channels into and out of the CPU.
The heart of Delphi's Secured Micro-architecture is a dual CPU integrated on-chip, in which the second (shadow) CPU executes exactly the same code on the same data at the same time. The results of the shadow computation are not used but are compared with the primary CPU's results. The compare function is itself self-testing; if the results do not match, outputs are disabled, but diagnostics can take place internally.
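The lockstep idea behind this scheme can be sketched as follows. This is a behavioral illustration only, with hypothetical function names, not a description of Delphi's silicon:

```python
# Lockstep sketch: the shadow executes the same step on the same data; only
# the primary's result is used, and any mismatch disables the outputs.

def lockstep_step(primary_step, shadow_step, state, inputs):
    a = primary_step(state, inputs)
    b = shadow_step(state, inputs)    # same code, same data, same time
    if a != b:                        # compare function: mismatch => disable
        return None, "outputs_disabled"
    return a, "ok"

step = lambda s, x: s + x             # hypothetical control-law step
assert lockstep_step(step, step, 10, 5) == (15, "ok")

# Inject a fault into the shadow path to see that a mismatch is caught:
faulty = lambda s, x: s + x + 1
assert lockstep_step(step, faulty, 10, 5) == (None, "outputs_disabled")
```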
The dual-CPU technique guarantees detection of internal CPU hardware faults that cause errors during operation. Identical faults affecting both CPUs are rare and can generally be detected easily by additional logic. Other on-chip functions are self-testing or can be easily tested by the CPU, since the CPU can now be trusted. Data memory has error-detecting codes; program memory and buses are tested by special data monitors.
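The error-detecting-code idea for data memory can be shown with a single parity bit per word. Real designs typically use stronger codes such as SEC-DED Hamming; this minimal sketch, with hypothetical helper names, only demonstrates the detect-on-read principle:

```python
# Store a parity bit alongside each data word and check it on every read.

def parity(word):
    return bin(word).count("1") & 1   # 1 if the word has an odd number of set bits

def write_word(mem, addr, word):
    mem[addr] = (word, parity(word))

def read_word(mem, addr):
    word, p = mem[addr]
    if parity(word) != p:             # any single-bit flip changes the parity
        raise MemoryError(f"parity error at {addr:#x}")
    return word
```

A single-bit upset in the stored word is caught at the next read, which lets the node fail silent before a corrupted value reaches the network.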
This self-testing strategy offers additional features and benefits. First, it is software transparent: no special-purpose or CPU-specific functions have to be added to test the core, so software can be easily maintained and ported to other CPUs. Second, the approach has generally low overhead; in most microcontrollers the CPU accounts for only a small part of the chip area compared with memory and other on-chip circuits, whereas the memory space that software-based self-tests would require is typically more expensive. Third, the technique doesn't require complex self-testing strategies at the controller module or board level.
Generally, Delphi has had good experience with this approach. Several million units are currently used in electric power steering and antilock brake system controllers. It does catch real errors and doesn't appear to create new problems such as timing glitches.
Finally, anomalies must be detected, and actions taken, at the subsystem and vehicle behavior levels. In other words, it is necessary to detect inappropriate behavior regardless of its source; anomalies may arise from complex interactions of components or from incomplete specifications rather than from faults. A promising research direction that may lead to systematic development of such diagnostics is commonly called model-based diagnostics: abstract models of the control system's behavior are executed on line and compared to actual system behavior. Specialized models with appropriate features and real-time performance must still be developed, however.
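The model-based diagnostic idea, running an abstract model alongside the plant and flagging persistent disagreement, can be sketched as follows. The model, gain and threshold are hypothetical placeholders for whatever specialized model a real system would use:

```python
# Model-based diagnostics sketch: a simple abstract model predicts the next
# vehicle speed from the brake command; a large model/plant mismatch flags an
# anomaly regardless of its source (hypothetical first-order model).

def model_step(speed, brake_cmd):
    return max(0.0, speed - 0.8 * brake_cmd)   # braking bleeds off speed

def diagnose(observed_speeds, brake_cmds, threshold=2.0):
    """Return the step index of the first anomaly, or None if model and plant agree."""
    predicted = observed_speeds[0]
    for i, (obs, cmd) in enumerate(zip(observed_speeds[1:], brake_cmds), 1):
        predicted = model_step(predicted, cmd)
        if abs(obs - predicted) > threshold:
            return i
        predicted = obs     # re-anchor the model on the measured state
    return None
```

On a healthy trace the model tracks the measurements and `diagnose` returns `None`; if the vehicle speeds up while braking is commanded, the mismatch exceeds the threshold and the offending step is flagged.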