Computer platforms intended for telecommunications and other mission-critical applications require very high availability (HA) and reliability. To achieve this, all hardware must be redundant and replaceable without stopping the system. CompactPCI (CPCI) is a recent bus development that includes hardware replaceability, or Hot Swap, as an intrinsic feature, which accounts for much of its popularity.
CPCI is regulated by the PCI Industrial Computer Manufacturers Group (PICMG), which has published a series of electrical, mechanical and software specifications that collectively define the technology. However, the specifications published so far provide only for the Hot Swap of peripheral cards, devices and power supplies. A major element missing from them is the ability to Hot Swap the system processor card itself. To address this, Motorola Computer Group has continued to develop systems that allow the system processor to be redundant and Hot Swappable. This technology gives the overall system enhanced availability: both system processor hardware failures and a wide variety of system software failures can now be tolerated. When such a failure is detected, the redundant processor is activated and quickly takes over from the failed processor.
To allow system processor slots to Hot Swap, several facilities must work in concert. The hardware must be designed so that control of CPCI bus domains can be transferred from one processor to another without disrupting bus operation. Successfully performing a domain takeover also imposes many requirements on the system software. Finally, to ensure interoperability, designers and vendors must agree on certain hardware features. The PICMG 2.13 subcommittee is currently working on standardizing these features.
The PCI specification provides an important mechanism for controlling bus master operations: the master enable bit in the command register of each PCI function's configuration space. Hot Swap software may clear this bit to stop a slot from originating PCI bus cycles. However, doing so may cause overruns or underruns in that function, and these must be handled whenever the traffic of a function or bus is suspended for longer than a few microseconds.
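Clearing the master enable bit is a read-modify-write of the command register. The sketch below assumes the standard PCI layout, in which the command register is the 16-bit word at configuration offset 0x04 and Bus Master Enable is bit 2. `ConfigSpace` is a hypothetical in-memory stand-in for a platform's real configuration-space accessors, and the overrun/underrun handling described above is not modeled.

```python
import struct

PCI_COMMAND_OFFSET = 0x04         # command register offset in the configuration header
PCI_COMMAND_BUS_MASTER = 1 << 2   # Bus Master Enable bit


class ConfigSpace:
    """Hypothetical stand-in for one PCI function's 256-byte configuration space."""

    def __init__(self) -> None:
        self.data = bytearray(256)

    def read16(self, offset: int) -> int:
        # PCI configuration registers are little-endian.
        return struct.unpack_from("<H", self.data, offset)[0]

    def write16(self, offset: int, value: int) -> None:
        struct.pack_into("<H", self.data, offset, value & 0xFFFF)


def set_bus_master(cfg: ConfigSpace, enabled: bool) -> int:
    """Change only the master enable bit and return the new command value."""
    cmd = cfg.read16(PCI_COMMAND_OFFSET)
    if enabled:
        cmd |= PCI_COMMAND_BUS_MASTER
    else:
        cmd &= ~PCI_COMMAND_BUS_MASTER
    cfg.write16(PCI_COMMAND_OFFSET, cmd)
    return cmd
```

Because only bit 2 changes, the function's I/O and memory decode enables stay set, so it can still be accessed as a target while it is prevented from mastering the bus.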
The HA Hot Swap standard specifies several slot control signals that radiate from the system slot to drive each individual slot. Two of these, Board Select (BdSel) and PCI Reset (PCIReset), are used to control bus traffic following a processor switchover.
Negating the BdSel signal removes back-end power from a CPCI board, effectively powering it off. This method of stopping bus activity from that slot has the disadvantage of forcing the board to go through a power-up sequence prior to returning to service.
A second method of stopping a slot's bus activity is to assert the PCIReset signal to a slot, which causes that slot's PCI interface to reset and float its electrical connections for the duration of the reset. The reset will propagate onto the board's PCI bus in accordance with the PCI specification, and may reset the entire board or only the PCI bus, depending on hardware implementation. However, power is maintained and volatile memory should not be lost. Whether the board's software will recover from this state without a complete initialization is a matter for the software designer to determine.
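The trade-off between the two methods can be captured in a toy model. This sketch assumes, as the text suggests, that asserting PCIReset leaves volatile memory intact; on a real board, whether the reset reaches beyond the PCI bus depends on the hardware implementation. All class and field names are hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum


class StopMethod(Enum):
    NEGATE_BDSEL = "negate BdSel"        # removes back-end power from the board
    ASSERT_PCIRESET = "assert PCIReset"  # holds the slot's PCI interface in reset


@dataclass
class SlotModel:
    """Toy model of one peripheral slot; field names are hypothetical."""
    powered: bool = True
    in_reset: bool = False
    driving_bus: bool = True
    volatile_memory: dict = field(default_factory=dict)

    def stop(self, method: StopMethod) -> None:
        # Either method stops the slot from driving the CPCI bus.
        self.driving_bus = False
        if method is StopMethod.NEGATE_BDSEL:
            self.powered = False
            self.volatile_memory.clear()  # power removed, so RAM contents are lost
        else:
            self.in_reset = True  # power maintained; RAM assumed to survive

    def resume(self) -> bool:
        """Return the slot to service; True means a full power-up sequence was needed."""
        needs_power_up = not self.powered
        self.powered = True
        self.in_reset = False
        self.driving_bus = True
        return needs_power_up
```

Whether a board that kept its memory through PCIReset can skip full initialization remains a software design decision; the model only records that the state was preserved.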
PICMG 2.13 proposes a hardware and software communications mechanism between two system slots. One system slot processor functions as the PCI system agent, while the other is a powered standby. A bus switchover protocol allows the standby processor to request and receive the CPCI bus from the active processor. During this handover, the processors exchange a sequence of messages at both the software and hardware levels. These messages allow the active software to quiesce the CPCI bus and perform final-state checkpoints before the switchover. When a safe state has been reached, the hardware transfers the system slot and system agent functions from one processor to the redundant processor.
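From the active processor's point of view, the protocol can be pictured as a small state machine: request, quiesce, checkpoint, safe state, hardware transfer. The states and event names below are hypothetical illustrations of that sequence, not message names from the PICMG 2.13 draft.

```python
from enum import Enum, auto


class SlotState(Enum):
    ACTIVE = auto()         # owns the system slot and acts as PCI system agent
    QUIESCING = auto()      # switchover requested; stopping CPCI bus traffic
    CHECKPOINTING = auto()  # sending final-state checkpoints to the standby
    READY = auto()          # safe state reached; hardware may switch
    STANDBY = auto()        # powered standby, no longer system agent


# Allowed transitions for the currently active processor (illustrative only).
TRANSITIONS = {
    (SlotState.ACTIVE, "switchover_request"): SlotState.QUIESCING,
    (SlotState.QUIESCING, "bus_quiet"): SlotState.CHECKPOINTING,
    (SlotState.CHECKPOINTING, "checkpoint_done"): SlotState.READY,
    (SlotState.READY, "hw_switch_complete"): SlotState.STANDBY,
}


def step(state: SlotState, event: str) -> SlotState:
    """Apply one protocol event, rejecting anything out of sequence."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"event {event!r} not valid in state {state.name}") from None


def run(events, state: SlotState = SlotState.ACTIVE) -> SlotState:
    for event in events:
        state = step(state, event)
    return state
```

Rejecting out-of-order events makes the rule explicit that the hardware transfer happens only after the bus is quiet and the checkpoint is complete.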
Processor switchovers can be classified along two orthogonal criteria: the relationship of the two processors during the switchover, and the maintenance of state within the payload and its associated driver.
If both system processors can participate in the bus domain switchover, the switchover is considered cooperative; otherwise, it is considered pre-emptive.
Hardware switchover
A cooperative switchover occurs when the claiming processor notifies the current owner and waits for the owner's consent before claiming the bus domain. Since cooperative switchovers are preferred, the switchover request should trigger an interrupt, to maximize the probability that the current owner notices the request even if it is experiencing certain types of software fault. A cooperative switchover procedure will attempt to notify and quiesce all I/O functions, although they may be allowed to complete checkpoint transfers. Additionally, the current owner may attempt to complete state checkpointing of drivers and other items before consenting to the takeover.
Performing these steps gives the best chance of preserving the system state and maximizes the probability of a clean takeover, subsequent recovery and continuation of the system function. However, certain hardware or software faults may interfere with a cooperative takeover. For example, the checkpoint link between the processors may have failed, preventing a clean checkpoint from being established, or the current owner may be running with interrupts inhibited and therefore fail to recognize the takeover request.
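The owner's side of a cooperative switchover, including the two fault cases just described, might look like the sketch below. It assumes (hypothetically) that the owner still consents when the checkpoint link is down and simply reports that the checkpoint was not clean; an owner with interrupts inhibited is modeled as never responding. Quiescing reuses the PCI master enable bit (bit 2 of the command register). `IoFunction`, `OwnerStatus` and `on_switchover_request` are illustrative names only.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

PCI_COMMAND_BUS_MASTER = 1 << 2  # Bus Master Enable bit in the command register


@dataclass
class IoFunction:
    name: str
    command: int = 0x0007  # example value: I/O, memory and bus master enabled


@dataclass
class OwnerStatus:
    interrupts_enabled: bool = True
    checkpoint_link_up: bool = True


def on_switchover_request(status: OwnerStatus,
                          functions: List[IoFunction],
                          checkpoint: Callable[[], None]) -> Optional[dict]:
    """Hypothetical owner-side handler for the switchover-request interrupt.

    Returns None if the request goes unnoticed, otherwise a consent record.
    """
    if not status.interrupts_enabled:
        return None  # request never seen; the claimant must eventually pre-empt
    for fn in functions:
        fn.command &= ~PCI_COMMAND_BUS_MASTER  # stop originating bus cycles
    clean = False
    if status.checkpoint_link_up:
        checkpoint()  # push final driver state to the standby
        clean = True
    return {"consent": True, "clean_checkpoint": clean}
```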
A pre-emptive switchover is simply any switchover that does not satisfy the conditions for a cooperative switchover. It is normally initiated in the same manner as a cooperative switchover, but the time allotted for cooperation elapses before the claiming processor receives the current owner's consent. At that point, the bus domain is forcibly switched to the new processor.
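On the claiming side, the distinction reduces to waiting for consent with a deadline. This sketch uses Python's `threading.Event.wait(timeout)`, which returns `True` if the event was set and `False` if the timeout expired, as a stand-in for whatever hardware or messaging signal actually carries the owner's consent. The callables `send_request` and `switch_domain` are hypothetical hooks.

```python
import threading
from typing import Callable


def claim_bus_domain(send_request: Callable[[], None],
                     consent: threading.Event,
                     timeout_s: float,
                     switch_domain: Callable[[], None]) -> str:
    """Request the bus domain, wait up to timeout_s for consent, then take it anyway.

    Returns "cooperative" or "pre-emptive" to record how the switchover happened.
    """
    send_request()  # e.g. raise the switchover-request interrupt on the owner
    if consent.wait(timeout=timeout_s):
        kind = "cooperative"
    else:
        kind = "pre-emptive"  # time allotted for cooperation has elapsed
    switch_domain()  # in both cases the domain ends up with the claimant
    return kind
```

In a quick experiment, an owner thread can grant consent with `threading.Timer(0.01, consent.set).start()`; choosing `timeout_s` is the real design decision, since it bounds how long a hung owner can delay recovery.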
There are three levels of domain switchover related to payload and driver state maintenance. In cold switchovers, the I/O devices and their associated new drivers have no state maintained from before the switchover. In warm switchovers, I/O devices maintain at least some state from before the switchover and will be notified in some manner that a switchover has occurred. In hot switchovers, the I/O devices are unaware that a switchover has occurred.
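Because the two classification criteria are orthogonal, every switchover has one value on each axis. The sketch below encodes them as two enums; the decision inputs (`both_participated`, `state_kept`, `devices_notified`) are hypothetical simplifications of the definitions given above.

```python
from enum import Enum
from itertools import product


class Participation(Enum):
    COOPERATIVE = "cooperative"
    PRE_EMPTIVE = "pre-emptive"


class StateLevel(Enum):
    COLD = "cold"  # no device or driver state survives the switchover
    WARM = "warm"  # some state survives; devices are told a switchover happened
    HOT = "hot"    # devices are unaware that a switchover happened


def classify(both_participated: bool, state_kept: bool, devices_notified: bool):
    """Place one switchover on both axes at once."""
    participation = (Participation.COOPERATIVE if both_participated
                     else Participation.PRE_EMPTIVE)
    if not state_kept:
        level = StateLevel.COLD
    elif devices_notified:
        level = StateLevel.WARM
    else:
        level = StateLevel.HOT
    return participation, level


ALL_KINDS = list(product(Participation, StateLevel))  # 2 x 3 = 6 combinations
```

A pre-emptive hot switchover is therefore a legitimate combination: the claimant may have had to seize the bus, yet the devices can still continue unaware if the resource configuration was checkpointed beforehand.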
Quick switch
Hot switchovers are accomplished by quickly switching a domain into an identically configured system processor. The I/O devices then resume operation without reconfiguration.
To perform a successful hot switchover, the new system processor must maintain a resource configuration identical to that of the original system processor. That requires careful checkpointing of system resource allocations, such as PCI bus numbers, PCI address maps and DMA buffer physical addresses, which most operating systems would need to be modified to support. The primary advantage of a hot switchover is that it may be possible without modifying the software downloaded to the payload devices.
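One way to picture the checkpoint is as a record of the three resource categories named above, which the standby can compare against its own view before a hot switchover is allowed. `ResourceCheckpoint` and `mismatches` are hypothetical names, and the key formats (bridge names, `bus:dev.fn/BARn` strings) are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ResourceCheckpoint:
    """Hypothetical record of allocations a hot switchover must reproduce."""
    bus_numbers: dict = field(default_factory=dict)  # bridge -> (secondary, subordinate)
    bar_map: dict = field(default_factory=dict)      # "bus:dev.fn/BARn" -> base address
    dma_buffers: dict = field(default_factory=dict)  # buffer name -> (phys_addr, length)


def mismatches(active: ResourceCheckpoint, standby: ResourceCheckpoint):
    """List (category, key) pairs whose allocation differs between the processors."""
    problems = []
    for category in ("bus_numbers", "bar_map", "dma_buffers"):
        a, s = getattr(active, category), getattr(standby, category)
        for key in sorted(a.keys() | s.keys()):
            if a.get(key) != s.get(key):
                problems.append((category, key))
    return problems
```

An empty result means the standby could take the domain without any device noticing; any entry means a hot switchover would leave some device pointing at resources the new processor does not own.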
Cold and warm domain switchovers require little special resource management because PCI reconfiguration is allowed between the switchover and I/O resumption. However, the same cannot be said for hot switchovers. Because the device I/O is allowed to continue without reconfiguration, every resource related to I/O operations must be carefully managed. Operating systems supporting PCI Hot Swap have dynamic mechanisms for allocating PCI resources upon device discovery. The standby processor must have a means of tracking the allocations made by the active processor.
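A simple way for the standby to track the active processor's dynamic allocations is to have the active side emit a record for every decision and the standby replay it rather than choose for itself. The allocator below is a hypothetical bump allocator for a PCI memory window; it honors the PCI rule that a memory BAR (a power-of-two size) is naturally aligned, but it does not reuse freed space. The record format is an assumption.

```python
class MirroredAllocator:
    """Hypothetical PCI memory-window allocator whose decisions are replayed on a standby."""

    def __init__(self, base: int, limit: int) -> None:
        self.next_free = base
        self.limit = limit
        self.assignments = {}  # device id -> (base, size)

    def allocate(self, device: str, size: int):
        """Active side: pick a naturally aligned base (size must be a power of two)."""
        base = (self.next_free + size - 1) & ~(size - 1)
        if base + size > self.limit:
            raise MemoryError(f"no room for {device}")
        self.assignments[device] = (base, size)
        self.next_free = base + size
        return ("alloc", device, base, size)  # record to send to the standby

    def release(self, device: str):
        """Active side: a device was hot-swapped out (space is not reclaimed here)."""
        self.assignments.pop(device)
        return ("free", device, 0, 0)

    def apply(self, record) -> None:
        """Standby side: adopt the active processor's decision verbatim."""
        kind, device, base, size = record
        if kind == "alloc":
            self.assignments[device] = (base, size)
            self.next_free = max(self.next_free, base + size)
        elif kind == "free":
            self.assignments.pop(device, None)
        else:
            raise ValueError(f"unknown record {kind!r}")
```

Replaying decisions, rather than rerunning the allocation algorithm on the standby, keeps the two views identical even if the devices were discovered in a different order on each processor.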
Since I/O devices may have pending DMA requests at the time of a domain hot switchover, the physical addresses used for DMA in the active domain must be allocated identically in the standby domain, even if the two processors have different amounts of memory. Current operating systems do not normally meet that requirement.
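The standby must therefore reserve the same physical ranges the active processor uses for DMA, and must detect when that is impossible. This sketch assumes the standby's RAM is one contiguous block starting at physical address 0, which real memory maps often are not; `reserve_matching_dma` is a hypothetical helper.

```python
def reserve_matching_dma(active_buffers, standby_ram_bytes, standby_reserved):
    """Reserve on the standby the same physical ranges the active uses for DMA.

    active_buffers: list of (phys_addr, length) checkpointed from the active processor.
    standby_ram_bytes: standby memory size, assumed to start at physical address 0.
    standby_reserved: list of (phys_addr, length) already in use on the standby.
    Returns the updated reservation list; raises MemoryError if a range cannot be honored.
    """
    reserved = list(standby_reserved)
    for start, length in active_buffers:
        end = start + length
        if end > standby_ram_bytes:
            raise MemoryError(f"DMA range {start:#x}-{end:#x} lies beyond standby memory")
        for r_start, r_len in reserved:
            if start < r_start + r_len and r_start < end:  # half-open ranges overlap
                raise MemoryError(f"DMA range {start:#x} overlaps standby use at {r_start:#x}")
        reserved.append((start, length))
    return reserved
```

Running such a check when the standby boots, rather than at switchover time, turns a memory-size mismatch between the two processors into a configuration error instead of a failed takeover.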
The promise of CompactPCI is that it will enable users who need highly available computing to build systems that provide hot swap and redundancy off-the-shelf, with interoperability among multiple component vendors. That requires control and specification, not only of the hardware but, equally significantly, of the software. Work being done in the PICMG 2.13 committee could make this happen.