High-availability Central Office systems are typically constructed in hot-swappable configurations, with multiple boards or cards arranged in parallel slots in a chassis or backplane. With this type of scheme, malfunctioning or obsolete boards may be removed from the live backplane, and replacement boards inserted into it, at will. The backplane supply is usually -48 volts. Board capacitance may be many millifarads. Each plug-in module usually has a local hot-swap controller, which ensures that power is safely applied to that board both during hot-swap events and under steady-state conditions. The hot-swap controller must protect against large inrush currents, over-voltage and under-voltage faults, and high backplane voltage transients.
At initial connection, the board will try to draw a large transient current from the backplane because of the large load capacitance that the board presents to the supply. The primary function of a hot-swap controller is to limit this inrush current to acceptable levels, allowing an operator to replace malfunctioning or obsolete boards quickly and easily without having to power down the system. Without this orderly application of load current, the board and connectors could be severely damaged and the backplane voltage could be pulled down.
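The scale of the problem can be illustrated with a rough calculation. The sketch below compares the peak current an uncharged capacitor would draw through the connection resistance alone with the time needed to charge the same capacitor at a fixed current limit (t = C·V/I). The capacitance, path resistance and current limit are assumed example values, not figures from a particular design.

```python
def unlimited_inrush_peak(v_backplane, r_path):
    """Peak current when an uncharged capacitor is connected through r_path only."""
    return abs(v_backplane) / r_path


def charge_time_at_limit(c_load, v_backplane, i_limit):
    """Time to charge c_load to the backplane voltage at a constant current."""
    return c_load * abs(v_backplane) / i_limit


# Assumed example values (hypothetical, for illustration only)
C_LOAD = 2e-3        # 2 mF of board capacitance
V_BACKPLANE = -48.0  # backplane supply in volts
R_PATH = 0.05        # 50 milliohm of connector and trace resistance
I_LIMIT = 2.0        # current limit in amps

peak = unlimited_inrush_peak(V_BACKPLANE, R_PATH)
t_charge = charge_time_at_limit(C_LOAD, V_BACKPLANE, I_LIMIT)
print(f"uncontrolled peak current: {peak:.0f} A")
print(f"charge time at {I_LIMIT:.1f} A limit: {t_charge * 1e3:.0f} ms")
```

With these assumed numbers the uncontrolled peak is in the hundreds of amps, while the limited case charges the board in tens of milliseconds at a current the connector can tolerate.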
If a current fault occurs on a board after start-up, the controller should isolate the board from the supply, ensuring that the other boards in the rack are kept operational and that a single faulty board cannot pull the backplane voltage down and cause system-wide failures. With the increasing importance of uptime in high-availability applications, the controller should permanently disconnect the board only when the current fault is permanent. Board shutdown due to transient current faults should be avoided, but every effort should be made to keep the board safe while assessing the seriousness of an over-current fault. For the best uptime performance, transient current faults and permanent current faults must be dealt with differently.
A self-regulating linear current-control loop uses a FET, a sense resistor and a transconductance amplifier to limit the load current to a level set by the sense resistor (RSENSE) (see figure). A typical start-up profile (on an oscilloscope screen) would show what happens after a board with a large load capacitance is first connected to the backplane. The gate voltage ramps linearly until the FET enters its enhancement region (i.e., switches on). The load current then increases quickly. The sense voltage also quickly increases to its maximum value, and the loop is in regulation. When the load capacitance approaches full charge, the load current decreases, and the loop comes out of regulation. The gate voltage ramps to its full potential, the sense voltage falls to its steady-state level, and the output voltage climbs to -48 V.
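The start-up sequence described above can be approximated with a simple time-step model. This is a minimal sketch, not a model of any particular controller: it assumes the current limit equals the maximum sense voltage divided by RSENSE, lumps the sense resistor and FET on-resistance into one hypothetical path resistance, ignores the initial gate ramp, and works with voltage magnitudes rather than negative values. All component values are assumptions.

```python
def current_limit(v_sense_max, r_sense):
    """Load current at which the sense voltage reaches its maximum."""
    return v_sense_max / r_sense


def simulate_startup(c_load, v_supply, i_limit, r_path, dt=1e-5, t_end=0.2):
    """Euler simulation of a load capacitor charging through a current-limited FET.

    Returns (time at which the loop leaves regulation, final output voltage).
    Voltages are magnitudes; the real rail is negative.
    """
    v_out = 0.0
    t = 0.0
    t_exit = None
    while t < t_end:
        # Current that would flow with the FET fully enhanced
        i_unlimited = (v_supply - v_out) / r_path
        # The loop clamps the current at the limit while in regulation
        i_load = min(i_limit, i_unlimited)
        if t_exit is None and i_unlimited < i_limit:
            t_exit = t  # load no longer demands the limit: out of regulation
        v_out += i_load * dt / c_load
        t += dt
    return t_exit, v_out


# Assumed example values (hypothetical)
V_SENSE_MAX = 0.1   # 100 mV maximum sense voltage
R_SENSE = 0.05      # 50 milliohm sense resistor
I_LIMIT = current_limit(V_SENSE_MAX, R_SENSE)

t_exit, v_final = simulate_startup(
    c_load=2e-3, v_supply=48.0, i_limit=I_LIMIT, r_path=0.05
)
print(f"current limit: {I_LIMIT:.2f} A")
print(f"loop leaves regulation after about {t_exit * 1e3:.1f} ms")
print(f"final output magnitude: {v_final:.2f} V")
```

The output voltage rises linearly while the loop is in regulation, then settles exponentially to the supply magnitude once the capacitor is nearly charged.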
Dynamically adjusting the gate voltage to limit the inrush current is extremely robust, making this method superior to the slew-rate-limited techniques that are often employed. These simpler ramp methods can lead to FET failure when boards are removed and quickly re-seated, a typical reboot procedure in central office systems. When a board is removed, its load capacitance can hold its own supply up for some time. The hot-swap circuitry will keep the FET turned on, since its supply is still good. If re-insertion occurs before the load has discharged, the difference between the backplane voltage and the board voltage must be dropped across the drain-source terminals of the fully enhanced FET. Unlimited current will flow and the FET will be destroyed. Failures of this nature can be avoided with the dynamic-control scheme, because the load current is monitored and adjusted during the inrush-current control period, protecting the FET from dangerous, uncontrolled currents that would otherwise cause failure.
Short circuit protection
Short circuit protection guards against prolonged over-current events, but also prevents unnecessary board shutdown. Given the large supply voltage, a low-impedance fault at initial connection due to a board short, or during operation due to accidental human intervention, could have catastrophic results. On the other hand, system uptime will be compromised if the controller shuts the board down at the first sign of an over-current event. Thus, some level of short circuit immunity is required. In the past, the power designer had to trade off safety and availability, compromising between an annoyingly low and a dangerously high current fault tolerance. Many hot-swap controllers are designed to shut a card down as soon as an over-current event has been detected for a short duration. Others never latch off completely, implementing an infinite retry scheme. During each cycle the controller limits the current for a short time, then turns off for a much longer time. This can be dangerous since latch-off never occurs, even in a permanent fault situation. Neither approach is ideal.
A new method for providing short circuit protection is the Limited Consecutive Retry scheme. The components involved in inrush current limiting (amplifier, FET and sense resistor) already provide a reliable method of registering over-current events. The over-current level is again dependent on the value of the sense resistor. A timer and counter are added to the inrush control circuitry, and an auto-retry PWM scheme is employed in the event of a current fault, with the number of retries limited via a counter-initiated latch-off.
When an over-current fault condition occurs, the loop goes into regulation and the timer is started. If the loop is still in regulation after the duration of TON, an over-current fault is registered. The gate pin is then pulled low for TOFF. The TON / TOFF duty cycle is 3 percent. When TOFF expires, the cycle is complete and the counter is incremented. The cycle repeats, allowing the loop to regulate to the maximum load current level once again. These retry cycles continue until the counter reaches a count of seven. At this point the controller latches off, and the power must be cycled to reinitialize the device (this is usually achieved by re-seating the card). This gives a short circuit immunity time equal to the total timeout period of the seven retries, or about 2.5 seconds by default (adjustable via an external timer capacitor). The fault counter resets when a temporary fault is cleared, ensuring that accumulating temporary faults do not cause latch-off.
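The retry sequence can be sketched as a small state machine. The model below is illustrative only: the class and method names are hypothetical, the 3 percent figure is interpreted as TON divided by the full cycle (TON + TOFF), and the individual TON and TOFF values are derived from the roughly 2.5 second total for seven cycles rather than taken from a data sheet.

```python
from dataclasses import dataclass


@dataclass
class RetryController:
    """Hypothetical model of a limited consecutive retry scheme."""
    t_on: float           # seconds in current limit before a fault is registered
    t_off: float          # seconds the gate is held low after each fault
    max_retries: int = 7  # consecutive faults allowed before latch-off
    count: int = 0
    latched: bool = False

    def run(self, fault_duration):
        """Apply an over-current fault lasting fault_duration seconds.

        Returns ("recovered", t) or ("latched", t), where t is the elapsed time.
        """
        elapsed = 0.0
        self.count = 0
        self.latched = False
        while True:
            # Loop is in regulation; the timer runs for up to t_on
            if fault_duration - elapsed <= self.t_on:
                # Fault gone before the timer expires: counter resets
                self.count = 0
                return "recovered", max(elapsed, fault_duration)
            elapsed += self.t_on    # over-current fault registered
            elapsed += self.t_off   # gate held low
            self.count += 1         # cycle complete
            if self.count >= self.max_retries:
                self.latched = True
                return "latched", elapsed


# Assumed timing: seven cycles totalling about 2.5 s, 3 percent on-time
CYCLE = 2.5 / 7
T_ON = 0.03 * CYCLE
T_OFF = 0.97 * CYCLE

ctrl = RetryController(t_on=T_ON, t_off=T_OFF)
print(ctrl.run(0.5))            # transient fault: board recovers
print(ctrl.run(float("inf")))   # permanent fault: latch-off after seven retries
```

A fault that clears within the immunity window leaves the board running with the counter back at zero, while a permanent fault is cut off after the seventh consecutive cycle.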
Voltage protection
Hot-swappable systems usually specify an operating voltage window: the minimum and maximum backplane voltages between which the pluggable board should remain active. Over-voltage situations could damage the hardware, while under-voltage conditions could limit performance. The board should be isolated from the backplane if the supply is outside the preset operating window.
Hot-swap controllers provide two voltage-monitoring functions: programmable under- and over-voltage monitoring of the input supply. Hysteresis prevents a voltage close to a threshold from continuously switching a board on and off. Filters protect against unstable or transient input supplies being passed to the board.
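The effect of hysteresis on the voltage window can be shown with a short model. This is a sketch under stated assumptions: the class name and the threshold and hysteresis values are hypothetical, the comparison uses the magnitude of the negative supply, and the input filtering mentioned above is not modeled.

```python
class WindowMonitor:
    """Hypothetical under-/over-voltage window comparator with hysteresis."""

    def __init__(self, uv, ov, hyst):
        self.uv = uv        # under-voltage threshold (magnitude, volts)
        self.ov = ov        # over-voltage threshold (magnitude, volts)
        self.hyst = hyst    # hysteresis applied when re-enabling (volts)
        self.enabled = False

    def update(self, v_in):
        """Return True if the board should be connected for this input voltage."""
        mag = abs(v_in)
        if self.enabled:
            # Disconnect as soon as the supply leaves the window
            if mag < self.uv or mag > self.ov:
                self.enabled = False
        else:
            # Reconnect only once the supply is inside the window by the
            # hysteresis margin, so a voltage near a threshold cannot chatter
            if self.uv + self.hyst <= mag <= self.ov - self.hyst:
                self.enabled = True
        return self.enabled


# Assumed thresholds (hypothetical): 36 V to 72 V window, 2 V hysteresis
monitor = WindowMonitor(uv=36.0, ov=72.0, hyst=2.0)
samples = [-30.0, -37.0, -38.5, -36.5, -35.0, -37.0]
states = [monitor.update(v) for v in samples]
print(states)
```

In this sequence the board stays off at -37 V on the way up, switches on at -38.5 V, stays on at -36.5 V, and switches off again only when the supply magnitude drops below the 36 V threshold.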
The controller must also be tolerant of high-voltage transients on the backplane, as it derives its power from the backplane supply. The ground pin is referenced to the -48-V input rail, so ground is effectively floating with the negative supply rail. A shunt regulation scheme generates a constant voltage above VEE that powers the device and drives the FET. The main advantage of this scheme is that the device is shielded from large backplane transients by a resistor, and can thus survive backplane voltage transients to -200 V.
Alan Moloney is Applications Engineer at Analog Devices, Limerick, Ireland