Consider the old adage known as Murphy's law which states, "anything that can go wrong will go wrong." Electronics-based systems have evolved over the past 50 years to provide very sophisticated monitoring and control functions with remarkable reliability. Reliability concerns typically relate to the amount of potential risk to human life, followed closely by the high cost of failure and product user satisfaction. Nothing is ever perfect, however, so there will always be a need for ever-increasing reliability to produce safe, long-lasting electronic systems.
When system reliability is not optional, the best, but most costly, approach is the use of fully redundant circuitry. Duplicate circuits perform identical functions in parallel, and some form of voting on the results produces the safest outcome at all times. In many of these systems, faulty circuitry that is detected is automatically removed and replaced by an identical backup circuit. This is the ideal topology for long-term reliable operation.
On the other hand, the consequence of failure does not always justify the cost of full redundancy. Systems built without redundancy rely solely on the built-in reliability of each and every component used. A single component failure can cripple the system or permanently impair its accuracy. A design of this nature assumes a lot of risk, but can be provided at the lowest cost.
The middle ground for highly reliable systems is fault monitoring, where circuitry monitors various elements of a system and reports any anomalies. Since anything can happen at any time at any point in the circuit, the more elements monitored, the better. The reaction to a detected fault can range from complete system shutdown, like the Dead Man's Switch on trains, to a simple service warning akin to an "idiot light" in automobiles.
This article will describe how the long-term reliability of a high-voltage Li-Ion battery stack can be enhanced through the use of an LTC6801 fault monitoring IC. Battery power is an ongoing trend in applications such as electric vehicles, uninterruptible power supplies, medical instrumentation and even power tools, each with varying degrees of reliability expectations.
Challenges for long-life battery power
Batteries have become a major source of alternative energy to power vehicles and a myriad of other portable equipment. Li-Ion cells are popular for their high energy density, allowing for battery packs that are smaller and lighter than those with equivalent energy of other chemistries. For high-power applications such as electric vehicles, hundreds of batteries are stacked to create a high-voltage source resulting in less current through thinner and lighter-weight wiring. In such automotive applications, driver safety is the number-one concern, followed by owner satisfaction. As a result, there is significant motivation to provide safe and reliable long term operation. To that end, the charge on each and every cell must be continually monitored to maintain the optimum level for years of use.
In the simplest form, circuitry is required to measure the voltage on each cell in the stack. This measurement is typically performed by an analog/digital (A/D or A to D) converter, which passes the information to a microcontroller. The controller carefully manages the charge and discharge of all of the cells such that they are not operated beyond a tight range that can drastically shorten the lifetime of the cells. With hundreds of individual cells in a system, an integrated measuring circuit can significantly save on component count.
The LTC6802 from Linear Technology is just such an integrated functional block. It can measure and report the voltage on up to 12 cells and two temperature sensors, through a built-in 12-bit ADC. Any number of cells can be stacked on top of one another with the measured voltages of each group of 12 streamed serially to a host microcontroller. These measuring devices and the controller form the heart of the battery management system.
Careful control of the state of charge of each cell is essential for extending the usable lifetime of the cells, but it may not be enough to satisfy the ever demanding automotive customer. For sensitive electronics, the automobile presents a harsh and perilous operating environment. For worry-free long-term satisfaction a "what if" analysis of the system is necessary. A few questions to consider might be:
- What if a wire to a cell gets disconnected?
- What if the voltage measurement accuracy shifts?
- What if internal register bits get stuck in a way that could always indicate a good cell voltage reading?
- What if the measuring IC is somehow damaged by nasty system voltage transients?
The most insidious type of fault would trick the controller into determining that a cell, or group of cells, is in perfect condition when, in fact, it is not being measured properly. These cells could then fully discharge or get dangerously overcharged while the system is completely unaware. Something is needed to "monitor the monitor" for a higher level of reliable operation.
BMS fault monitoring with the LTC6801
As an alternative to a fully redundant measuring approach, a fault-monitoring circuit is wired in parallel with the measuring device and serves to double-check the basic functionality of the system. The circuit of Figure 1 shows this implementation for a stack of 12 Li-Ion cells using an LTC6802 measuring device with a companion LTC6801 fault monitoring device.
Figure 1: Combining cell measurement with fault detection for enhanced reliability. The LTC6802 provides precise measurement while the LTC6801 checks for over/undervoltage conditions on each cell.
The LTC6802-1 acts as the primary electronics in the system by measuring and reporting each individual cell voltage on command, and applying a discharge current to cells to distribute the charge on each cell. Data is transferred to a controller by an SPI serial data link.
At the same time the LTC6801 also monitors each cell on the stack. With no intervention by the system controller the LTC6801 periodically samples each cell voltage and performs a simple undervoltage and overvoltage comparison. If all is OK, the LTC6801 provides a differential clock signal at the Status Output lines. If anything is not correct, this clock stops. It does not provide any information as to the nature of the problem, as it just indicates that something is not quite right. Should this clock ever stop, the controller can then perform diagnostic procedures to determine what is wrong.
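The pass/fail decision described above can be sketched in a few lines of Python. This is only an illustrative model, not the device's internal logic: the function name, the threshold values and the inclusive comparison are assumptions made for the example.

```python
# Illustrative model of the go/no-go decision. The threshold values below are
# assumptions chosen for the example, not LTC6801 datasheet settings.
UV_THRESHOLD = 2.5   # volts, hypothetical undervoltage limit
OV_THRESHOLD = 4.2   # volts, hypothetical overvoltage limit


def status_clock_runs(cell_voltages, uv=UV_THRESHOLD, ov=OV_THRESHOLD):
    """Return True if every cell sits inside the UV/OV window.

    True models a toggling Status Output clock; False models a stopped clock.
    Nothing about which cell failed, or why, is reported.
    """
    return all(uv <= v <= ov for v in cell_voltages)


healthy_stack = [3.3] * 12
overcharged_stack = [3.3] * 11 + [4.3]
depleted_stack = [2.4] + [3.3] * 11

print(status_clock_runs(healthy_stack))      # clock keeps running
print(status_clock_runs(overcharged_stack))  # clock stops
print(status_clock_runs(depleted_stack))     # clock stops
```

Note that an overcharged cell and a depleted cell produce the same result. The single output only says that something needs attention, which is why the controller must follow up with its own diagnostics.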
Much more than a fancy comparator
The LTC6801 was designed with careful consideration of the many potential system faults while also providing ease of use. An important design requirement was to permit the device to function automatically without any software. The only external requirements are power, drawn from the battery pack itself, and an enabling clock signal. Without the enable clock input, the LTC6801 stops in a static low-power state, drawing just microamps from the battery stack. The enable clock can be provided by the system controller or any other oscillator source, such as an LTC6906 silicon oscillator. Upon receiving a clock signal, the device wakes up and starts monitoring all of the cells automatically.
Figure 2 is a block diagram of the essential elements of the LTC6801. A 12-bit Delta-Sigma A/D converter filters and digitizes the voltage of up to 12 cells and two temperature sensors. A 5V regulator and a precisely trimmed 3V ADC voltage reference are built in. All programming of the device operating characteristics is done by pin strapping device pins to the 5V regulator, the 3V reference or V-. No external components are required.
Figure 2: Internal circuitry of the LTC6801 provides more than a simple comparator function.
Figure 3 depicts the range of overvoltage and undervoltage thresholds that can be programmed. The overvoltage (OV) thresholds were chosen for use with Li-Ion cells with a nominal voltage of 3.3V and a danger level of 4.2V, while the undervoltage (UV) thresholds provide a reasonable indication of cell charge depletion. The OV and UV thresholds are programmed by different pins, so any combination is possible. The OV and UV levels must be set to indicate that something is possibly wrong without being too close to the normal cell voltage, which could cause nuisance trips of the fault detection circuitry.
Figure 3: A selection of cell voltage warning thresholds is programmed through pin-strapping. Separate pins control the OV and UV thresholds so they can be set independently.
It is also possible to program a fixed amount of hysteresis, up to 500mV, to these thresholds. This is useful when a detected fault may trigger action that can cause a voltage change on the cells, such as instantly disconnecting the load from the battery stack. Hysteresis can prevent bouncing in and out of the fault condition.
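A minimal sketch shows why hysteresis helps. It assumes, purely for illustration, that the hysteresis is applied on the recovery side of an undervoltage threshold; the class name, the threshold and the sample voltages are hypothetical and not taken from the LTC6801 datasheet.

```python
class UndervoltageDetector:
    """Undervoltage comparator with optional recovery hysteresis (a model only)."""

    def __init__(self, threshold, hysteresis):
        self.threshold = threshold
        self.hysteresis = hysteresis
        self.fault = False

    def update(self, voltage):
        if self.fault:
            # Once tripped, clear only after the cell recovers past
            # threshold + hysteresis.
            if voltage > self.threshold + self.hysteresis:
                self.fault = False
        elif voltage < self.threshold:
            self.fault = True
        return self.fault


def count_transitions(detector, samples):
    """Count how many times the fault flag changes state over the samples."""
    states = [detector.update(v) for v in samples]
    return sum(1 for a, b in zip(states, states[1:]) if a != b)


# Hypothetical cell that sags to 2.48V under load and rebounds to 2.60V each
# time the load is disconnected in response to the fault.
SAMPLES = [2.60, 2.48, 2.60, 2.48, 2.60, 2.48]

print(count_transitions(UndervoltageDetector(2.5, 0.0), SAMPLES))  # flag bounces
print(count_transitions(UndervoltageDetector(2.5, 0.5), SAMPLES))  # flag latches
```

Without hysteresis the flag changes state on every sample in this example. With 500mV of hysteresis it trips once and stays tripped until the cell genuinely recovers.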
Two other pin-strap programmable features are the repetition rate for checking the cells and the count of cells connected. All 12 cells and temperature inputs can be checked every 15ms, 130ms or 500ms. The slower duty cycle results in less supply current drawn from the battery stack. The cell count can be programmed between 4 and 12. This ensures that fault detection is only provided for the cells actually connected.
Any number of LTC6801s can be stacked on top of one another to monitor hundreds of individual cells in very-high-voltage systems (Figure 4). The enable clock is buffered and output on two signal lines to be connected to the enable inputs of the next higher device on the stack. The enable clock snakes in and out of each device all the way up to the top of the stack.
Figure 4: Any number of LTC6801 cell monitors can be stacked. AC coupling of the Enable and Status signals is required due to the different operating voltages of the stacked devices.
Likewise, the all-important Status Output clock from each device is passed down to Status Input pins of the next lower one on the stack. The frequency of the Status clock is the same as the Enable clock and can be in the range of 2kHz to 50kHz. If any fault is detected on any cell, anywhere on the stack at any time, the Status clock of the device monitoring the offending cell will stop toggling.
This static condition will propagate down the stack to the bottom device. Any device using a form of edge detection, such as a watchdog timer or a counter capture/compare function, can be used to monitor the Status output lines of the bottom device on the stack. When a clock transition is missed, this device can generate the signal to service the general fault.
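The propagation and the edge-detection idea can be modeled as follows. This is a sketch under stated assumptions: the function names, the 10kHz clock (100µs between edges, inside the 2kHz to 50kHz range) and the 250µs timeout are illustrative choices, and a real controller would normally use a hardware watchdog or timer capture peripheral rather than a list of timestamps.

```python
def bottom_status_toggles(device_ok):
    """Model the status clock reaching the bottom of the stack.

    device_ok[0] is the bottom device and device_ok[-1] the top one. A device
    passes the clock down only if its own cells are OK and the clock arriving
    from the device above is still toggling.
    """
    toggling = True  # assumption: the top device's status input is always live
    for ok in reversed(device_ok):  # walk from the top of the stack downward
        toggling = toggling and ok
    return toggling


def watchdog_trip_time(edge_times_us, timeout_us, end_time_us):
    """Return the time (in µs) at which an edge watchdog trips, or None.

    The watchdog trips when more than timeout_us passes without a clock edge.
    """
    last_edge = 0
    for t in edge_times_us:
        if t - last_edge > timeout_us:
            return last_edge + timeout_us
        last_edge = t
    if end_time_us - last_edge > timeout_us:
        return last_edge + timeout_us
    return None


# Edge timestamps in microseconds for a 10kHz clock observed for 2ms.
HEALTHY_EDGES = list(range(100, 2001, 100))
STOPPED_EDGES = list(range(100, 1001, 100))  # clock stops after 1ms

print(bottom_status_toggles([True, True, False, True]))  # one bad device
print(watchdog_trip_time(HEALTHY_EDGES, 250, 2000))      # never trips
print(watchdog_trip_time(STOPPED_EDGES, 250, 2000))      # trips at 1250
```

The timeout only needs to be longer than one clock period, so a stopped clock is noticed within a few missed edges.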
Providing a continuous clock for the Status lines is an important feature. Use of a static logic level to flag any system fault always presents the possibility of failing in the logic state that signals that all is OK with the system. This would render the fault monitoring scheme useless. With a clocking scheme, the monitoring device has to be continually doing something to keep the clock running and all has to be right with the system, or it stops. The fault signal cannot get "stuck" in an OK state.
For extra logic noise immunity, the clocking of the LTC6801 is done differentially up and down the stack. For high voltage battery stacks there is often a requirement for isolation from the controller power source. With differential clock signals it is quite easy to add isolation transformers. This is another fault tolerant/safety enhancement consideration in the device design.
What if the monitoring device has a problem?
There is no question that system reliability is enhanced through redundant monitoring, but how can the proper operation of the monitoring device itself be assured? It is very important to prevent undetectable failure modes. To address this, the LTC6801 has a built-in automatic self-test feature. The self test can be initiated on demand; otherwise it runs automatically after every 1024 cell-test cycles. It takes 17ms to complete.
The circuit of Figure 1 shows how an LTC6802-1 device can be connected to run the LTC6801 self test on demand. A separate output pin signals whether the device passed all self-test functions and does not interrupt the cell Status Output clock. The self test checks four major functions.
Checking if the ADC, voltage reference and comparator are performing properly is one of the tests. To do this, a second internal voltage reference is measured three times. The first measurement compares the reference against two threshold levels within a tight window, and no out-of-range indication should be created. Next, the upper threshold limit is reduced to a value just below the expected voltage level and a second measurement of the voltage reference is made, and the comparator should produce an overvoltage indication. Finally the lower threshold is set above the expected voltage and an undervoltage indication should occur. This provides confidence that the analog part of the ADC is working properly and that the comparison thresholds can be changed and are accurate.
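The three-measurement sequence can be sketched as a small model. The reference voltage, window width and threshold margin below are hypothetical values chosen for illustration (the article does not give them), and the function names are not from any vendor library.

```python
def window_compare(measured, lower, upper):
    """Model of the OV/UV comparison against two thresholds."""
    if measured > upper:
        return "OV"
    if measured < lower:
        return "UV"
    return "OK"


def analog_self_test(measure_reference, expected=2.5, window=0.05, margin=0.02):
    """Run the three-step check; all numeric defaults are hypothetical."""
    # Step 1: tight window around the expected value, no flag should be raised.
    if window_compare(measure_reference(), expected - window, expected + window) != "OK":
        return False
    # Step 2: upper threshold moved just below the expected value, OV must be raised.
    if window_compare(measure_reference(), expected - window, expected - margin) != "OV":
        return False
    # Step 3: lower threshold moved just above the expected value, UV must be raised.
    if window_compare(measure_reference(), expected + margin, expected + window) != "UV":
        return False
    return True


print(analog_self_test(lambda: 2.5))          # healthy signal path passes
print(analog_self_test(lambda: 2.5 * 1.03))   # 3% gain error fails step 1
print(analog_self_test(lambda: 2.5 * 1.015))  # 1.5% error passes step 1, fails step 3
```

The last case shows why the thresholds are moved rather than checked only once: a shift small enough to stay inside the first window is still caught when the lower threshold is raised.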
The digital portion of the ADC is also tested. Two test signals are applied which will produce digital output readings of alternating ones and zeroes. The 12-bit output codes will be 0xAAA or 0x555. This confirms that no ADC output bit is stuck high or low.
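A short check confirms why these two codes are sufficient. The two patterns are bitwise complements over 12 bits, so every bit is exercised once as a one and once as a zero. The fault-injection helper below is a hypothetical model written for this example, not part of the device.

```python
PATTERNS = (0xAAA, 0x555)  # alternating ones and zeroes, and the complement
MASK = 0xFFF               # 12-bit output word


def apply_stuck_bits(code, stuck_high=0, stuck_low=0):
    """Model an output word with some bits forced high or forced low."""
    return ((code | stuck_high) & ~stuck_low) & MASK


def digital_self_test(read_code):
    """Pass only if both test patterns are read back unchanged."""
    return all(read_code(p) == p for p in PATTERNS)


def undetected_single_bit_faults():
    """List every single stuck-at fault that the two patterns fail to expose."""
    missed = []
    for bit in range(12):
        m = 1 << bit
        if digital_self_test(lambda c: apply_stuck_bits(c, stuck_high=m)):
            missed.append((bit, "high"))
        if digital_self_test(lambda c: apply_stuck_bits(c, stuck_low=m)):
            missed.append((bit, "low"))
    return missed


print(hex(0xAAA ^ 0x555))              # the patterns together cover all 12 bits
print(undetected_single_bit_faults())  # no single stuck bit goes unnoticed
```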
The high-voltage multiplexer that connects the cells is also tested. If the address decoder for the switches were faulty, one or more of the cells could be skipped over while other cells are measured repeatedly. Skipping cells would mean a bad cell could go undetected. Other multiplexer failures such as simultaneous selection of cells or short circuits between switch inputs would generate over- or under-voltage indications on at least one cell input channel. The self test ensures that every cell is measured or an error is flagged.
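The article does not describe how the device verifies the multiplexer internally, so the following is only a sketch of the property being checked: in one scan, every connected cell must be selected exactly once. The function names and the stuck-address-bit fault model are assumptions made for illustration.

```python
from collections import Counter


def mux_scan_ok(selected_channels, cell_count):
    """Return True if channels 1..cell_count were each selected exactly once."""
    counts = Counter(selected_channels)
    return (len(selected_channels) == cell_count
            and all(counts[ch] == 1 for ch in range(1, cell_count + 1)))


def scan_with_stuck_address_bit(cell_count, stuck_low_mask):
    """Model a decoder whose address bits in stuck_low_mask always read 0."""
    return [((addr & ~stuck_low_mask) + 1) for addr in range(cell_count)]


good_scan = scan_with_stuck_address_bit(12, 0b0000)
bad_scan = scan_with_stuck_address_bit(12, 0b0001)

print(good_scan, mux_scan_ok(good_scan, 12))
print(bad_scan, mux_scan_ok(bad_scan, 12))  # even cells skipped, odd cells repeated
```

With the lowest address bit stuck, half of the cells are never measured while the other half are measured twice, which is exactly the skipped-cell failure described above.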
A fourth, very important self-test function determines if any cell connections are open circuited. For this test, each cell is measured with a small 100μA current sink connected to each end. An open wire to the cell will allow the current sink to pull down the voltage input to the ADC for that cell. Measurement of the next cell above the open wire will produce an overvoltage indication and be flagged.
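The effect of an open connection can be illustrated with a simplified model of the tap voltages. It assumes that the current sink pulls a floating input all the way down to the voltage of the tap below it; in a real circuit the final voltage depends on input protection and filtering components, so treat the numbers as illustrative only.

```python
OV_THRESHOLD = 4.2  # volts, the danger level mentioned earlier in the article


def measured_cells(cell_voltages, open_tap=None):
    """Return the cell voltages the monitor would read.

    Tap 0 is the bottom of the stack and tap i is the top of cell i.
    If open_tap is given, that wire is treated as disconnected.
    """
    taps = [0.0]
    for v in cell_voltages:
        taps.append(taps[-1] + v)
    if open_tap is not None:
        # Simplifying assumption: the sink drags the floating input down to
        # the level of the tap immediately below it.
        taps[open_tap] = taps[open_tap - 1]
    return [taps[i] - taps[i - 1] for i in range(1, len(taps))]


readings = measured_cells([3.3] * 12, open_tap=5)
flagged = [i + 1 for i, v in enumerate(readings) if v > OV_THRESHOLD]

print([round(v, 2) for v in readings])
print(flagged)  # the cell just above the open wire reads as overvoltage
```

In this model the cell below the open wire reads near zero and the cell above it reads roughly two cell voltages, so the overvoltage comparison flags the fault.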
This periodic self testing adds to the reliable operation of the system. Checking the device that is doing the checking adds confidence that all is well.
Coarse temperature inputs
The operating temperature of Li-Ion cells is an important factor in knowing the state of charge of each cell. Temperatures are precisely measured by battery management devices such as the LTC6802-1. The fault monitoring LTC6801 also has two coarse temperature inputs. These readings are coarse because the voltages input to the Temp pins are simply compared to a threshold of Vref/2, or 1.5V. If the input voltage is above 1.5V it is considered good; if the input is below the threshold it is considered a fault.
Arranging temperature sensors such as thermistors with resistors in a voltage divider (Figure 5) can create a simple over/under temperature monitoring function. If the ambient temperature, or the temperature at a specific point, goes beyond a predetermined range, the Status Output clock is stopped in the same manner as for a cell voltage fault.
Figure 5: Coarse temperature sensing is possible through two TEMP input pins to internal voltage comparators. This example monitors the system operating temperature over a window of -20°C to +60°C. Exceeding the temperature limits flags an error.
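As a worked example of the divider arithmetic, the sketch below sizes two dividers for the -20°C to +60°C window using a simple beta-model NTC thermistor. The thermistor parameters (10kΩ at 25°C, beta of 3435) and the divider topology are assumptions for illustration and are not necessarily the values or arrangement used in Figure 5.

```python
import math

VREF = 3.0            # volts, the ADC reference named in the article
THRESHOLD = VREF / 2  # 1.5V comparison level


def ntc_resistance(temp_c, r25=10_000.0, beta=3435.0):
    """Beta-model NTC resistance in ohms (r25 and beta are hypothetical)."""
    t_kelvin = temp_c + 273.15
    return r25 * math.exp(beta * (1.0 / t_kelvin - 1.0 / 298.15))


def overtemp_input(temp_c, trip_c=60.0):
    """NTC on the low side: the voltage falls below Vref/2 when too hot."""
    r_fixed = ntc_resistance(trip_c)  # fixed resistor equals NTC at the trip point
    r_ntc = ntc_resistance(temp_c)
    return VREF * r_ntc / (r_fixed + r_ntc)


def undertemp_input(temp_c, trip_c=-20.0):
    """NTC on the high side: the voltage falls below Vref/2 when too cold."""
    r_fixed = ntc_resistance(trip_c)
    r_ntc = ntc_resistance(temp_c)
    return VREF * r_fixed / (r_fixed + r_ntc)


def temperature_ok(temp_c):
    """Both inputs must sit above the 1.5V threshold."""
    return overtemp_input(temp_c) > THRESHOLD and undertemp_input(temp_c) > THRESHOLD


for t in (-30, -10, 25, 50, 70):
    print(t, round(overtemp_input(t), 2), round(undertemp_input(t), 2), temperature_ok(t))
```

Because the threshold is exactly half of the reference, each divider crosses it when the thermistor resistance equals the fixed resistor. Choosing the fixed resistor equal to the thermistor's resistance at the desired trip temperature therefore sets the limit.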
Preserving the proper charge level of all batteries in a system will add years of service to a costly battery pack. This is essential for customer satisfaction in automotive systems such as electric-powered vehicles as well as uninterruptible power-supply backup systems. The LTC6801 is a cost-effective way to improve the long-term reliability of Li-Ion battery management systems through redundant fault monitoring. Running in parallel with a more precise cell measurement system, the LTC6801 provides a double-checking function that all system elements are operating properly. In the event of a malfunction, a flag is raised to initiate a problem resolution procedure. This helps to add safety to the reliability of the end product, which is never a bad thing.
About the author
Tim Regan is manager of applications at Linear Technology Corp. (Milpitas, CA) for signal-conditioning products (amplifiers, comparators, filters, precision references, timing functions and RF circuits). He has provided applications assistance on all manner of analog semiconductor devices at Linear Technology Corp. and National Semiconductor Corp. since graduating in 1973 with a BSEE from DeVry Institute of Technology. Tim enjoys the occasional round of golf.