Designing a fault-tolerant, remotely upgradeable system requires consideration of three major areas: delivering a reliable bit stream, verifying that the bit stream was delivered intact and that programming occurred error-free, and recovering if either step failed. One of the most practical and cost-effective methods to ensure continued fault-tolerant uptime is to employ a double-buffer design.
Hardware can be upgraded either by swapping socketed devices or by using reprogrammable devices. Reprogrammable devices, in turn, can be upgraded locally or remotely, over a customer network or over a non-bounded connection such as the Internet.
Before detailing the advantages of a double-buffer approach, it is useful to explain why other approaches don't measure up. While the traditional one-time programmable PROM undoubtedly offers the lowest cost during manufacturing and design, it has significant disadvantages over the lifetime of the product. That cost advantage disappears at the first upgrade, because a one-time programmable part cannot be rewritten and must be physically replaced.
A better approach in terms of efficiency and cost uses reprogrammable logic devices, such as field-programmable gate arrays (FPGAs). As reprogrammable logic and networks have evolved, designers have gained a wide range of programming options with varying cost structures, including local as well as remote programming. The particular application may dictate the best approach, but most installations are well suited to remote programming.
A centrally controlled remote upgrade allows complete control of the upgrade process, and it can ensure that all products are upgraded within a reasonable time of each other. This is very important if there is an interdependence of features within the upgrade. Using such a programming method requires a level of intelligence in the targeted device. Such intelligence lets it respond to a remotely initiated upgrade and enables it to self-recover if any fault conditions occur during the upgrade.
If a reprogramming download does not work, what happens next? For upgradeable hardware, it is important to be able to reset and redo the download, especially if it has been done remotely. Otherwise, the user could be looking at some significant downtime.
The recovery options available when a download does not work are directly related to the system architecture, or the download technique employed by the programmable logic.
There are three main methods to download an upgrade: overwrite, shadow, and double buffer. Each offers increasing levels of reliability and fault tolerance.
In an overwrite type of download, the upgrade data is immediately written over existing data in the reprogrammable device. The main reason for choosing this type of download is to reduce the cost of memory required.
For example, a flash device is erased and the new data is copied into it. An accuracy check follows, and then the product is restarted with the upgraded data. But there are a number of reliability risks associated with this approach.
If the power fails in an overwrite situation after the flash memory has been erased but before the upgrade has been copied in, then the product can no longer boot up, because the good data in RAM has been lost and the data in the flash memory is now bad. If the download fails because of a bad checksum, then the operator must try again, hoping that a good download can be accomplished before the power fails and the product becomes unstable. If the download has a good checksum but contains bugs, the product might still become unusable.
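The power-failure risk can be illustrated with a small simulation. This is only a sketch: the `Flash` class, the `PowerFailure` exception and the `fail_at` parameter are hypothetical names invented for the illustration, and a CRC-32 stands in for whatever checksum a real product would use.

```python
import zlib
from typing import Optional

ERASED = 0xFF  # assumption: erased flash reads back as all ones


class PowerFailure(Exception):
    """Simulated loss of power partway through an upgrade."""


class Flash:
    """Toy model of a single flash device holding one image (hypothetical)."""

    def __init__(self, image: bytes):
        self.data = bytearray(image)

    def erase(self) -> None:
        self.data = bytearray([ERASED] * len(self.data))

    def write(self, image: bytes, fail_at: Optional[int] = None) -> None:
        if len(image) > len(self.data):
            raise ValueError("image does not fit in flash")
        for i, byte in enumerate(image):
            if fail_at is not None and i == fail_at:
                raise PowerFailure(f"power lost after {i} bytes")
            self.data[i] = byte


def overwrite_upgrade(flash: Flash, new_image: bytes, expected_crc: int,
                      fail_at: Optional[int] = None) -> bool:
    flash.erase()                     # the old image is gone from this point on
    flash.write(new_image, fail_at)   # a power failure here leaves a partial image
    return zlib.crc32(bytes(flash.data)) == expected_crc


old = b"OLD-IMAGE-v1"
new = b"NEW-IMAGE-v2"
flash = Flash(old)
try:
    overwrite_upgrade(flash, new, zlib.crc32(new), fail_at=4)
except PowerFailure:
    pass

# Neither the old nor the new image survives, so there is nothing valid to boot.
bootable = bytes(flash.data) in (old, new)
```

After the simulated failure the device holds four bytes of the new image followed by erased cells, which is exactly the unbootable state described above.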
When an overwrite upgrade fails, the design must already include a method to recover the device to a minimum level of functionality. If this has not been considered in advance, then it will probably be necessary to return the product to the manufacturer.
In an attempt to improve the reliability of download but still minimize cost, the shadow technique can be used. Here, the new upgrade data is first copied to a spare area of RAM. Once downloaded, the data can be checked for download errors before flash reprogramming proceeds. But this method only addresses errors that occur during a download. The product can still become unusable if the power fails during the reprogramming phase or if the upgrade data contains errors.
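A minimal sketch of the shadow sequence follows, assuming a CRC-32 as the download check and modeling flash as a plain `bytearray`. The function name `shadow_upgrade` and its parameters are hypothetical, chosen only for this illustration.

```python
import zlib


def shadow_upgrade(flash: bytearray, downloaded: bytes, expected_crc: int) -> bool:
    """Stage the download in RAM and touch flash only if the check passes."""
    shadow = bytes(downloaded)               # staging copy in a spare area of RAM
    if zlib.crc32(shadow) != expected_crc:   # download error: flash is left untouched
        return False
    if len(shadow) > len(flash):
        return False
    flash[:] = b"\xff" * len(flash)          # erase: the exposed window starts here
    flash[:len(shadow)] = shadow             # reprogram from the RAM copy
    return True


flash = bytearray(b"OLD-IMAGE-v1")
good = b"NEW-IMAGE-v2"
corrupted = b"NEW-IMAGE-vX"                  # one byte damaged in transit
crc = zlib.crc32(good)

rejected = shadow_upgrade(flash, corrupted, crc)
after_reject = bytes(flash)                  # still the old image
accepted = shadow_upgrade(flash, good, crc)
```

The corrupted download is rejected before any erase takes place. The sketch does nothing about the interval between the erase and the end of reprogramming, and it cannot tell whether a correctly delivered image contains bugs, which are the two gaps the shadow method leaves open.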
A double-buffer architecture solves the delivery and recovery problems encountered using the overwrite and shadow methods. If an upgrade fails, it can be redone or the original data can be restored. And several methodologies can be combined for flexibility, which allows many variations of upgrade scenarios and cost structures to be presented to the customer and manufacturing teams.
In essence, the double-buffer method provides extra flash memory so that the original data is kept until the whole upgrade process has been completed and verified, even to the point of checking the new functionality of the product. The microprocessor runs on its own bus while the reconfigurable logic is being configured, and a separate set of flash memory holds the configuration data for the logic. With this technique, it is still possible to return to the non-upgraded data, even after the upgrade has been performed.
A typical double-buffer implementation is built around reconfigurable logic and uses two banks (Bank A and Bank B). For a more resource- and cost-efficient solution, an alternative double-buffer approach uses the microprocessor's memory and requires only Bank A. In such an architecture, the original data is stored in the main microprocessor memory. The logic is configured either by removing the microprocessor from the bus during configuration or by having the microprocessor configure the logic by writing to it as a master device.
Actual implementation should depend upon the requirements of the specific circuit. It may be necessary to have the FPGA configured before the microprocessor can operate if the FPGA contains the bus control logic.
For a double-buffer architecture to function successfully in a fault situation, the original designers need to consider the entire data-flow scheme, including reliable bit-stream delivery and verification, as well as recovery.
Delivery of the bit stream
For systems using a non-bounded network connection, such as the Internet, there are some special bit-stream delivery concerns. Since the available bandwidth and network paths change continually, no connection can be guaranteed as permanent. As a result, the communication methods must be flexible and not time-bound. The data transport must take into account that the network is essentially connectionless. As a result, upgrade data should be sent in a form that allows for pieces of it to arrive in any order. Then the data must be correctly ordered and error-checked before the upgrade is performed.
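One way to meet these requirements is to number each piece and check it individually, then check the reassembled image as a whole. The sketch below assumes a made-up chunk format (sequence number, chunk count, payload, CRC-32); the `Chunk`, `split` and `Reassembler` names are hypothetical and do not correspond to any particular protocol.

```python
import random
import zlib
from typing import Dict, List, NamedTuple, Optional


class Chunk(NamedTuple):
    seq: int        # position of this chunk within the image
    total: int      # number of chunks in the whole image
    payload: bytes
    crc: int        # CRC-32 of the payload


def split(image: bytes, size: int) -> List[Chunk]:
    parts = [image[i:i + size] for i in range(0, len(image), size)]
    return [Chunk(n, len(parts), p, zlib.crc32(p)) for n, p in enumerate(parts)]


class Reassembler:
    def __init__(self) -> None:
        self.parts: Dict[int, bytes] = {}
        self.total: Optional[int] = None

    def accept(self, chunk: Chunk) -> bool:
        """Store a chunk; return False if it is corrupt and must be re-sent."""
        if zlib.crc32(chunk.payload) != chunk.crc:
            return False
        self.total = chunk.total
        self.parts[chunk.seq] = chunk.payload   # a duplicate simply overwrites
        return True

    def missing(self) -> List[int]:
        if self.total is None:
            return []
        return [n for n in range(self.total) if n not in self.parts]

    def image(self, expected_crc: int) -> Optional[bytes]:
        """Return the ordered image, or None if incomplete or damaged."""
        if self.total is None or self.missing():
            return None
        data = b"".join(self.parts[n] for n in range(self.total))
        return data if zlib.crc32(data) == expected_crc else None


image = bytes(range(256)) * 4
chunks = split(image, 100)
random.Random(1).shuffle(chunks)          # simulate arbitrary arrival order

rx = Reassembler()
for chunk in chunks[:-1]:                 # the last chunk is delayed
    rx.accept(chunk)
pending = rx.missing()
incomplete = rx.image(zlib.crc32(image))  # None: the upgrade must not start yet
rx.accept(chunks[-1])
rebuilt = rx.image(zlib.crc32(image))
```

Because nothing in the receiver depends on arrival order or timing, a delayed or retransmitted piece only postpones the upgrade; it never starts it with incomplete data.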
In all remote programming cases, the upgrade bit stream is received by the microprocessor and stored in the non-active configuration bank (the bank that was not used to configure the FPGA). After the upgrade bit stream is received, the data is checked.
If the system verifies that the new load is complete and accurate, then the microprocessor configuration bank remains as the "Next Active." But if there is a problem with the load, or if the user detects bugs, then the Next Active flag will be changed to indicate the original load as the known-good load, and that will be loaded on the next reboot. Thus, a known-good configuration is always maintained in the product, allowing immediate recovery from faults originating from download processes.
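The flag handling can be sketched as a small state machine. The `DoubleBuffer` class and its `receive`, `reboot`, `reject` and `confirm` methods are hypothetical names, and the explicit `known_good` field is an assumption added to make the fallback target unambiguous; it is not taken from a specific product.

```python
import zlib
from dataclasses import dataclass
from typing import Dict


@dataclass
class DoubleBuffer:
    banks: Dict[str, bytes]     # contents of Bank A and Bank B
    active: str = "A"           # bank used for the current configuration
    next_active: str = "A"      # bank to load on the next reboot
    known_good: str = "A"       # bank holding the last proven load

    def _spare(self) -> str:
        return "B" if self.known_good == "A" else "A"

    def receive(self, image: bytes, expected_crc: int) -> bool:
        """Store an upgrade in the spare bank and set Next Active if it checks out."""
        target = self._spare()
        self.banks[target] = image          # the known-good bank is never touched
        if zlib.crc32(image) == expected_crc:
            self.next_active = target
            return True
        self.next_active = self.known_good  # bad load: keep booting the original
        return False

    def reboot(self) -> bytes:
        self.active = self.next_active
        return self.banks[self.active]

    def reject(self) -> None:
        """Bugs were found in operation: the next reboot returns to the original."""
        self.next_active = self.known_good

    def confirm(self) -> None:
        """The running load has proven itself and becomes the new known-good."""
        self.known_good = self.active


old, new = b"CONFIG-v1", b"CONFIG-v2"
system = DoubleBuffer(banks={"A": old, "B": b""})

bad_ok = system.receive(b"CONFIG-vX", zlib.crc32(new))  # damaged download
after_bad = system.reboot()                             # still the original load
good_ok = system.receive(new, zlib.crc32(new))
after_good = system.reboot()                            # now running the upgrade
system.reject()                                         # user reports bugs
after_reject = system.reboot()                          # back on the original load
```

The point of the design is that an upgrade is only ever written to the bank that is not the known-good one, so every failure path ends with a bootable configuration.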
If the data on the microprocessor is verified as correct, the FPGA is instructed to access the new data, which is then programmed into the FPGA. A programming method such as JTAG, a standard technique, or SelectMAP, a proprietary one from Xilinx Inc., is required to transfer the data from the storage area on the microprocessor to the FPGA. The programming tools can also be used to confirm that programming was successful.
To succeed, then, a double-buffer method requires designers to consider the entire download strategy carefully before starting the system design. All of the download, verification and recovery requirements must be designed into both the software and the hardware. The various buses in the system must be separated with appropriate buffers to prevent interaction between the different needs of the FPGA download and the microprocessor. It may also be necessary to separate the configuration memories from the pins of the FPGA when the pins are used as functional inputs and outputs elsewhere in the circuit. That allows microprocessor access for programming purposes.
See related chart