An interesting discussion has emerged about the resistance of various kinds of system-level ICs to single-event upset (SEU) and whether this is important to system reliability. It has been widely stated, for example, that SRAM-based FPGAs are susceptible to radiation-induced alteration of their configuration memory.
In fact, any complex circuit with latches or memory, whether FPGA or cell-based ASIC, is susceptible to SEU. In FPGAs, an event can alter the configuration of the device. But is that more ominous than the havoc an SEU can wreak by altering a bit in the firmware of a DSP, for example? FPGAs aren't that special: Any complex circuit has to be analyzed to ensure its SEU resistance is sufficient for the application involved. And in some cases, design techniques must be used to improve resistance.
This means an entirely different implementation approach in which designers assume that at least one operation in every task will fail. The granularity of this analysis depends on when you need the data and how long you have to think about it. But the basic idea is you must assume failure, and design circuits that will identify and correct the failure.
There are many established approaches. One, long used in fault-tolerant computers, is familiar and brute-force: Three identical circuits all perform the same operation and vote on each result. Other approaches may include error detection or correction on all memory devices. This is fine for large central memories, but it can become onerous in typical designs with lots of small, fast memory instances. Reliability may force centralization of memory instances.
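The brute-force voting approach described above can be sketched in a few lines. This is a simplified illustration, not any particular vendor's implementation; the function name and bit-width are invented for the example. The key property is that the bitwise majority of three copies masks an upset in any single copy.

```python
def tmr_vote(a: int, b: int, c: int) -> int:
    """Majority vote over three redundant results (triple modular redundancy).

    Three identical circuits compute the same operation; the voter takes
    the bitwise majority, so a single-event upset that corrupts any one
    of the three copies is outvoted by the other two.
    """
    # A result bit is 1 if and only if at least two of the three inputs agree on 1.
    return (a & b) | (a & c) | (b & c)


# One copy (c) suffers an upset; the voter still recovers the correct value.
good = 0b1010
upset = good ^ 0b1000  # single flipped bit
assert tmr_vote(good, good, upset) == good
```

Note that the voter itself becomes a single point of failure, which is why fault-tolerant designs may replicate the voters as well.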
Less well-understood but perhaps more valuable are techniques for encoding data so that an error will be detectable at the end of an operation as an invalid code. This may result in much less hardware overhead than triple redundancy, but much more design effort.
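As a minimal illustration of the encoding idea, consider a single even-parity bit, the simplest such code: any single-bit flip during an operation leaves an invalid codeword that a check at the end can catch. The function names here are invented for the sketch; real designs would use stronger codes (residue codes, CRCs, Hamming codes) chosen to match the operations being protected.

```python
def add_parity(word: int) -> int:
    """Append an even-parity bit, so every valid codeword has an even
    number of 1 bits. Any single-bit upset produces an odd count,
    i.e. an invalid code detectable at the end of the operation."""
    parity = bin(word).count("1") & 1
    return (word << 1) | parity


def is_valid(codeword: int) -> bool:
    """A codeword is valid if its overall parity is even."""
    return bin(codeword).count("1") % 2 == 0


coded = add_parity(0b1011)
assert is_valid(coded)            # untouched codeword checks out
assert not is_valid(coded ^ 0b100)  # any single flipped bit is detected
```

A single parity bit only detects odd numbers of flips and corrects nothing, which is exactly the trade-off the column alludes to: far less hardware than triplication, but the designer must reason carefully about which errors the chosen code actually catches.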
Nor is this just an issue for the space crowd. In the future, we may have to accept that the circuits we fabricate will not work perfectly, even after final test. Process variations, signal-integrity issues, aging and even the need for high yields may force us to include error correction as a matter of course in chip designs. We already do it for memory, but it's coming for logic as well.
Ron Wilson covers microprocessors, programmable/reconfigurable logic and the chip design process. He can be reached at email@example.com.