Editor's Note: I am delighted to have the opportunity to present the following piece from the fourth quarter 2010 issue of Xcell Journal, with the kind permission of Xilinx Inc.
Over the last few years, dramatic advances in FPGA technology, device size and capabilities have increased the number of potential applications that FPGAs can implement. Increasingly, these applications are in areas that demand high reliability, such as aerospace, automotive or medical. Such applications must also function within a harsh operating environment, which can itself affect system performance. This demand for high reliability, coupled with use in rugged environments, often means that you as the engineer must take additional care in the design and implementation of the state machines (as well as all accompanying logic) inside your FPGA to ensure they can function within the requirements.
One of the major causes of errors within state machines is the single-event upset (SEU), caused by either a high-energy neutron or an alpha particle striking a sensitive section of the device silicon. An SEU can cause a bit to flip its state (0 -> 1 or 1 -> 0), resulting in an error in device functionality that could potentially lead to the loss of the system or even endanger life if incorrectly handled. Because SEUs do not result in any permanent damage to the device itself, they are called soft errors.
The backbone of most FPGA designs is the finite state machine, a design methodology that engineers use to implement control, data flow and algorithmic functions. When implementing state machines within FPGAs, designers choose one of two styles, binary or “one hot,” although in many cases engineers allow the synthesis tool to determine the final encoding scheme. Each implementation scheme presents its own challenges when designing reliable state machines for mission-critical systems. Indeed, even a simple state machine can encounter several problems (Figure 1). You must pay close attention to the encoding scheme and in many cases take the decision about the final implementation encoding away from the synthesis tool.
Figure 1: Even a simple state machine can encounter several types of errors.
Detection schemes
Let’s first look at binary implementations (sequential or “gray” encoding), which often have leftover, unused states that the state machine does not enter when it is functioning normally. Designers must address these unused states to ensure that the state machine will gracefully recover should it accidentally enter an illegal state. There are two main methods of achieving this recovery. The first is to declare all 2^N states (where N is the number of state bits) when defining the state machine signal and cover the unused states with the “others” clause at the end of the case statement. The others clause will typically set the outputs to a safe state and send the state machine back to its idle state or another state, as identified by the design engineer. Because the unused states have no valid entry points, the synthesis tool will otherwise optimize them out of the design, so this approach requires synthesis constraints to preserve them. This typically means embedding synthesis attributes such as “syn_keep” within the body of the RTL code.
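The first method can be illustrated with a small behavioral model. This is only a sketch in Python rather than RTL, and the three-state machine (IDLE, RUN, DONE), the `start` input and the function names are hypothetical, chosen purely for illustration. The final `else` branch plays the role of the others clause.

```python
# Behavioral model of a 2-bit binary-encoded state machine.
# State names, the `start` input and SAFE_OUTPUT are illustrative assumptions.
IDLE, RUN, DONE = 0b00, 0b01, 0b10   # three legal states; 0b11 is unused
SAFE_OUTPUT = 0                       # assumed safe value for the output


def next_state(state, start):
    """Return (next_state, output) for one clock cycle."""
    if state == IDLE:
        return (RUN if start else IDLE), SAFE_OUTPUT
    elif state == RUN:
        return DONE, 1
    elif state == DONE:
        return IDLE, SAFE_OUTPUT
    else:
        # Equivalent of the "others" clause: any unused encoding
        # drives the safe output and returns the machine to idle.
        return IDLE, SAFE_OUTPUT


def flip_bit(state, bit):
    """Model an SEU by inverting one bit of the state register."""
    return state ^ (1 << bit)
```

Flipping bit 1 of RUN (0b01) yields the unused encoding 0b11, and the next call to `next_state` returns the machine to IDLE with the safe output. A flip that happens to land on another legal encoding, such as RUN to IDLE, is not caught by this mechanism.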
The second method of handling the unused states is to cycle through them at startup following reset release. Typically, these states also keep the outputs in a safe state; should they be accidentally entered, the machine will cycle around to its idle state again.
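As a rough illustration of this second method, the following Python sketch models a 3-bit machine whose unused encodings form a chain that the machine walks through after reset. The state names, the encoding and the choice of the first unused state as the reset state are all assumptions made for the example, not details from the article.

```python
# Behavioral model: unused encodings are chained together and traversed
# at startup, so every encoding has a defined path back to idle.
# All names and the 3-bit encoding are illustrative assumptions.
IDLE, RUN, DONE = 0, 1, 2
FIRST_UNUSED, LAST_UNUSED = 3, 7
RESET_STATE = FIRST_UNUSED      # the machine enters the chain on reset
SAFE_OUTPUT = 0


def step(state, start):
    """Return (next_state, output) for one clock cycle."""
    if FIRST_UNUSED <= state <= LAST_UNUSED:
        # Unused states hold the safe output and advance along the chain;
        # the last one hands over to IDLE.
        nxt = IDLE if state == LAST_UNUSED else state + 1
        return nxt, SAFE_OUTPUT
    if state == IDLE:
        return (RUN if start else IDLE), SAFE_OUTPUT
    if state == RUN:
        return DONE, 1
    return IDLE, SAFE_OUTPUT    # state == DONE


def clocks_to_idle(state):
    """Count clock cycles until the machine reaches IDLE with start low."""
    clocks = 0
    while state != IDLE:
        state, _ = step(state, 0)
        clocks += 1
    return clocks
```

In this model the machine reaches IDLE five clocks after reset release, and an accidental entry into any unused state is followed by a return to IDLE within at most five clocks, with the output held safe throughout the chain.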
One-hot state machines have one flip-flop for each state, but only the current state is set high at any one time. Corruption of the machine by having more than one flip-flop set high can result in unexpected outcomes. You can protect a one-hot machine from errors by monitoring the parity of the state registers. Should you detect a parity error, you can reset the machine to its idle state or to another predetermined state.
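The parity check can be sketched in a few lines of Python. The three-state one-hot encoding and the function names below are hypothetical; the point is only that a legal one-hot state has exactly one bit set, and therefore odd parity, so any single bit flip produces even parity.

```python
# Parity monitoring of a one-hot state register (behavioral sketch).
# The encoding and function names are illustrative assumptions.
IDLE, RUN, DONE = 0b001, 0b010, 0b100
WIDTH = 3


def parity_error(state):
    """True when an even number of state bits is set (0, 2, ...)."""
    return bin(state).count("1") % 2 == 0


def recover(state):
    """Force the machine back to IDLE when a parity error is detected."""
    return IDLE if parity_error(state) else state


def single_bit_upsets(state):
    """All register values reachable from `state` by one bit flip."""
    return [state ^ (1 << bit) for bit in range(WIDTH)]
```

Every value returned by `single_bit_upsets` for a legal state has either zero or two bits set, so `parity_error` flags it and `recover` returns IDLE. A double-bit upset leaves the parity unchanged and would pass this check, so the sketch covers single-bit upsets only.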
With both of these methods, the state machine’s outputs go to safe states and the state machine restarts from its idle position. State machines that use these methods can be said to be “SEU detecting,” as they are capable of detecting and recovering from an SEU, although the state machine’s operation will be interrupted. You must take care during synthesis to ensure that register replication, which tools apply to registers with a high fan-out, does not create duplicate state registers that the detection scheme leaves unaddressed. Take care also to ensure that the error does not affect other state machines that this machine interacts with.
Many synthesis tools offer a “safe state machine” option. This option often adds logic to detect the state machine entering an illegal state and send it back to a legal one, normally the reset state. For a high-reliability application, design engineers can detect and verify these illegal state entries more easily by implementing any of the previously described methods. With any of these approaches, the designers must also take into account what would happen should the detection logic itself suffer an SEU. What effect would this have upon the reliability of the design?
Figure 2 is a flow chart that attempts to map out the decision process for creating reliable state machines.
Figure 2: This flow chart maps out the decision process for creating reliable state machines.