
Design Article

Using FPGAs in mission-critical systems

Adam Peter Taylor, Principal Engineer, EADS Astrium

12/6/2010 5:01 AM EST

Editor's Note: I am delighted to have the opportunity to present the following piece from the fourth quarter 2010 issue of Xcell Journal, with the kind permission of Xilinx Inc.

--------------------------------------------------------------------
Dramatic surges in FPGA technology, device size and capabilities have over the last few years increased the number of potential applications that FPGAs can implement. Increasingly, these applications are in areas that demand high reliability, such as aerospace, automotive or medical. Such applications must function within a harsh operating environment, which can also affect the system performance. This demand for high reliability coupled with use in rugged environments often means you as the engineer must take additional care in the design and implementation of the state machines (as well as all accompanying logic) inside your FPGA to ensure they can function within the requirements.

One of the major causes of errors within state machines is the single-event upset (SEU), which occurs when either a high-energy neutron or an alpha particle strikes a sensitive section of the device silicon. An SEU can cause a bit to flip its state (0 -> 1 or 1 -> 0), resulting in an error in device functionality that could potentially lead to the loss of the system or even endanger life if incorrectly handled. Because SEUs do not result in any permanent damage to the device itself, they are called soft errors.

The backbone of most FPGA designs is the finite state machine, a design methodology that engineers use to implement control, data flow and algorithmic functions. When implementing state machines within FPGAs, designers will choose one of two styles, binary or “one hot,” although in many cases engineers allow the synthesis tool to determine the final encoding scheme. Each implementation scheme presents its own challenges when designing reliable state machines for mission-critical systems. Indeed, even a simple state machine can encounter several problems (Figure 1). You must pay close attention to the encoding scheme and in many cases take the decision about the final implementation encoding away from the synthesis tool.

Figure 1: Even a simple state machine can encounter several types of errors

Detection schemes
Let’s first look at binary implementations (sequential or “Gray” encoding), which often have leftover, unused states that the state machine does not enter when it is functioning normally. Designers must address these unused states to ensure that the state machine will recover gracefully should it accidentally enter an illegal state. There are two main methods of achieving this recovery. The first is to declare all 2^N states (where N is the number of state bits) when defining the state machine signal and to cover the unused states with the “others” clause at the end of the case statement. The others clause will typically set the outputs to a safe state and send the state machine back to its idle state or another state, as identified by the design engineer. Because the unused states have no valid entry points, this approach requires synthesis constraints to prevent the synthesis tool from optimizing them out of the design. This typically means placing synthesis constraints within the body of the RTL code (for example, “syn_keep”).
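
The first method can be illustrated with a small behavioural model. This is a Python sketch rather than RTL, and it is not taken from the article: the state names, the 2-bit encoding and the single output_enable output are all hypothetical. The final branch plays the role of the “others” clause.

```python
# Behavioural model (not RTL) of a binary-encoded state machine.
# State names, encodings and the output are hypothetical examples.
IDLE, LOAD, RUN = 0b00, 0b01, 0b10   # 3 legal states held in N = 2 bits
# 2**2 = 4 codes exist, so 0b11 is an unused (illegal) state.

def next_state(state, start):
    """Return (next_state, output_enable) for one clock cycle."""
    if state == IDLE:
        return (LOAD if start else IDLE), False
    if state == LOAD:
        return RUN, False
    if state == RUN:
        return IDLE, True
    # Equivalent of the "others" clause: any unused code drives the
    # output to its safe value and sends the machine back to IDLE.
    return IDLE, False

def flip_bit(state, bit):
    """Model an SEU by inverting one bit of the state register."""
    return state ^ (1 << bit)

corrupted = flip_bit(RUN, 0)               # 0b10 becomes the unused 0b11
recovered, output_enable = next_state(corrupted, start=True)
```

In this model the upset is absorbed in one clock: the machine returns to IDLE with the output held inactive. Note that a flip from one legal code to another (for example IDLE to LOAD) lands in a legal state and is not caught by this scheme.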

The second method of handling the unused states is to cycle through them at startup following reset release. Typically, these states also keep the outputs in a safe state; should they be accidentally entered, the machine will cycle around to its idle state again.
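
A minimal sketch of this second method, again as a Python behavioural model with a hypothetical 3-bit encoding (codes 0 to 4 functional, codes 5 to 7 unused). The unused codes are chained together so that reset release walks through them on the way to idle, and an accidental entry into any of them follows the same path.

```python
# Behavioural model (not RTL). All encodings here are hypothetical.
IDLE = 0                                # functional states are codes 0..4
RESET_STATE = 5                         # first otherwise-unused code
STARTUP_CHAIN = {5: 6, 6: 7, 7: IDLE}   # unused codes lead on to IDLE

def step(state, start=False):
    """Return (next_state, output_enable) for one clock cycle."""
    if state in STARTUP_CHAIN:
        # Startup/recovery path: outputs are held in their safe state.
        return STARTUP_CHAIN[state], False
    if state == IDLE:
        return (1 if start else IDLE), False
    if state == 4:
        return IDLE, True               # last functional state asserts the output
    return state + 1, False             # states 1..3 simply advance

def recover(state):
    """Clock the model from an unused code until it reaches IDLE.

    Returns (clocks_taken, outputs_stayed_safe).
    """
    clocks, safe = 0, True
    while state != IDLE:
        state, output_enable = step(state)
        safe = safe and not output_enable
        clocks += 1
    return clocks, safe
```

Calling recover(RESET_STATE) models normal startup, while recover(6) or recover(7) models an SEU that drops the machine into the middle of the chain. The cost of this style is a few extra clocks after reset and a recovery time that depends on where in the chain the machine lands.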

One-hot state machines have one flip-flop for each state, but only the current state is set high at any one time. Corruption of the machine by having more than one flip-flop set high can result in unexpected outcomes. You can protect a one-hot machine from errors by monitoring the parity of the state registers. Should you detect a parity error, you can reset the machine to its idle state or to another predetermined state.
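
The parity check can be sketched as follows. This is a Python behavioural model, not RTL; the four-state machine and its encodings are hypothetical. A valid one-hot word has exactly one bit set and therefore odd parity, so a single bit flip (which leaves zero or two bits set) shows up as even parity.

```python
# Behavioural model (not RTL) of parity monitoring on a one-hot
# state register. The 4-state encoding is a hypothetical example.
IDLE = 0b0001   # other legal states: 0b0010, 0b0100, 0b1000

def parity(word):
    """Return 1 if an odd number of bits are set, otherwise 0."""
    return bin(word).count("1") % 2

def checked_state(state):
    """Return (state_to_use, error_flag).

    Even parity means the register cannot be a valid one-hot word,
    so the machine is forced back to IDLE and the error is flagged.
    """
    if parity(state) != 1:
        return IDLE, True
    return state, False

good = checked_state(0b0100)            # legal word passes through
two_hot = checked_state(0b0100 | 0b0010)  # extra flop set: detected
none_hot = checked_state(0b0000)        # hot flop cleared: detected
```

Parity alone catches any single upset, but a word with three bits set (two simultaneous upsets) still has odd parity and would pass this check. A stricter monitor would test for exactly one bit set, at the cost of more logic.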

With both of these methods, the state machine’s outputs go to safe states and the state machine restarts from its idle position. State machines that use these methods can be said to be “SEU detecting,” as they are capable of detecting and recovering from an SEU, although the state machines’ operation will be interrupted. You must take care during synthesis to ensure that register replication does not result in registers with a high fan-out being reproduced and hence left unaddressed by the detection scheme. Take care also to ensure that the error does not affect other state machines that this machine interacts with.

Many synthesis tools offer a “safe state machine” option. This option often adds logic to detect the state machine entering an illegal state and send it back to a legal one, normally the reset state. For a high-reliability application, design engineers can detect and verify these illegal state entries more easily by implementing any of the previously described methods. Using these approaches, the designers must also take into account what would happen should the detection logic itself suffer an SEU. What effect would this have upon the reliability of the design? Figure 2 is a flow chart that attempts to map out the decision process for creating reliable state machines.

Figure 2: This flow chart maps out the decision process for creating reliable state machines.




Mario Blunk

12/8/2010 3:03 AM EST

Interesting work but the flow diagrams are hard to read. Please make them readable for the public. Thank you.




Bob Lacovara

12/8/2010 1:35 PM EST

This is more than just good advice for mission critical implementations. Many of the suggestions would apply equally well to software. Of course, viewed from the hardware description language perspective, the hardware is software...not that I wish to start a philosophic discussion on that score. But state machines and software systems both feature the ability to find themselves at execution points or states unexpectedly, and can sport unreachable states, and so on. I was bemused to find that one-hot machines were in use: I wasn't aware that much use had been made of one-hot architectures. Interested readers might want to go to abebooks and find a copy of Hill and Peterson's "Digital Logic and State Machine Design" (even the 2nd edition will do) wherein they will find a pedagogical HDL that compiles into a one-hot model. Another interesting thing about one-hot designs is that parallel execution is possible under some circumstances by allowing more than one flop to be hot at once. Suitable precautions and procedures are required, though.




karax

12/9/2010 5:18 AM EST

On the subject of simulation and verification tools: the SST tool from Nebrija uses built-in simulator commands, that is, it feeds the simulator the appropriate commands (Perl/Tcl) to control the simulation or inject faults. This method is easy to implement and does not require modification of the circuit models. On the other hand, command parsing and the simulator's reaction times reduce performance, which makes the approach unviable for large circuits. I know of other, more advanced techniques that speed up the simulation process using FLI (ModelSim), as well as hardware emulation methods on FPGAs.




anne-francoise.pele

7/16/2012 11:25 AM EDT

Dear Mario,

I have made the diagrams more readable. Click on the images to enlarge them.

Best,

Anne-Francoise Pele



