The techniques presented so far detect or prevent
an incorrect change from one legal state to another legal state.
Depending upon the end application, this could result in anything from a
momentary system error to the complete loss of the mission.

Two techniques for detecting and fixing incorrect legal transitions are
triple-modular
redundancy and Hamming encoding. The latter provides a Hamming distance
of three and covers all possible adjacent states. A simpler technique
for preventing incorrect legal transitions is the use of Hamming encoding
with a Hamming distance of two (rather than three) between the states,
without covering the adjacent states. Compared with a plain binary
encoding, however, this will increase the number of registers your
design requires.

Triple-modular redundancy (TMR), for its part, involves implementing
three instantiations of the state
machine with majority voting upon the outputs and the next state. This
is the simpler of the two approaches and many engineers have used it in a
number of applications over the years. Typically, a TMR implementation
will require spatial separation of the logic within the FPGA to ensure
that an SEU does not corrupt more than one of the three instantiations.
It is also important to remove any registers from the voter logic, since
they can create a single-point failure in which an SEU could affect all
three instantiations.
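
As a minimal sketch of the voting step, assuming each of the three copies of the state machine presents its next state (or its outputs) as a vector, a purely combinational bitwise majority voter might look like the following. The entity name `tmr_voter`, the port names and the `WIDTH` generic are illustrative and are not taken from any particular design.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical bitwise majority voter for three redundant copies.
-- Purely combinational: there are no registers inside the voter.
entity tmr_voter is
  generic (
    WIDTH : positive := 4  -- illustrative width of the voted vector
  );
  port (
    a     : in  std_logic_vector(WIDTH - 1 downto 0);  -- copy 1
    b     : in  std_logic_vector(WIDTH - 1 downto 0);  -- copy 2
    c     : in  std_logic_vector(WIDTH - 1 downto 0);  -- copy 3
    voted : out std_logic_vector(WIDTH - 1 downto 0)
  );
end entity tmr_voter;

architecture rtl of tmr_voter is
begin
  -- Each output bit takes the value held by at least two of the inputs.
  voted <= (a and b) or (a and c) or (b and c);
end architecture rtl;
```

One common arrangement, assumed here, is to feed the voted next state back to all three state registers so that a corrupted copy is overwritten on the following clock. Synthesis tools may also need to be told to preserve the redundant copies rather than merge them.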

The use of a state machine with encoding that
provides a Hamming distance of three between states will ensure both SEU
detection and correction. This guarantees that more than a single bit
will change between states, meaning an SEU cannot make the state
transition from one legal state to another erroneously. The use of a
Hamming distance of two between states is similar to the implementation
for the sequential machine, where the unused states are addressed by the
“when others” clause or reset cycling. However, as the states are
explicitly declared to be separate from each other by a Hamming distance
of two within the RTL, the state machine cannot move from one legal
state to another erroneously and will instead go to its idle state
should the machine enter an illegal state. This provides a more robust
implementation than the binary one mentioned above.
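
The distance-of-two approach can be sketched as follows, assuming a four-state machine whose codes all have even parity so that any two legal codes differ in exactly two bits. The entity `hd2_fsm`, its ports and the state names are hypothetical and chosen only for illustration.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity hd2_fsm is
  port (
    clk   : in  std_logic;
    rst   : in  std_logic;  -- synchronous, active high
    start : in  std_logic;
    done  : in  std_logic;
    busy  : out std_logic
  );
end entity hd2_fsm;

architecture rtl of hd2_fsm is
  -- Even-parity codes: every pair of legal states differs in two bits.
  constant IDLE : std_logic_vector(2 downto 0) := "000";
  constant LOAD : std_logic_vector(2 downto 0) := "011";
  constant RUN  : std_logic_vector(2 downto 0) := "101";
  constant STOP : std_logic_vector(2 downto 0) := "110";

  signal state : std_logic_vector(2 downto 0) := IDLE;
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if rst = '1' then
        state <= IDLE;
      else
        case state is
          when IDLE =>
            if start = '1' then
              state <= LOAD;
            end if;
          when LOAD =>
            state <= RUN;
          when RUN =>
            if done = '1' then
              state <= STOP;
            end if;
          when STOP =>
            state <= IDLE;
          when others =>
            -- A single-bit upset always lands on an odd-parity,
            -- illegal code, so the machine returns to idle.
            state <= IDLE;
        end case;
      end if;
    end if;
  end process;

  busy <= '0' when state = IDLE else '1';
end architecture rtl;
```

Because the codes are declared explicitly as constants, the encoding is visible in the RTL. Synthesis tools often recognise and re-encode state machines, so tool settings may be needed to keep the encoding and the "when others" branch intact.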

If you wish to implement a state machine that will continue to function
correctly
should an SEU corrupt the current state register, you can do so by using
a Hamming-code implementation with a Hamming distance of three and
ensuring its adjacent states are also addressed within the state
machine. Adjacent states are those which are one bit different from the
state register and hence achievable should an SEU occur. This use of
states adjacent to the valid state to correct for the error will result
in N*(M+1) states, where N is the number of states and M is the number
of bits within the state register. It’s possible to make a small
high-reliability state machine using this technique, but crafting a
large one can be so complicated as to be prohibitive. The extra logic
footprint associated with this approach could also result in lower
performance.
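
To make the N*(M+1) figure concrete, consider a deliberately tiny sketch with N = 2 states held in an M = 3 bit register. The two legal codes, "000" and "111", are three bits apart, and covering each code plus its three single-bit neighbours gives 2*(3+1) = 8 entries, which is every value of the register. The entity `hd3_fsm` and its ports are hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity hd3_fsm is
  port (
    clk    : in  std_logic;
    rst    : in  std_logic;  -- synchronous, active high
    go     : in  std_logic;
    halt   : in  std_logic;
    active : out std_logic
  );
end entity hd3_fsm;

architecture rtl of hd3_fsm is
  -- Two legal codes three bits apart: N = 2 states, M = 3 register bits.
  constant IDLE : std_logic_vector(2 downto 0) := "000";
  constant RUN  : std_logic_vector(2 downto 0) := "111";

  signal state : std_logic_vector(2 downto 0) := IDLE;
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if rst = '1' then
        state <= IDLE;
      else
        case state is
          -- IDLE plus its three single-bit neighbours
          when "000" | "001" | "010" | "100" =>
            if go = '1' then
              state <= RUN;
            else
              state <= IDLE;  -- rewrites a corrupted register
            end if;
          -- RUN plus its three single-bit neighbours
          when "111" | "110" | "101" | "011" =>
            if halt = '1' then
              state <= IDLE;
            else
              state <= RUN;   -- rewrites a corrupted register
            end if;
          when others =>
            state <= IDLE;    -- covers metavalues in simulation
        end case;
      end if;
    end if;
  end process;

  -- Decode the output from the same groups of codes (majority of bits).
  active <= (state(2) and state(1)) or
            (state(2) and state(0)) or
            (state(1) and state(0));
end architecture rtl;
```

With only two states the case statement stays readable. The number of choices grows as N*(M+1), which is why the same style quickly becomes unwieldy for larger machines.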

Deadlock and other issues

There are other
issues to consider when designing a high-reliability state machine
beyond the state encoding schemes. Deadlock can occur when the state
machine enters a state from which it is never able to leave. An example
would be one state machine awaiting an input from a second state machine
that has entered an illegal state and hence been reset to idle without
generating the needed signal. To avoid deadlock, it is therefore good
practice to provide timeout counters on critical state machines.
Wherever possible, these counters should not be included inside the
state machine but placed within a separate process that outputs a pulse
when it reaches its terminal count. Be sure to write these counters in
such a way as to make them reliable.
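
A timeout counter kept outside the state machine might be sketched as below. The entity `timeout_counter`, the `enable` input (assumed to be driven high while the monitored state machine is waiting) and the 16-bit counter width are assumptions made for this example.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity timeout_counter is
  generic (
    -- Illustrative timeout in clock cycles; assumed to fit in 16 bits.
    TERMINAL_COUNT : natural := 1000
  );
  port (
    clk     : in  std_logic;
    rst     : in  std_logic;  -- synchronous, active high
    enable  : in  std_logic;  -- high while the monitored machine waits
    timeout : out std_logic   -- one-clock pulse at the terminal count
  );
end entity timeout_counter;

architecture rtl of timeout_counter is
  signal count : unsigned(15 downto 0) := (others => '0');
begin
  process (clk)
  begin
    if rising_edge(clk) then
      timeout <= '0';  -- default: no pulse
      if rst = '1' or enable = '0' then
        count <= (others => '0');
      elsif count >= TERMINAL_COUNT then
        timeout <= '1';             -- pulse for one clock
        count   <= (others => '0'); -- restart the timeout window
      else
        count <= count + 1;
      end if;
    end if;
  end process;
end architecture rtl;
```

The state machine can then treat `timeout` like any other input and return to a known state when it is asserted.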

When checking that a counter has reached its terminal count, it is
preferable to use the greater-than-or-equal-to operator rather than the
equal-to operator. Should an SEU push the count past the terminal value,
an equal-to comparison would miss the terminal count and no output would
be generated. You should declare integer counters with a range that is a
power of two and, if you are using VHDL, increment them modulo that
power of two so that they wrap around in simulation as they will in the
final implementation [count <= (count + 1) mod 16; for an integer
counter ranging from 0 to 15]. Unsigned counters do not require this,
since there is no simulation mismatch between RTL and post-route
simulation regarding wraparound.
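
Both counter recommendations can be seen together in a short sketch. The entity `wrap_counter`, the `done` output and the terminal count of 12 are illustrative choices, not values from a real design.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity wrap_counter is
  port (
    clk  : in  std_logic;
    rst  : in  std_logic;  -- synchronous, active high
    done : out std_logic   -- high while count is at or past the terminal count
  );
end entity wrap_counter;

architecture rtl of wrap_counter is
  constant TERMINAL_COUNT : integer := 12;    -- illustrative value
  signal count : integer range 0 to 15 := 0;  -- 16 values, a power of two
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if rst = '1' then
        count <= 0;
      else
        -- mod 16 wraps 15 back to 0 in simulation, matching the
        -- behaviour of the 4-bit register in the implemented design.
        count <= (count + 1) mod 16;
      end if;
    end if;
  end process;

  -- >= still asserts if an upset pushes the count past the terminal value.
  done <= '1' when count >= TERMINAL_COUNT else '0';
end architecture rtl;
```

Without the `mod 16`, a simulator would report a range error when the count tried to step from 15 to 16, whereas the hardware register would simply wrap to zero.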

You can replicate
data paths within the design and compare outputs on a cycle-by-cycle
basis to detect whether one of them has been subjected to an SEU event.
Wherever possible, edge-detect signals to enable the design to cope with
inputs that are stuck high or stuck low. You should analyze each module
within the design at design time to determine how it will react to
stuck-high or stuck-low inputs. This will ensure it is possible to
detect these errors and that they cannot have an adverse effect upon the
function of the module.
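
Edge detection of an input can be sketched as below, assuming the input is already synchronous to the clock. The entity `rising_edge_detect` and its port names are hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity rising_edge_detect is
  port (
    clk   : in  std_logic;
    rst   : in  std_logic;  -- synchronous, active high
    sig   : in  std_logic;  -- input assumed already synchronous to clk
    pulse : out std_logic   -- high for one clock on each 0-to-1 transition
  );
end entity rising_edge_detect;

architecture rtl of rising_edge_detect is
  signal sig_d : std_logic := '0';  -- input delayed by one clock
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if rst = '1' then
        sig_d <= '0';
      else
        sig_d <= sig;
      end if;
    end if;
  end process;

  -- High only in the cycle where the input is 1 and was 0 one clock earlier.
  pulse <= sig and not sig_d;
end architecture rtl;
```

With this arrangement a stuck-low input never produces a pulse, and a stuck-high input produces at most one pulse after reset rather than a permanently asserted request.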