As FPGA device sizes have expanded over the last few years to exceed the one-million-gate mark, design methodologies have had to change as well. Design approaches such as latches and asynchronous clocking worked fine for 10,000- and 100,000-gate designs, but they break down at the 1M-gate level. In designs of this size, relying heavily on these practices creates hard-to-predict problems and can make a design nearly impossible to debug and bring to market successfully.
Even if these techniques produced a working design at this size, there are other reasons to choose design strategies carefully. Designs now cost too much to be used for only one project. Every design must be treated as intellectual property that will be reused for multiple purposes and by multiple designers. If the design is too complicated, or difficult to interface to other designs, the initial design investment won't pay off downstream. Avoiding bad design practices extends the life of a 1M-gate design, or of the key components that make up the system.
Furthermore, if the design turns out to be wildly successful, it will need to move from an FPGA to an ASIC or another higher-volume and/or lower-cost technology. Better design methodologies also make that migration faster.
This article provides some techniques for avoiding the problems that arise when designing multi-million-gate devices, preparing the designer for the inevitable next stage in this evolution: 10M-gate devices.
The problem with latches
Latches should not be used unless absolutely necessary; in most cases, a flip-flop works just as well. When synthesizing designs, be especially careful not to infer a latch accidentally. The core problem with latches is transparency: while a latch's gate is open, its output follows its input, so changes propagate straight through it instead of waiting for a clock edge.
In the circuit shown in Figure 1, if Gate A and Gate B both go low, both latches are transparent at the same time, and the path through them can behave as a combinational loop that may oscillate.
Figure 1 - Latch in transparency
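The difference between a transparent latch and an edge-triggered flip-flop can be sketched with a small Python behavioral model. Python is used here only to simulate behavior; it is not HDL, and the `Latch` and `FlipFlop` class names are hypothetical. Each call to `step` represents one sampling instant.

```python
# Behavioral sketch (not HDL): a level-sensitive latch versus an
# edge-triggered flip-flop, both sampled once per simulation step.


class Latch:
    """Transparent while gate is high; holds its value while gate is low."""

    def __init__(self) -> None:
        self.q = 0

    def step(self, gate: int, d: int) -> int:
        if gate:  # transparent: the output follows the input
            self.q = d
        return self.q  # opaque: the output holds the last value


class FlipFlop:
    """Captures d only on a rising clock edge."""

    def __init__(self) -> None:
        self.q = 0
        self._prev_clk = 0

    def step(self, clk: int, d: int) -> int:
        if clk and not self._prev_clk:  # rising edge detected
            self.q = d
        self._prev_clk = clk
        return self.q


if __name__ == "__main__":
    latch, ff = Latch(), FlipFlop()
    ctrl = [1, 1, 1, 1]  # gate/clock held high while the data toggles
    data = [1, 0, 1, 0]
    print([latch.step(c, d) for c, d in zip(ctrl, data)])  # [1, 0, 1, 0]
    print([ff.step(c, d) for c, d in zip(ctrl, data)])     # [1, 1, 1, 1]
```

While the control input stays high, every data change passes through the latch, but the flip-flop only updates on the single rising edge.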
Most EDA software tools have difficulty with latches. Static timing analyzers typically have to make an assumption about latch transparency. If the tool assumes the latch is transparent, it may report a false timing path through the data input pin; if it assumes the latch is not transparent, it may miss a critical path.
Because of transparency, latches are also difficult to test. For scan testing, they are often replaced by a latch/flip-flop cell compatible with the scan-test shift register. Under these conditions, a plain flip-flop would actually cost less than the latch.
Latch inference -- IF statement
A common synthesis problem is the unintentional inference of latches. This occurs in a combinational process when a conditional statement does not assign the output on every possible path. When an unassigned path is taken, the output must keep its old value, so the synthesis tool has to insert a latch to hold it.
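The rule above amounts to a path check: a latch is needed whenever some path through the process leaves the output unassigned. The sketch below uses a hypothetical tuple representation of a process body (not any real synthesis tool's data structure) and assumes a single output signal.

```python
from typing import Optional

# Hypothetical tuple representation of a combinational process body:
#   ("assign", name)                           name is assigned on this path
#   ("if", then_body, else_body)               else_body is None if there is no else
#   ("case", arm_bodies, default_body, covers_all_values)
#   ("seq", [stmt, ...])                       statements executed in order


def always_assigns(stmt: Optional[tuple], signal: str) -> bool:
    """Return True if every execution path through stmt assigns signal."""
    if stmt is None:  # a missing else/default branch assigns nothing
        return False
    kind = stmt[0]
    if kind == "assign":
        return stmt[1] == signal
    if kind == "if":
        return always_assigns(stmt[1], signal) and always_assigns(stmt[2], signal)
    if kind == "case":
        _, arms, default, covers_all = stmt
        arms_ok = all(always_assigns(arm, signal) for arm in arms)
        # The default clause only matters when some select values are unlisted.
        return arms_ok and (covers_all or always_assigns(default, signal))
    if kind == "seq":
        # One statement that always assigns is enough for the whole sequence.
        return any(always_assigns(s, signal) for s in stmt[1])
    raise ValueError(f"unknown statement kind: {kind}")


def infers_latch(stmt: Optional[tuple], signal: str) -> bool:
    """Some path leaves signal unassigned, so its old value must be stored."""
    return not always_assigns(stmt, signal)


if __name__ == "__main__":
    no_else = ("if", ("assign", "y"), None)
    with_else = ("if", ("assign", "y"), ("assign", "y"))
    print(infers_latch(no_else, "y"))    # True
    print(infers_latch(with_else, "y"))  # False
```

The "seq" case also shows why an unconditional assignment placed before an incomplete conditional removes the latch: every path then assigns the output at least once.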
In Example 1, when sel is true, the output is assigned the input value. When sel is false, however, the output must keep its previous value, so a latch is inserted.
Figure 2 - Latch inferred by incomplete IF statement
Figure 2 shows the synthesized latch. Note that the latch's gate is driven by sel, not by the system clock. As a result, the latch makes the circuit asynchronous, and it consumes extra gates. Make sure your synthesis tool reports this common problem.
Latch inference avoided -- IF statement
Latch inference can be avoided by making sure all cases are specified, as shown in Example 2. Simply assigning '0' to the output in the else clause changes the latch into an AND gate, as shown in Figure 3.
Figure 3 - Latch inference avoided
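The two IF styles can be compared with a Python behavioral sketch. The original Example 1 and Example 2 Verilog is not reproduced here, and the function names are hypothetical; the point is that the complete version is a pure function of its inputs, while the incomplete version needs memory.

```python
# Behavioral sketch of the two IF styles (function names are hypothetical).


def incomplete_if(sel: int, d: int, prev_out: int) -> int:
    """if (sel) out = d;  (no else): the previous output must be held."""
    return d if sel else prev_out


def complete_if(sel: int, d: int) -> int:
    """if (sel) out = d; else out = 0;  which reduces to out = sel & d."""
    return d if sel else 0


if __name__ == "__main__":
    # The complete version depends only on its inputs: it matches an AND gate.
    for sel in (0, 1):
        for d in (0, 1):
            assert complete_if(sel, d) == (sel & d)
    # The incomplete version gives different outputs for identical inputs,
    # depending on history; that history is what the inferred latch stores.
    print(incomplete_if(0, 0, prev_out=1))  # 1
    print(incomplete_if(0, 0, prev_out=0))  # 0
```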
Latch inference -- CASE statement
The CASE statement provides another way to infer latches unintentionally, as shown in Example 3. Note that the case statement is missing several conditions.
For those missing conditions, the original value of the output must be maintained, implying a latch, as shown in Figure 4. Again, this circuit is asynchronous and inefficient.
Figure 4 - Latch inferred by incomplete CASE statement
Latch inference avoided -- CASE statement
The latch can be avoided by assigning a value to the output in the default clause of the case statement, as shown in Example 4. If we don't care what that value is, assigning "x" lets the synthesizer pick whichever value yields the simplest logic, as shown in Figure 5.
Figure 5 - Latch inference avoided
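To illustrate what the "x" default buys, the Python sketch below models a hypothetical 2-bit case statement in which only select values 0 and 1 are listed (the original Example 3 and Example 4 code is not shown). `None` stands for Verilog's "x". Because the unlisted rows are don't-cares, a plain 2:1 multiplexer that ignores sel[1] is one legal result; this is an illustration, not the netlist any particular tool would produce.

```python
from typing import Optional


def case_incomplete(sel: int, a: int, b: int, prev_out: int) -> int:
    """No default: for sel = 2 or 3 the old output is held, implying a latch."""
    if sel == 0:
        return a
    if sel == 1:
        return b
    return prev_out


def case_with_default(sel: int, a: int, b: int) -> Optional[int]:
    """'default: out = x;'  None stands for x, meaning 'don't care'."""
    if sel == 0:
        return a
    if sel == 1:
        return b
    return None


def one_possible_netlist(sel: int, a: int, b: int) -> int:
    """One choice a synthesizer could make for the x rows: a 2:1 mux on sel[0]."""
    return b if (sel & 1) else a


if __name__ == "__main__":
    # The mux is a legal implementation: it agrees wherever the spec is not x.
    for sel in range(4):
        for a in (0, 1):
            for b in (0, 1):
                spec = case_with_default(sel, a, b)
                assert spec is None or one_possible_netlist(sel, a, b) == spec
    print("2:1 mux on sel[0] satisfies the case statement with an x default")
```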
Combinational feedback
Most combinational feedback loops, such as the one in Figure 6, can be shown to behave as latches. Synthesizers often create such loops when the RTL code unintentionally implies a latch and the target FPGA architecture does not support latches.
Figure 6 - Combinational feedback
Combinational feedback loops are typically capable of latching data, yet they are more problematic than latches because their setup and hold-time constraints may be difficult to determine. Like latches, they also cause testability problems.
In many cases, combinational feedback loops can be replaced by flip-flops or latches, or completely eliminated by fully enumerating RTL conditionals.
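Whether a feedback loop stores data or oscillates can be sketched by re-evaluating the loop until it stops changing. This Python model assumes zero gate delay, so it only shows whether a stable value exists and whether that value depends on history; it says nothing about real timing, glitches, or setup and hold behavior. The function names are hypothetical.

```python
from typing import Callable, Optional


def settle(loop: Callable[[int], int], q0: int, max_iters: int = 8) -> Optional[int]:
    """Re-evaluate q = loop(q) with zero delay until q stops changing.

    Returns the stable value, or None if no stable value is reached
    (the loop keeps toggling, i.e. it would oscillate).
    """
    q = q0
    for _ in range(max_iters):
        nxt = loop(q)
        if nxt == q:
            return q
        q = nxt
    return None


def latch_like(en: int, d: int) -> Callable[[int], int]:
    """Feedback loop q = en ? d : q, which stores data like a latch."""
    return lambda q: d if en else q


def inverting(q: int) -> int:
    """Feedback loop with a single inversion, q = !q, which has no stable value."""
    return 1 - q


if __name__ == "__main__":
    print(settle(latch_like(1, 1), q0=0))  # 1: enabled, the loop takes d
    print(settle(latch_like(0, 1), q0=0))  # 0: disabled, the old value is kept
    print(settle(inverting, q0=0))         # None: the loop never settles
```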
Finite state machines
Finite state machines are normally designed in a synchronous fashion using binary or one-hot encoding styles. From a portability perspective, the most important FSM design issues are dead (lock-up) states, initialization for testability, and synchronizing the FSM inputs to the system clock.
Figure 7 shows an example state machine with three states (S0, S1, S2), three inputs (A, B, C), and a reset (RST). With three states, at least two flip-flops are needed to represent the state. Two flip-flops provide four codes, so in a binary encoding one code combination is left over as a "dead" state that we will want to account for.
Figure 7 - FSM example
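The flip-flop and dead-state arithmetic can be checked with a short Python sketch (helper names are hypothetical). It also shows the one-hot count that matters later: three flip-flops for three states leave five illegal codes.

```python
def binary_flops(num_states: int) -> int:
    """Minimum number of flip-flops for a binary (dense) state encoding."""
    return max(1, (num_states - 1).bit_length())


def unused_codes(num_states: int, num_flops: int) -> int:
    """Register codes that correspond to no legal state ("dead" states)."""
    return 2 ** num_flops - num_states


if __name__ == "__main__":
    n = 3  # S0, S1, S2 as in Figure 7
    k = binary_flops(n)
    print(k, unused_codes(n, k))  # 2 flip-flops, 1 dead code
    print(n, unused_codes(n, n))  # one-hot: 3 flip-flops, 5 illegal codes
```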
Binary encoded FSM
The binary encoded state machine is the most common. In Example 5, the states are assigned binary numeric values starting with zero and counting up to one less than the number of states. State assignment may be done manually, as shown here, or automatically by the synthesis tool, which may produce a better result.
In Example 5, the mapping of state names to binary values is specified by the parameter statement. With Synopsys tools, special comments (synthesis directives) are required to identify the state register.
Within the FSM, the STATE_REG process manages the state register, whose input is NEXT_STATE and whose output is PREV_STATE. The STATE_NEXT process computes the NEXT_STATE value and the R, G, and Y outputs from PREV_STATE and the A, B, and C inputs. Note the Verilog "default" clause, which handles dead states.
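Because Figure 7 and Example 5 are not reproduced here, the Python sketch below assumes hypothetical transitions (S0 to S1 on A, S1 to S2 on B, S2 to S0 on C, otherwise stay) and hypothetical outputs (R in S0, G in S1, Y in S2). It mirrors the split between a STATE_NEXT-style decode function and a STATE_REG-style register, with a default branch that returns the dead code to S0. Reset is modeled synchronously for simplicity.

```python
# Hypothetical transitions (not taken from Figure 7):
#   S0 --A--> S1 --B--> S2 --C--> S0; otherwise stay in the current state.
#   Outputs (R, G, Y): R in S0, G in S1, Y in S2.
S0, S1, S2 = 0b00, 0b01, 0b10  # binary encoding; 0b11 is the dead code


def state_next(prev_state: int, a: int, b: int, c: int) -> tuple:
    """Combinational next-state and output decode, in the role of STATE_NEXT."""
    if prev_state == S0:
        return (S1 if a else S0), (1, 0, 0)
    if prev_state == S1:
        return (S2 if b else S1), (0, 1, 0)
    if prev_state == S2:
        return (S0 if c else S2), (0, 0, 1)
    # "default" branch: the dead code 0b11 returns to S0 with all outputs off.
    return S0, (0, 0, 0)


class BinaryFSM:
    """State register in the role of STATE_REG; reset is modeled synchronously."""

    def __init__(self) -> None:
        self.state = S0

    def clock(self, a: int, b: int, c: int, rst: int = 0) -> tuple:
        nxt, outputs = state_next(self.state, a, b, c)
        self.state = S0 if rst else nxt
        return outputs  # outputs decoded from the state before this edge


if __name__ == "__main__":
    fsm = BinaryFSM()
    fsm.state = 0b11           # force the dead code, e.g. after power-up
    print(fsm.clock(0, 0, 0))  # (0, 0, 0)
    print(bin(fsm.state))      # 0b0: recovered to S0
```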
Figure 8 - Binary encoded FSM schematic
Figure 8 shows the cleaned-up schematic of the binary encoded state machine. Note the next-state decode logic on the left and the output decode logic on the right. Because several state bits can change on the same clock edge, the decode logic can generate glitches, which is a problem if the decoded signals drive direct-action pins such as asynchronous resets, presets, or clocks.
One-hot encoded FSM
One way to avoid glitching state decoders is to avoid decoding states altogether. For machines with a small number of states, one-hot state machines are a very efficient approach. There is one flip-flop for each state. On reset, all the flip-flops are cleared to zero except the initial-state flip-flop, which is set to one. From then on, only one flip-flop is "hot" at a time, and the hot flip-flop represents the state of the machine.
The one-hot state machine is similar to the binary encoded FSM, except that the assignment of state names to state register bit values is done in a way that only one bit is hot at a time, as shown in Example 6.
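Note that Example 6 does not handle "default" states, because in a one-hot FSM only the legal one-hot codes are expected to occur. However, a reset signal is required to put the machine into a legal state. With Synopsys tools, the full_case directive comment in Example 6 tells the synthesizer that the listed cases are complete, so no default clause is needed.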
Figure 9 - One-hot encoded FSM schematic
Figure 9 shows a cleaned-up schematic of the one-hot state machine. This looks like a circular shift register with a preset to the "100" value. The logic to the left of each flip-flop determines if the flip-flop will be the selected state during the next cycle. Notice how simple the output state decoding is.
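The per-flip-flop next-state logic can be sketched in Python under the same kind of hypothetical transitions (S0 to S1 on A, S1 to S2 on B, S2 to S0 on C, otherwise stay), since Example 6 is not reproduced here. Each bit's next value is the OR of the "stay" arc and the arcs entering that state, and the outputs come straight from the flip-flops with no decoding.

```python
# Hypothetical transitions (not taken from Figure 7 or Example 6):
#   S0 --A--> S1 --B--> S2 --C--> S0; otherwise stay. Bits ordered (s0, s1, s2).
RESET = (1, 0, 0)  # the "100" preset value


def onehot_next(state: tuple, a: int, b: int, c: int) -> tuple:
    """Return (next_state, (R, G, Y)); all values are 0/1 ints."""
    s0, s1, s2 = state
    # Each bit's next value ORs the "stay" arc with the arc entering that state.
    n0 = (s0 & (1 - a)) | (s2 & c)
    n1 = (s1 & (1 - b)) | (s0 & a)
    n2 = (s2 & (1 - c)) | (s1 & b)
    # No output decoding: R, G, Y are taken directly from the flip-flops.
    return (n0, n1, n2), (s0, s1, s2)


if __name__ == "__main__":
    state = RESET
    for _ in range(3):
        # With A, B, C all high, the hot bit rotates like a circular shift register.
        state, outputs = onehot_next(state, 1, 1, 1)
        print(outputs, "->", state)
```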
One-hot vs. binary encoding
One-hot state machines are efficient for machines with a small number of states. Their outputs require no decoding, and they are very fast. The main potential problem is that they do not recover from illegal states with more than one hot bit (multiple ones) unless a special recovery circuit is added.
While binary encoded state machines are the most commonly used, you should try both types and see which is smaller for your application.
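The multiple-ones problem can be shown with a Python sketch, again assuming hypothetical transitions (S0 to S1 on A, S1 to S2 on B, S2 to S0 on C, otherwise stay). Plain one-hot logic keeps an illegal code such as (1, 1, 0) indefinitely, and the all-zero code locks up. The recovery function is one possible scheme, assumed for illustration rather than taken from the article: any code without exactly one hot bit is forced back to the reset state. That check adds logic of its own.

```python
# Hypothetical transitions: S0 --A--> S1 --B--> S2 --C--> S0, otherwise stay.
# Bits are ordered (s0, s1, s2).
RESET = (1, 0, 0)


def onehot_next(state: tuple, a: int, b: int, c: int) -> tuple:
    """Plain one-hot next-state logic with no illegal-state handling."""
    s0, s1, s2 = state
    return ((s0 & (1 - a)) | (s2 & c),
            (s1 & (1 - b)) | (s0 & a),
            (s2 & (1 - c)) | (s1 & b))


def onehot_next_with_recovery(state: tuple, a: int, b: int, c: int) -> tuple:
    """Assumed recovery scheme: any code without exactly one hot bit resets."""
    if sum(state) != 1:
        return RESET
    return onehot_next(state, a, b, c)


if __name__ == "__main__":
    bad = (1, 1, 0)                                 # S0 and S1 both hot
    print(onehot_next(bad, 0, 0, 0))                # (1, 1, 0): two states stay hot
    print(onehot_next_with_recovery(bad, 0, 0, 0))  # (1, 0, 0): back to S0
```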
Input synchronization (metastability)
The circuit shown in Figure 10 works very well for synchronizing input signals. It offers a high degree of protection against metastability and should be used on all asynchronous inputs. Metastability can occur when the data input changes at nearly the same time as the clock edge, inside the flip-flop's setup-and-hold window.
Figure 10 - Input synchronization schematic
In this case, the first flip-flop may capture an intermediate voltage level, often modeled as an "X" in logic simulation. This intermediate voltage level will eventually become a 0 or a 1, but it takes some time for the flip-flop to resolve it. This resolution time is usually several times longer than the clock-to-out time of the flip-flop, but less than the clock period.
Placing two flip-flops in series gives the first flip-flop nearly a full clock period to resolve before the second flip-flop samples it, so the second flip-flop is very likely to capture stable data even if the first one is metastable for a time after the rising edge of the clock.
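The two-flip-flop synchronizer can be sketched as a Python behavioral model. It idealizes the assumption in the text that a metastable first flip-flop always resolves to a 0 or 1 within one clock period; real resolution time is not strictly bounded, so the circuit makes failures rare rather than impossible. The function name and the use of `None` to mark a sample taken inside the setup/hold window are hypothetical modeling choices.

```python
import random
from typing import List, Optional


def synchronize(samples: List[Optional[int]], rng: random.Random) -> List[int]:
    """Two flip-flops in series, both clocked once per list element.

    A sample of None marks an input that changed inside the first flip-flop's
    setup/hold window. The first flip-flop then goes metastable and, as the
    text assumes, resolves to 0 or 1 (chosen at random here) before the next
    clock edge. The returned list is the second flip-flop's output after each edge.
    """
    ff1, ff2 = 0, 0
    out = []
    for s in samples:
        # On each edge, ff2 captures ff1's settled value and ff1 captures the input.
        ff2 = ff1
        ff1 = rng.randint(0, 1) if s is None else s
        out.append(ff2)
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    print(synchronize([0, 1, 1, 0, 0], rng))     # [0, 0, 1, 1, 0]
    print(synchronize([0, None, 1, 1, 1], rng))  # never None; third value may be 0 or 1
```

A metastable capture shows up only as the new value arriving one clock early or late, never as an unresolved level at the synchronizer output.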
If combinational logic were added between the two flip-flops, the time available for the first flip-flop to settle would be reduced by that logic's delay. In effect, this circuit implements an input sampling strategy that greatly reduces the risk of metastability problems and brings asynchronous data safely into a synchronous system.
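The reduction in settling time can be made concrete with a common first-order budget: the time left for the first flip-flop to resolve is roughly the clock period minus its clock-to-out delay, the second flip-flop's setup time, and any logic delay between them. The Python helper and the timing numbers below are illustrative assumptions, not values from any device datasheet.

```python
def resolution_budget_ns(period_ns: float, t_co_ns: float, t_su_ns: float,
                         t_comb_ns: float = 0.0) -> float:
    """First-order time available for the first synchronizer flip-flop to settle:
    clock period minus its clock-to-out, the second flop's setup, and any logic."""
    return period_ns - t_co_ns - t_su_ns - t_comb_ns


if __name__ == "__main__":
    # Illustrative numbers only, not taken from any device datasheet.
    direct = resolution_budget_ns(10.0, t_co_ns=1.0, t_su_ns=0.5)
    with_logic = resolution_budget_ns(10.0, t_co_ns=1.0, t_su_ns=0.5, t_comb_ns=3.0)
    print(direct, with_logic)  # 8.5 5.5
```

In this example, 3 ns of logic between the flip-flops removes about a third of the settling time.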
Khaled I. Rabia is a senior design engineer at Avnet Electronics Marketing.