United Business Media EE Times


Search

HOMEMARKET INTELLIGENCE UNITFORUMSDESIGNNEW PRODUCTSCAREERSBLOGSCONTACTEVENTSSIGN UP!RSSMost Popular contentTrusted Sources

 



Programmable Logic

High-Speed State Machine Design

Three methods can be used to describe state flows. While each has its own list of pros and cons, a composite method could provide the answer.

by Stephen L. Wasson


High-performance state machine design has two primary objectives: to ensure that the state machine performs the way the designer wants, and performs the way the designer wants.

Getting one's way means transcribing a complete and proper state flow model. Getting one's performance means the model's implementation will actually run at the speed desired. Of three contemporary ways to describe state flow­flowcharts, HDLs, and bubble diagrams­all focus on transcription performance, none on implementation performance. The following reviews some strengths and weaknesses of each method and then presents a composite methodology for high-performance state-machine design.

The designer's first objective in state-machine design is writing a complete and proper model. A complete model accounts for all the possible stimuli; a proper model rationalizes all the directed responses. State machine models typically suffer three kinds of errors: holes, conflicts, and being just plain wrong. Holes occur when no response is directed for a given stimulus condition. Conflicts occur when multiple responses are specified for the same stimulus condition. Although most of the current state-machine entry tools catch the conflicts, and most are getting more intelligent about assuming the unspecified-else conditions, few compel a complete accounting at entry of a proper state-flow model. Things that are wrong could have any number of root problems, and will require detective work.

The second objective in state-machine design is to generate an implementation that satisfies the required performance criteria and does so repeatedly. State machine performance is a function of both the next-state and the output functions. In both cases, in today's FPGAs, the speed of these functions is determined by the number of logic levels traversed. Although some tools are getting more intelligent about logic-level reduction, none adequately address repeat performance.

Compilable flowcharts Dating back to programming in the '60s, flowcharts provide an environment of old-world familiarity. They are easy to learn, and they present state flow in a modestly manageable format. Figure 1 depicts the state flow for the PCI Target IDLE state and its exit decisions. These schematic symbols are merely macro calls to lower-level primitives. Figure 2 shows the compiled gate equivalent: the state bit becomes a single flip-flop, and each decision becomes a pair of "yes/no" gates.

Figure 1. Shown is one possible flowchart representation for the PCI Target IDLE State.


However, as design demands increase, compilable flowchart advantages decrease. Note in Figure 2 that successive decisions incur successive delays­the nested-if problem. And similar to if-then-else code, the transitions are decision-order dependent which can result in impossible exit conditions. Without a post-processing tool to check for next-state holes, conflicts, or impossible exits, compilable flowcharts are the least viable state-machine implementation option.

Figure 2. After compilation, the flowchart in Figure 1 reduces to this circuit. Note that NS_BKOF is already three logic levels deep. This is also the kind of logic that if-then-else code produces.


HDLs Borrowed from programming (where the H stands for hardware, not high-level), HDLs are time-tested state-machine modeling mechanisms. Such text-based tools­including equation entry tools­enjoy the advantages of terse entry and rapid synthesis. Listing 1 shows the PCI Target IDLE and BUSY state equations (more compactly and accurately than Figure 1). Overall state flow is obscure, however. With simple equation entry tools, the more equations there are, the more obscure the state flow becomes; with HDLs, more code means more obscure logic levels.

Listing 1 PCI Target equations


from IDLE 1 goto IDLE if FRAME# 2 goto BUSY if !FRAME# * !Hit 3 goto DATA if !FRAME# * Hit *(Term + Term * Ready) * (FREE + LOCKED * LOCK#) 4 goto BKOF if !FRAME# * Hit *(Term * !Ready + LOCKED * !LOCK#) from BUSY 5 goto BUSY if (!FRAME# + IRDY#) * !Hit 6 goto IDLE if FRAME# 7 goto DATA if (!FRAME# + !IRDY#) * Hit *(!Term + Term * Ready) * (FREE + LOCKED * L_lock#) 8 goto BKOF if (!FRAME# + !IRDY#) * Hit *( Term * !Ready + LOCKED * !L_lock#)


Listing 1. These equations are for PCI target IDLE state and BUSY state exit transitions. (Ref: PCI Local Bus Specification Revision 2.0, pp. 174-175.)


Of course, respectable HDL tools will find holes, conflicts, or impossible combinations; and, if one's equations are correct, good tools can correctly assume the unspecified-else conditions. Still, some synthesis tools tend to optimize their HDL modules with immutable blocks and hard boundaries; that is, with blocks that may not integrate optimally with other modules. Such "hard" modules are frequently a source of performance degradation in FPGA implementations. Moreover, with HDLs, provisions for complete up-front flow verification and proper logic-level control are less than optimal for high-performance implementations.

Bubble diagrams With the benefits of both flowcharts and equation entry, bubble diagrams depict both state flow and functionality compactly and quickly. Figure 3 illustrates the whole flow of the PCI Target state machine (where numbered transition arrows correspond to the numbered lines in Listing 1). While it is true that, on paper, transition and output equations may clutter up the drawing, this detraction is more manageable with up-and-coming state-machine generators like the FSM tool from Escalade (Sunnyvale, CA), or StateCAD from Data I/O (Redmond, WA). Bubble diagram tools alone, however, still reliant on equations, unspecified-else assumptions, and logic synthesis, neither compel rigorous entry analysis nor control logic level reduction.



Figure 3. The numbered transitions correspond to the numbered line equations in Listing 1.


A composite methodology Rapid entry has been the primary focus of state machine tools thus far. Preliminary verification has consisted mostly of conflict resolution, and pre-implementation logic-level reduction has been largely unaddressed.

There are four key steps in outlining a composite methodology for entry and verification of state flow and functionality with logic-level reduction prior to synthesis or simulation:

  1. ) Enter state flow using bubble diagrams.

  2. ) Specify next-state transitions using equations.

  3. ) Analyze next-state equations using Karnaugh maps.

  4. ) Optimize next-state logic using common sense.
A brief history of state machine evolution

In the beginning of solid-state time, before SSI, before operating systems, when hand-toggled front-panel switches loaded bootstrap code from paper tape into core memory, when "RTL" meant "resistor-transistor-logic," there were timing chains. These discrete transistor flip-flops serially connected in simple sequencing shift registers were the beginnings of sentient state-machine life. These simple state-machine forms ruled the primordial silicon seas controlling everything from basic I/O functions to higher-level CPU operations. They lived long and prospered...until MSI and the '60s.

With the birth of PROMs, the single-celled flip-flop chains evolved into some of the earliest encoded state-machine structures. Instead of one flip-flop-per-state, these new recombinant structures expressed their next-state sequences in plentiful PROM code. Such highly encoded structures spared precious flip-flops and quickly reproduced throughout the computing world. They, too, lived long and prospered... until PALs and the '70s.

In the ever popular PAL era, there was a small explosion in the diversity of programmable logic forms, and PALs, PLAs, and other TLAs became the predominant state-machine entities. Next-state sequences and state flows of ever increasing complexity could be encoded into a single level of logic and a handful of flip-flops. These new competitors were fast, and encoded state-machines flourished... until FPGAs and the '80s.

With the emergence of register-rich devices, the technology terrain shifted the balance of control power away from encoded back to state-per-bit state machines. This "new" strategy was especially well adapted to the new environment of plentiful flip-flops and diminished next-state function generators. The old high fan-in designs inherited from PALs would not survive the new logic-resource constraints of FPGAs. And so it is in this new era of FLAs (Four-Letter Acronyms) and BFGAs (Big Field-programmable Gate Arrays), that state-per-bit is likely to be the dominant state-machine form into the next century.

Now, while the above account may not satisfy creationists or historians, it should illustrate some point, like, "design techniques change with the technological climate," or "the old ways of yesteryear may be recycled to become the new ways of today." Specifically, state-per-bit­pioneered by Huffman with relays in the '50s­has returned as the state-machine encoding choice for FPGAs in the '90s.

One final note: There seems to be some confusion regarding the interchangeability of the terms "state-per-bit" and "one-hot." State-per-bit refers to how many register elements are required to represent a state; one-hot refers to how many register elements are active during a process. This distinction is more apparent when regarding a hierarchical state machine which may be state-per-bit but not one-hot.



Next-state analysis After depicting all state flow and specifying all next-state transitions, one can make a complete analysis of next-state equations by simple application of Karnaugh maps. Filling in the maps will immediately and unambiguously resolve all transition holes and conflicts (without unspecified-else assumptions). Although Karnaugh maps are most manageable for five or fewer variables, simple reduction and factoring techniques make them effective in complex equations of six or more variables. This apparent limitation is even a perfect match with fine-grained FPGA architectures in which the function generators have five or fewer inputs.

For example, assuming the state flow of bubble diagram Figure 3 with the transition equations given in Listing 1, the next-state analysis for state IDLE may be represented by the hierarchical Karnaugh map in Figure 4. Since the transition equations consist of seven variables, it is convenient (and easy) to factor out the two dominant variables, FRAME# and Hit. These two variables are mapped into the primary map such that three possibilities simply point to next-state IDLE or BUSY , and the fourth possibility­at !FRAME# * Hit ­hierarchically points to the secondary map. Then, given that FREE and LOCKED are dependent variables­ LOCKED = !FREE ­a secondary map of four variables completes the full and proper specification of the remaining transitions from state IDLE.

Figure 4. This Karnaugh map shows the next-state transitions from the PCI Target IDLE State for the equations given in Listing 1 and the state flow depicted in Figure 3.


Transitions from state BUSY do not seem to be as well specified. In Figure 5, the dominant variables are again factored out: FRAME#, IRDY# , and Hit comprise the primary map. But this time, ambiguities are immediately evident in the two squares mapped to the term FRAME# * !IRDY . For FRAME# * !IRDY * !Hit , simultaneous transitions are specified both to IDLE and to BUSY ; for FRAME# * !IRDY * Hit , simultaneous transitions are specified both to IDLE and to the secondary map. However, these ambiguities can be resolved if, in Listing 1, line six, after " goto IDLE ," the " if " statement is amended to read FRAME# * IRDY# .



Figure 5. Next-state assignments mapped at FRAME# * !IRDY are in conflict illustrating the incompletely specified equation line 6 of Listing 1.


Next-state optimization The primary goal of next-state optimization is logic-level reduction; a secondary goal in FPGAs is to map in flexibility. Current implementation tools obscure these efforts. Whereas state flow is best described with the " from-goto " construct, next-state optimization is best facilitated with the opposite " goto-from " construct. Note the " from-goto " bias of all the aforementioned tools. Only the bubble diagram is readily interpretable in either point of view. One can rearrange equation descriptions easily enough to accommodate next-state optimization, but flowcharts are of little value in this analysis.

The initial step in next-state optimization consists of gathering up all the next-state terms. This is equivalent to accounting for all the transitions into a state bubble. For example, in contrast to Listing 1, next-state IDLE entry equations would be gathered up as follows:

goto IDLE

from IDLE if...

from BUSY if...

from TNAR if...

Then, in the absence of a sophisticated optimization tool with constraint hooks for such factors as signal margins (how much of the clock period has already been consumed by the time a signal arrives at a function) or expansion terms (how many more inputs may be added to a particular term), apply common sense.

In an FPGA, applying common sense may mean duplicating terms if that contributes to better mapping and hence to logic-level reduction. Common sense may also suggest one rearrange equation terms to balance input signal margins by moving "late" arrivals forward (toward the clocked element input) and by moving "early" arrivals back. For example, in a "slow" PCI Target, the Hit term may be registered, and therefore may be moved to the rear of most next-state logic chains; but, in "medium" PCI Target implementations, Hit is likely to be a late combinatorial signal and therefore should be moved forward. Common sense also dictates that optimizing for an FPGA means matching equation term sizes to function generator widths, with considerations for future expansions. Attention to such detail promotes predictable, repeat, high performance.

Output function optimization Neither Mealy nor Moore outputs are optimal for high-performance, synchronous FPGA designs. Instead, registered outputs derived from next-state inputs yield higher performance. Logic-level reduction issues and techniques are similar to those discussed for next-state optimizations with the additional note that it will benefit output function performance to duplicate the entire next-state logic cloud (see "Design Notes").

FPGA-specific design notes

State-per-Bit ­ In flip-flop-rich FPGA architectures, using state-per-bit encoding (one flip-flop-per-state) can be more efficient than using encoded state machines. Encoded state-machine next-state functions are wider and deeper and therefore consume more function generators and route resource. State-per-bit encoding­trading function generators for flip-flops­has narrower and shallower next-state functions which therefore route more readily with higher performance. These same logic-level reduction advantages also increase the performance of the output functions­especially when using registered outputs.

Registered Outputs ­ In FPGAs, registered next-state (RNS) output functions are usually better performers than either Mealy or Moore machines. Moore outputs­combinatorial functions of encoded state variables (only)­unnecessarily constrain state encoding to the output functions. Mealy outputs­combinatorial functions of both state and input variables­suffer from the logic-level delay initially incurred before they begin the route to their destinations. Instead, generate outputs by registering the appropriate next-state transition terms. Better yet, register a duplicate set of the next-state terms so that logic-level reduction can be independently optimized for both next-state functions and the output functions.

Power-Up Conditions ­ Whereas sequential elements of older PLDs traditionally power up unknown, some of today's FPGAs power up with flip-flops in guaranteed known states. In such devices as the Xilinx 3K, it is sufficient to specify the initial state of every autonomous state machine as low-active so that such states power up active by definition. This obviates the need for the usual asynchronous reset to every flip-flop; an otherwise route-wasteful and performance degrading practice. (And one which most synthesis tools assume.) Also in such devices, each time the device is globally reset, one can rest assured that one's state machines will always return to their power-up states.

Asynchronous Reset ­ Don't frivolously reset every state bit for the "just in case" case. Route intensive asynchronous resets wreak havoc on synchronous functions; and they complicate timing analysis. Be aware that if one specifies a return-to-idle from every state, the next-state-idle logic will be wide, deep, and slow. If possible (see "Power Up Conditions"), one should rely on the device's master reset to recover from catastrophic conditions. For lesser catastrophes, consider using synchronous resets.

Asynchronous Inputs ­ The most conservative thing one can do with asynchronous inputs is to register them before distributing them. However, if the signal latency is intolerable, one can­if careful­directly input asynchronous signals to the state machine with the following proviso: only input asynchronous signals to one destination per clock edge. This requires a variation on the "one hot" theme such that adjacent state bits will overlap for one clock period: from state x, asynchronous input v enables synchronous transition to state y; one clock later, state y active causes the synchronous deassertion of state x. This synchronous "two-hot" mechanism can be repeatedly employed even with the same signal.

Unknown States ­ How does this happen? 1) The next-state transitions were not exhaustively specified, so they must be specified. 2) There was a synchronous setup violation which static timing analysis should reveal, so that must be corrected. 3) An asynchronous input was simultaneously used in multiple next-state functions; this should not be done. 4) It is running at speeds beyond the device's toggle rate and inside the metastability recovery window; this should not be done. 5) Alpha particle collision has flipped a bit against astronomical odds; I hate when that happens. 6) The device is on fire; oh, well...

Hierarchical State Machines ­ Like code, any procedure that gets too large should be broken down into smaller, more manageable modules. Large state machines should be written as a single executive module that makes subroutine calls. State-per-bit machines can easily be adapted to this hierarchical structure by defining a subroutine as a separate state machine in which the entry state is simultaneously set by a holding state in the executive. This "multi-hot" condition exists until the subroutine return condition is satisfied and releases the executive's holding state. What's more, like code, these subroutines can easily be called repeatedly from anywhere within the executive.



Future FSM generations Of course, all of these fine-grained manipulations should be done automatically, and someday they will be. Bubble diagram tools are an excellent start, but they need to be expanded to provide hierarchical modeling to facilitate more complex, hierarchical state machines and to accommodate more rigorous next-state analysis. Designers should be able to push through bubble trees down to leaf nodes that contain interactive Karnaugh maps. These maps should be automatically filled in from the initial next-state equation specifications, and all holes, conflicts, and else-assumptions should be clearly indicated. At this level, the designer should be able to perform map manipulations that automatically back-annotate the next-state equations.

Both next-state and output equations should also have some graphical representation which quickly illustrates their equation-term hierarchy and thereby their logic-level costs. This representation should also be manipulable to exploit term associative properties­parentheses re-arrangements­to facilitate early and late signal margins and to accommodate future term expansions without impacting logic-levels­all matched to FPGA function generator granularity. Output functions should be able to replicate easily next-state terms for independent logic-level reduction.

Stephen L. Wasson is a principal of HighGate Design Inc. (Saratoga, CA), consultants specializing in FPGA implementations.

To voice an opinion on this or any Integrated System Design article, please e-mail your message to: michael@asic.com.


integrated system design  July 1995



[ Articles from Integrated System Design Magazine ] [ ICs and uPs ]
[ Custom ICs and Programmable Logic ] [ Vendor Guide ]
[ Design and Development Tools ] [ Home ]


For more information about isdmag.com e-mail cam@isdmag.com
For advertising information e-mail amstjohn@mfi.com
Comments on our editorial are welcome.
Copyright © 1996 - Integrated System Design Magazine

  Free Subscription to EE Times
First Name Last Name
Company Name Title
Email address
  Click here for your Free Subscription to EETimes Europe
 
CAREER CENTER
Looking for a new job?
SEARCH JOBS
SPONSOR

RECENT JOB POSTINGS
CAREER NEWS
SRC Expands R&D Centers
The Semiconductor Research Corp has added a new center to its university R&D efforts.

For more great jobs, career related news, features and services, please visit EETimes' Career Center.


All White Papers »   

 
Education and
Learning


Learn Now:












Home | About | Editorial Calendar | Feedback | Subscriptions | Newsletter | Media Kit | Contact | Reprints|  RSS|   Digital|  Mobile
Network Websites
International
Network Features




All materials on this site Copyright © 2009 TechInsights, a Division of United Business Media LLC All rights reserved.
Privacy Statement | Terms of Service | About