System Design
HDL-Based FPGA Development on a Budget
The proper choice of tools and methods puts hardware description languages within nearly everyone's reach.
by Leo Bredehoft
In these days of expense containment, smaller firms are unable to justify tens of thousands of dollars for FPGA development tools. Managers and engineers at such companies might conclude that without $100,000-plus FPGA HDL (hardware description language) development environments, using an HDL in FPGA design is prohibitively error-prone and labor-intensive.
This is not the case. With a modest tools budget, the designer can build a completely automated development environment with many of the efficiencies of more expensive tool sets. It is possible to successfully execute an FPGA design, meeting schedule and performance targets, without a huge CAD investment.
Logic synthesis
HDL logic synthesis has a place in most FPGA designs because of its cost savings. HDL designs can be more compact and understandable, and therefore less costly to maintain, modify, debug, and reuse.
HDL logic synthesis permits the engineer to trade gate-level development time for synthesis-tool-generated output, which may be slightly slower or slightly larger than gate-level circuits.
An FPGA design may be turned over to a synthesis tool almost entirely, and schematic-level design need be done only in those cases where synthesis output is not sufficient for performance or size reasons. Often, such cases constitute only a tiny part of the overall design.
Partitioning
The FPGA development can be viewed as an eight-step process (see Figure 1). In the first step, the user partitions his design into a hierarchy of modules. Two considerations affect the partitioning process: elimination of duplication and portability.
Elimination of duplication leads to the smallest possible HDL model for a design. For instance, if a design contains four identical registered computational elements, or can be partitioned to contain them without a loss of area or speed, the engineer should write a single model for the element and instantiate it four times. Copying the source text four times incurs extra maintenance and compilation time penalties.
To ensure portability, partitioning should be done so that vendor-dependent structures, such as I/O pads, are localized and not spread throughout the source files, where they may be difficult to maintain when the design is being ported to a new vendor's part. I/O pads may be specially handled by having a single source file that invokes all of the internal modules and acts simply as a netlist, connecting them to pad macros that are invoked using some gate-level construct, such as structural VHDL[1,2].
Coding
After the partitioning, the engineer codes each module, assigning one source file per module. If a source file is more than 500 lines, the engineer should consider repartitioning it. Large source files make incremental changes painful.
During coding, the engineer should isolate vendor dependencies from logic processed by the synthesis tool. Vendor dependencies should be contained so that alternative architectures may be selected easily at compile time, without requiring source code to be modified each time a different vendor is selected.
Figure 1. FPGA development can be viewed as an eight-step process: partitioning, coding, synthesis, merging, flattening, place and route, back-annotation, and simulation.
Compilation
In a partitioned design, each module is synthesized individually. The synthesis tool produces a set of Boolean expressions for combinatorial logic in the design, then maps the expressions to the target technology. It makes optimal use of the technology by sharing Boolean subexpressions where possible.
Grouping logic in the same source file makes it visible to the synthesis tool simultaneously at compile time, so that the tool can generate more efficient logic by sharing Boolean subexpressions. Therefore, logic elements should be grouped in the same source file only if they share inputs or intermediate expressions.
If a large number of totally unrelated logic elements is in the source file, the synthesis tool will waste time searching for non-existent commonality. This can degrade productivity as the design is being revised.
There are circuits that will simply not synthesize well. These circuits usually have a high degree of internal coupling. A good example is a wide adder.
To skirt the problem, some synthesis vendors invoke macros hand-designed by the FPGA vendors. This solves the problem for arithmetic logic and some other applications, but there is a chance that a design will contain circuits that will lead to grossly inefficient output from the synthesis tool.
Another solution is to structure the logic block in question so that the synthesis tool generates intermediate nodes as outputs. These outputs are then routed back into the module externally under another name and fed into the remainder of the circuit. Figures 2 and 3 show examples of this.
It is worthwhile to peruse a report that details the depth of the various combinatorial circuits that the synthesizer has generated. Synthesis tool vendors may provide their own logic hierarchy reports, or the user may write his own report generator.
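A home-grown depth report of this kind can be quite small. The following is a minimal sketch in Python, assuming the synthesized netlist has already been parsed into a dictionary that maps each gate output to the signals driving it. The netlist contents, the signal names, and the functions `logic_depths` and `depth_report` are all hypothetical, and the sketch assumes the combinatorial logic contains no loops.

```python
# Hypothetical parsed netlist: each gate output maps to the signals driving it.
# Signals that never appear as keys are treated as primary inputs or
# flip-flop outputs, which have a depth of zero.
NETLIST = {
    "n1": ["a", "b"],
    "n2": ["n1", "c"],
    "n3": ["n2", "d"],
    "y": ["n3", "n1"],
    "z": ["a", "c"],
}


def logic_depths(netlist):
    """Return {gate output: number of gate levels on its longest input path}."""
    depths = {}

    def depth(signal):
        if signal not in netlist:
            return 0  # primary input or register output
        if signal not in depths:
            drivers = netlist[signal]
            depths[signal] = 1 + max((depth(s) for s in drivers), default=0)
        return depths[signal]

    for signal in netlist:
        depth(signal)
    return depths


def depth_report(netlist):
    """Return (signal, depth) pairs, deepest first, ties broken by name."""
    depths = logic_depths(netlist)
    return sorted(depths.items(), key=lambda item: (-item[1], item[0]))


for signal, levels in depth_report(NETLIST):
    print(f"{signal:<8}{levels:>3}")
```

The deepest signals appear at the top of the report, which makes them the first candidates for restructuring.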
Figure 2. This circuit generates one of eight arbitrary 5-bit constants, then compares it with a conditionally-incremented 5-bit input. This combinatorial circuit is sufficiently coupled internally that the synthesis tool may generate inefficient gate-level output.
Merging
Once all modules are synthesized into netlists in the target technology, the design must be merged or flattened into a single netlist suitable for place and route. This process uses tools supplied by the FPGA vendor.
At this stage, the engineer must ensure proper signal buffering. The synthesis tool has signal loading information only for signals internal to each module being compiled. It cannot split an output into buffered copies to meet higher-level loading requirements when modules are merged into the logic design.
Some synthesis tools can read in an entire merged design and adjust signal buffering without resynthesizing the logic. That step may be necessary to eliminate fanout violations; however, unless the synthesis tool permits per-signal control of loading, this kind of rebuffering may not be enough to reach the desired performance.
One particular synthesis tool permits the user to globally specify a maximum number of loads for all signals in a design. This caps the loading on every signal, but a design also contains noncritical paths that should be allowed more loading to avoid wasting resources. Setting the global limit too low can cause more buffering than needed.
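The cost of a single global limit can be estimated with a rough calculation. The sketch below, in Python, counts the buffers needed to keep every driver at or below a load limit, first with one global limit and then with a tighter limit on one critical signal only. The signal names, fanout figures, and the simple buffer-tree model are assumptions made for illustration; a real tool also weighs delay and capacitance.

```python
def buffers_needed(fanout, max_loads):
    """Count buffers in a tree that keeps every driver at or below max_loads."""
    if max_loads < 2:
        raise ValueError("max_loads must be at least 2")
    buffers = 0
    loads = fanout
    while loads > max_loads:
        stage = (loads + max_loads - 1) // max_loads  # buffers at this level
        buffers += stage
        loads = stage  # the level above now drives only these buffers
    return buffers


def total_buffers(fanouts, global_limit, per_signal_limits=None):
    """Sum the buffers for all signals, honoring any per-signal overrides."""
    per_signal_limits = per_signal_limits or {}
    return sum(
        buffers_needed(fanout, per_signal_limits.get(name, global_limit))
        for name, fanout in fanouts.items()
    )


# Hypothetical fanouts; only state_bit is assumed to be on a critical path.
FANOUTS = {"state_bit": 20, "reset": 60, "test_mode": 40}

print(total_buffers(FANOUTS, 4))                     # tight global limit
print(total_buffers(FANOUTS, 16, {"state_bit": 4}))  # tight limit where needed
```

Under these assumed numbers, the tight global limit spends most of its buffers on signals that are not on the critical path.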
Generating extra copies of a flip-flop using gate-level constructs in the source file can solve the problem in a critical path. Listing 1 shows a VHDL source example where rebuffering of a signal has been done.
Figure 3. The modified version of the circuit routes the intermediate constant to module output pins, then routes it back into the module to the comparison stage. Thus, the synthesis tool considers the subcircuits individually and may produce a more efficient output.
Design verification vs. test vector generation
It is important to understand that design verification and test vector generation are not the same thing. Test vectors are usually tuned to exercise the actual gate implementation of a design. Test vectors are often generated in an arbitrary fashion, without considering whether they follow the intended operation of the design. They may look nothing at all like vectors that might be captured from a running design.
Design verification, on the other hand, attempts to operationally exercise a design in the most hostile, asynchronous environment possible. In this case, the logic of the design, not the gates, is being tested. Design verification attempts to simulate as many race conditions as possible in a design. It does not necessarily attempt to verify the integrity of parallel structures.
Errors are most likely in the irregular, hand-designed logic found in control circuits. Here, design verification should test all of the possible logical branches that the control logic can take. Generating repeated asynchronous stimuli in design verification detects many unexpected bugs.
Test vectors exercise the actual gate implementation of a design. Time-consuming, error-prone hand coding of test vectors makes design verification onerous. Modeling the environment of the design programmatically using an HDL provides a more straightforward and robust design verification.
Using an HDL, memory models can initialize themselves from a text file and report all of their accesses to an output file or the test terminal. Other models can change behavior based on a globally connected test number signal. An oscillator model can run at double speed, or a FIFO can change its depth for different test cases. Most common chip-level functions may be modeled with a knowledge of the modeling language. Learning the language improves productivity and boosts the quality of the results.
Timing analysis
Timing analysis should be performed before each revision of a design is loaded into an FPGA for testing. Because timing variability affects the choices the place and route software makes, the timing of the design should be verified with a static timing analyzer program. Most FPGA vendors provide a static timing analysis tool with their development packages.
With the analysis, an engineer can determine critical performance-setting parameters: internal logic clock period limit, setup time for flip flops driven by external signals, and clock to output times for external signals. These parameters set speed limits for the design. Not knowing them, the engineer can determine neither a design's performance nor whether it will even work at temperature and voltage extremes.
Static timing analysis has some problems. Noncritical, slow paths can interfere with the paths of interest and produce an invalid worst-case result (see figure below).
Analysis tools generally provide a blocking function to handle such problems. The user specifies a set of nodes to be excluded from a particular set of worst-case calculations. This function solves the problem, although it may take several iterations before the tool has revealed all of the unexpected cases.
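The effect of blocking can be shown with a small worst-case path search. The sketch below, in Python, finds the slowest path to an output in a timing graph, first with every path considered and then with one node blocked. The node names loosely follow the latch and multiplexer example in the figure, while the delay values and the function `worst_path` are invented for illustration. The graph is assumed to have no loops.

```python
# Hypothetical timing arcs: (from_node, to_node, delay in ns).
ARCS = [
    ("up_data", "latch", 6.0),
    ("latch", "mux", 5.0),
    ("d1_in", "mux", 2.0),
    ("mux", "y", 3.0),
]


def worst_path(arcs, endpoint, blocked=()):
    """Return (delay, path) for the slowest path ending at endpoint.

    Arcs that touch a blocked node are ignored, which mimics the blocking
    function of a static timing analyzer.
    """
    blocked = set(blocked)
    fanin = {}
    for src, dst, delay in arcs:
        if src in blocked or dst in blocked:
            continue
        fanin.setdefault(dst, []).append((src, delay))

    def arrival(node):
        best = (0.0, [node])  # a node with no fan-in starts a path
        for src, delay in fanin.get(node, []):
            time, path = arrival(src)
            if time + delay > best[0]:
                best = (time + delay, path + [node])
        return best

    return arrival(endpoint)


print(worst_path(ARCS, "y"))                     # slow latch path dominates
print(worst_path(ARCS, "y", blocked=["latch"]))  # path of interest remains
```

With the latch blocked, the reported worst case is the operational path through the multiplexer rather than the noncritical microprocessor path.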
Timing simulation is often considered by engineers to be a valid verification of a design's timing. This is not generally true. Unless a simulation that exercises all possible operation delay paths in a design has been run, some of the design remains unverified. Designing such a simulation, even using test vectors, can be a daunting challenge. The synthesis tool can radically alter its output when a change is made to the design, even if the change is a trivial one, thus changing the number and kind of paths to be tested. The tool that affords the most timing verification effect for the least effort is certainly the static timing analysis tool.
Invalid latch timing path
A static timing analysis problem. The path from the 2-to-1 multiplexer D1 input to the Y output is the normal operational path and the one to be selected for timing analysis. A second path from microprocessor data through the latch and multiplexer is a non-critical path. The latch enable is clocked by the same clock as the multiplexer select; however, the latch does not need to propagate the microprocessor data in the same clock cycle that the latch output is selected by the multiplexer. The analysis tool is, of course, unaware of this timing condition, and it will report the microprocessor to Y path as the slowest, even though that path does not limit the design's speed.
Place and route, back-annotation
After the design is merged, it must be placed and routed in an FPGA. This can take from minutes to hours, depending on the vendor and part complexity. Place and route is followed by back-annotation, where propagation, setup, and hold delays dependent on place and route are inserted into the netlist.
If the engineer is not planning on loading a particular synthesis run into an FPGA, he generally does not need to do place and route. Most FPGA vendors' tools permit back-annotation of unrouted designs by setting all delays in the design to zero. There are few problems with back-annotating unrouted designs, except for skew problems in clock distribution.
Clock distribution trees consisting of multiple levels of buffers with zero delay connected to flip flops with zero clock to output delay can cause latching of improper values in simulation. To prevent this, it may be necessary to conditionally compile out clock distribution trees for unrouted simulation.
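The improper latching can be reproduced with a toy model. The sketch below, in Python, applies one rising clock edge to a two-stage shift register in which the second flip flop receives its clock through a chain of zero-delay buffers. It assumes a simulator that orders zero-delay events in VHDL-style delta cycles, where each zero-delay assignment takes effect one delta later. The function `clock_shift_register` and its arguments are hypothetical.

```python
def clock_shift_register(d, q1, clock_buffers):
    """Apply one rising clock edge to a two-stage shift register.

    FF1 is clocked directly. FF2's clock passes through `clock_buffers`
    zero-delay buffers. Returns (new_q1, new_q2).
    """
    # Delta 0: FF1 sees the edge and samples d. With zero clock to output
    # delay, its output changes one delta later, at delta 1.
    new_q1 = d
    q1_changes_at = 1

    # Each zero-delay buffer postpones FF2's clock edge by one delta.
    ff2_clock_delta = clock_buffers

    # FF2 samples whatever q1 holds at the delta of its own clock edge.
    new_q2 = new_q1 if ff2_clock_delta >= q1_changes_at else q1
    return new_q1, new_q2


print(clock_shift_register(d=1, q1=0, clock_buffers=0))  # FF2 gets old q1
print(clock_shift_register(d=1, q1=0, clock_buffers=2))  # FF2 gets new q1
```

With no buffers, the second flip flop captures the previous value of the first, as a shift register should. With even one zero-delay buffer in its clock path, the new value shoots through both stages on a single edge.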
Simulation
Figure 1 shows two possible points in the design flow for simulation: model level and gate level. An HDL permits direct simulation of a model before synthesis. A VHDL model source can be simulated as long as any embedded gate-level constructs, such as tri-state buffers and I/O pads, have models written for them and are compiled into a simulation library.
Because model simulation uses zero delay for all components, it provides no timing insight. Some simulators that do not process timing information do, however, run faster when simulating directly from the models. VHDL simulators process timing information: VHDL carries timing information for each signal, even if the information consists of all zero delays. Thus, the use of a VHDL simulator to simulate from models may provide no speed advantage over timing-annotated gate-level simulation.
Gate-level simulation can incorporate back-annotated timing information to provide extra confidence in the design's timing characteristics. It points out worst-case delays and helps confirm the validity of the synthesis output. If the engineer sees no speed advantage from model over gate-level simulation, he should use only the latter.
Both model and gate-level simulation require model library support. Gate-level simulation requires models for all primitives invoked by the synthesis tool. Model simulation requires models for all gate-level primitives invoked directly from the model. To maintain a gate-level model library, scan through the initial synthesis output netlist for all gates used, write models for those gates, then add gates as successive synthesis runs require them.
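The netlist scan is easy to automate. The sketch below, in Python, lists the gate types used in a netlist and reports those that have no model in the library yet. The netlist format (one instance per line, with the instance name followed by the gate type and its nets, and `#` starting a comment) is an assumption made for illustration. A real vendor format would need its own parser. The functions `cells_used` and `missing_models` are hypothetical names.

```python
def cells_used(netlist_text):
    """Return the set of gate types instantiated in the netlist text."""
    cells = set()
    for line in netlist_text.splitlines():
        fields = line.split("#", 1)[0].split()  # drop comments, then tokenize
        if len(fields) >= 2:
            cells.add(fields[1])  # second field is the gate type
    return cells


def missing_models(netlist_text, library):
    """Return the gate types used in the netlist that have no model yet."""
    return sorted(cells_used(netlist_text) - set(library))


# Hypothetical netlist in the assumed one-instance-per-line format.
NETLIST_TEXT = """\
# instance  cell   nets
u1  nand2  a b n1
u2  dff    clk n1 q
u3  nand2  q c n2
u4  xor2   n2 a y
"""

print(missing_models(NETLIST_TEXT, {"nand2", "dff"}))
```

Running the scan after each synthesis run shows which models must be written before the next gate-level simulation.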
Some FPGA vendors are beginning to show an interest in providing simulation library source code for a modest fee. The engineer should investigate that alternative before coding libraries from scratch.
To be concluded in the next issue.
Leo Bredehoft is a software/hardware design engineer with Netrix in Longmont, Colo. He holds a BSEE from Wichita State University. His interests include computing machine architecture and high-level design representation tools.
Integrated System Design, June 1995