These days, a large portion of ASIC, SoC, and ASSP designs are at least partially prototyped in one or more FPGAs. This amounts to many thousands of prototyping projects per year. Compared with other ASIC verification methods, however, FPGA prototyping is mistakenly seen by many as an ad hoc mix of tools that must be cobbled together by hand. In reality, powerful integrated tools, platforms, and expertise exist that greatly improve the productivity of FPGA-based ASIC prototyping. This article highlights recent tool advances that can help you set up, implement, and verify your prototype faster than ever before.
Use the right FPGAs
ASIC designs are generally larger (and often faster) than a single FPGA can comfortably accommodate, so they tend to push the envelope of FPGA performance and density. Thus, we will almost always use the largest FPGAs available, in the fastest speed grade. Anything less might seem to save money, but in the end could make timing closure harder to reach, make the design harder to fit, or both.
Less obvious is the fact that we should also choose the package with the most available I/O pins and the most flexible support for clocking, voltage, and I/O standards. I/O often becomes the most valuable resource on the devices, especially when designs are partitioned across multiple FPGAs. Synplicity builds HAPS (High-performance ASIC Prototyping System) boards using the largest Xilinx devices, namely the Virtex-5 XC5VLX330 in the -2 speed grade and the 1760-ball grid array package.
If a design takes up more than 70% to 80% of the largest FPGA available to you, then it is worth considering partitioning it across more than one device. Some designs have natural and obvious partitions based on existing hierarchical boundaries. The aim is to ensure that we do not create new critical paths in the design that cross between FPGAs on the board.
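The utilization rule of thumb above can be turned into a quick first-pass estimate. The following is a minimal sketch, not a real partitioner: the function name, the LUT counts, and the 75% default threshold (chosen from the middle of the 70% to 80% range) are all illustrative assumptions, and it ignores I/O, block RAM, and clocking limits, which often dominate in practice.

```python
import math


def fpgas_needed(design_luts: int, device_luts: int,
                 max_utilization: float = 0.75) -> int:
    """Estimate how many devices keep each FPGA under max_utilization.

    Hypothetical helper: logic capacity only, no I/O or timing awareness.
    """
    if design_luts < 0 or device_luts <= 0:
        raise ValueError("LUT counts must be positive")
    if not 0 < max_utilization <= 1:
        raise ValueError("max_utilization must be in (0, 1]")
    usable_per_device = device_luts * max_utilization
    # Even an empty design still occupies one device.
    return max(1, math.ceil(design_luts / usable_per_device))


if __name__ == "__main__":
    # Illustrative numbers only: a 160k-LUT design on a 200k-LUT device
    # is at 80% utilization, which exceeds the 75% threshold.
    print(fpgas_needed(160_000, 200_000))
```

An estimate like this only tells you when to start thinking about partitioning. Where the boundaries go still depends on the hierarchy and on keeping critical paths inside one device.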
To address this, Synplicity's Certify tool has numerous partitioning aids, from low-level block zippering up to completely automatic full-RTL partitioning. These allow as much or as little manipulation of the design as required to meet the performance goals, and none of them requires any change to the original ASIC RTL.
An important manipulation of the design that Certify can perform without changing the RTL is to automatically add I/O pin multiplexing between FPGAs. This uses time-sharing of the wires between the FPGAs so that they carry two or more signals, thereby alleviating potential I/O bottlenecks which often arise at the partition boundaries.
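To illustrate the time-sharing idea, here is a small behavioral sketch of pin multiplexing. It is an assumption-laden model, not Certify's implementation: the function names and the fixed slot ordering are hypothetical, and real multiplexing schemes also need a faster transfer clock and synchronization logic that this model omits.

```python
from math import ceil


def wires_needed(num_signals: int, ratio: int) -> int:
    """Physical wires required when each wire carries `ratio` signals."""
    return ceil(num_signals / ratio)


def tdm_send(values: list[int], ratio: int) -> list[list[int]]:
    """Serialize one design-clock cycle of signal values into time slots.

    slots[t][w] is the bit driven on wire w during time slot t.
    Wire w carries signals w*ratio .. w*ratio + ratio - 1; unused
    positions are padded with 0.
    """
    wires = wires_needed(len(values), ratio)
    slots = []
    for t in range(ratio):
        slot = []
        for w in range(wires):
            index = w * ratio + t
            slot.append(values[index] if index < len(values) else 0)
        slots.append(slot)
    return slots


def tdm_receive(slots: list[list[int]], num_signals: int,
                ratio: int) -> list[int]:
    """Rebuild the original signal values on the receiving FPGA."""
    return [slots[i % ratio][i // ratio] for i in range(num_signals)]


if __name__ == "__main__":
    signals = [1, 0, 1, 1, 0]          # five inter-FPGA signals
    slots = tdm_send(signals, ratio=2)  # shared over three wires
    print(wires_needed(len(signals), 2), tdm_receive(slots, len(signals), 2))
```

The trade-off is visible in the model: a ratio of 2 halves the pin count, but every design-clock cycle now needs two transfer slots, so the inter-FPGA link must run correspondingly faster than the design clock.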
Many of Certify's partitioning aids have been developed to meet the needs of users performing hundreds of partitioning projects within major semiconductor and system labs since 1999.
ASIC designs often contain design features that are FPGA-hostile. For example, ASIC designs are typically sprinkled with instantiations of elements or macros from ASIC technology libraries or macro generators. This leaves "black-box" holes in the RTL for which some functionality must be described in order to complete the FPGA implementation. Some of this functionality is provided automatically by Certify, which can extract the required RTL from the ASIC library itself. Synopsys DesignWare instantiations are dealt with automatically in a similar way.
Another FPGA-hostile facet of many ASIC designs is their complex clocking structures. Multiple clock domains, asynchronous parallel channels, and gated clock trees will quickly overflow the global synchronous clock resources of even the largest FPGAs. Certify will automatically simplify gated or generated clock networks back to a common system clock and build the required enable signals to ensure equivalent functional behavior. An example of this is shown in Fig 1. The resulting clock structure matches the resources available inside the FPGA.
Fig 1. Automatic gated clock conversion.
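The functional equivalence behind gated clock conversion can be checked with a small cycle-based model. This is a sketch under stated assumptions rather than a description of what Certify does internally: the function names are hypothetical, the gate is a simple AND of clock and enable, and the enable is assumed to change only while the clock is low (glitch-free gating).

```python
def simulate_gated_clock(stimulus: list[tuple[int, int]]) -> list[int]:
    """ASIC style: the flop is clocked by gclk = clk AND en."""
    q, prev_gclk, trace = 0, 0, []
    for en, d in stimulus:
        for clk in (0, 1):              # low phase, then high phase
            gclk = clk & en
            if gclk and not prev_gclk:  # rising edge of the gated clock
                q = d
            prev_gclk = gclk
        trace.append(q)
    return trace


def simulate_clock_enable(stimulus: list[tuple[int, int]]) -> list[int]:
    """FPGA style: the flop runs on the system clock with a clock enable."""
    q, prev_clk, trace = 0, 0, []
    for en, d in stimulus:
        for clk in (0, 1):
            if clk and not prev_clk and en:  # system clock edge, enabled
                q = d
            prev_clk = clk
        trace.append(q)
    return trace


if __name__ == "__main__":
    # Each tuple is (enable, data) applied for one system clock cycle.
    stimulus = [(1, 1), (0, 0), (1, 0), (0, 1), (1, 1)]
    print(simulate_gated_clock(stimulus) == simulate_clock_enable(stimulus))
```

Both models capture data only in cycles where the enable is high, so the register contents match cycle for cycle, while the converted version needs just one global clock network.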
Implementation and verification
Fast design iterations
Implementation is a critical phase in the FPGA prototyping flow. The partitioned design may undergo many iterations as bugs are discovered and fixed, or design blocks are tweaked and re-tweaked for higher performance.
It is very important to keep this iteration loop as short as possible. However, the combination of leading-edge ASIC designs in the multi-million system gate range, and stretch performance goals to model real-world operation, can lead to lengthy synthesis and place-and-route passes. The great advantage that FPGAs offer for debugging and design exploration begins to diminish when using traditional FPGA flows and large design sizes. The answer is to use incremental implementation methods.
Xilinx's ISE 9.2i design software offers a technology called SmartCompile, which is ideal for the ASIC prototyping flow. Designed to speed up the implementation flow by a factor of 2 to 6 compared with traditional flows, SmartCompile comprises three components: SmartPreview, SmartGuide, and Partitions (not to be confused with multi-FPGA partitioning).
SmartPreview allows you to halt the ISE tool flow in mid-stream to see how a particular implementation pass is proceeding. While halted, you can check key implementation information such as the number of timing violations, the timing score, or the number of constraints met so far. You can even save the intermediate design and timing reports and create a bitstream for lab debug. If the implementation is proceeding as expected, you can resume the pass; if it is not, you can cancel the run, thereby saving valuable design time.
SmartGuide brings automated incremental design to the FPGA design flow. It can speed up the implementation phase by a factor of 2 to 6, depending on design size and hierarchy setup.
With SmartGuide turned on, your first full implementation run is "guided", or marked for component placement and routing. Let's say that after a debug session you decide to make a design tweak and change one HDL source. As you re-enter the implementation flow, SmartGuide examines the hierarchy and identifies where the design needs to be re-implemented. Where possible, SmartGuide will reuse the placement and routing that didn't change from the prior implementation pass, thereby speeding up the re-implementation flow (sometimes dramatically).
Synplicity and Xilinx have collaborated closely, as part of our Timing Closure Task Force, to enforce name consistency in both tools. This means that names remain constant from run to run of synthesis and place-and-route, thus ensuring the best possible guided flow results.
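The general idea of incremental reuse can be sketched with a toy change detector. This is not how SmartGuide is implemented, and ISE exposes no such Python API; the function names and the per-module source dictionary are hypothetical, chosen only to show why an unchanged module can keep its previous result.

```python
import hashlib


def fingerprint(source_text: str) -> str:
    """Stable digest of one module's HDL source."""
    return hashlib.sha256(source_text.encode("utf-8")).hexdigest()


def modules_to_reimplement(previous: dict[str, str],
                           current_sources: dict[str, str]) -> set[str]:
    """Return modules whose source changed since the previous run.

    `previous` maps module name -> fingerprint from the last run;
    `current_sources` maps module name -> current HDL text.
    New modules count as changed; everything else can be reused.
    """
    changed = set()
    for name, text in current_sources.items():
        if previous.get(name) != fingerprint(text):
            changed.add(name)
    return changed


if __name__ == "__main__":
    first_run = {"uart": "module uart; endmodule",
                 "dma": "module dma; endmodule"}
    saved = {name: fingerprint(text) for name, text in first_run.items()}

    # After a debug session, only the DMA source is edited.
    second_run = dict(first_run, dma="module dma; /* fix */ endmodule")
    print(modules_to_reimplement(saved, second_run))
```

The lookup is keyed by module name, so the comparison only works if names stay the same from one run to the next. If a tool renamed objects between runs, every module would look new and nothing could be reused.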
Some incremental tool flows can produce worse timing paths from having to route around "locked" modules, but SmartGuide has the ability to identify critical timing paths and – if necessary – free up portions of an otherwise unchanged module for re-implementation, emphasizing critical paths and keeping timing a priority.
The third component of SmartCompile is Partitions, which offers the ability to completely lock down a completed module's placement and routing. In this way, a debugged "known-good" module or piece of purchased IP can be implemented and then set aside while you concentrate on debugging your other modules, while still enjoying the benefits of an incremental implementation flow. Partitions can be locked down early in the tool flow by using Compile Point Technology within the Synplify Pro and Synplify Premier synthesis tools. The partition information is automatically passed on to ISE.
All the components of SmartCompile work directly with either Xilinx or Synplicity synthesis and can shorten the implementation flow for large designs by a factor of 2 to 6. SmartCompile leaves more time for critical module debug and frees the engineer from watching lengthy and cryptic synthesis and place-and-route runs.