Xilinx recently introduced a state-of-the-art family of next-generation FPGAs with a host of new architectural features. In order to take full advantage of these new features, Synplicity was provided with early access to the new architecture, and – by the time the new devices were introduced to the market – engineers at Synplicity and Xilinx had been working side-by-side for almost a year to enhance Synplicity's Synplify Pro synthesis engine.
The world's first FPGAs to be fabricated at the 65 nm technology node, the Virtex-5 family from Xilinx provides 65% more logic cells and 25% more input/outputs (I/Os) than the preceding Virtex-4 generation of devices. At the same time, members of the Virtex-5 family provide 30% higher performance and 35% lower dynamic power dissipation, and they consume 45% less silicon real estate than their Virtex-4 counterparts.
Virtex-5 FPGAs boast a wide range of new architectural features, such as 6-input lookup tables (LUTs) and a new diagonal interconnect fabric (traditional architectures employ only 4-input LUTs and conventional orthogonal interconnect). Virtex-5 devices also feature high-speed, high-capacity RAM, DSP, and clock management hard IP blocks tuned for 550 MHz operation. Additional hard IP – such as integrated FIFO support in the RAM blocks – helps to further reduce dynamic power consumption.
Increases in the complexity of the FPGA fabric demand corresponding increases in the sophistication of the synthesis algorithms. If the same algorithms used for a fabric based on 4-input LUTs were applied to a fabric with 6-input LUTs, for example, then synthesis runtimes could easily be orders of magnitude longer. This means that in order to take full advantage of the specialized architectural features found in the Virtex-5 family, synthesis algorithms have to be fine-tuned or – in many cases – completely re-crafted.
The Virtex-5 family includes devices with up to 330,000 logic cells, 10 megabytes of on-chip memory, 1,200 general-purpose I/Os, and a host of additional hardened intellectual property (IP) blocks. Future platforms will provide even greater densities and capabilities, further expanding the reach of advanced FPGA architectures across a wide range of application domains. To address the demands of these extreme FPGA devices, Synplicity and Xilinx have formed a joint Ultra-High-Capacity Timing Closure task force. Through this task force, engineering teams from both companies collaborate to define and implement new design flows that maximize design productivity and quality of results for ultra-high-density designs implemented using these next-generation FPGAs.
Below we introduce some of the key features associated with the new Virtex-5 family of devices and discuss how Synplicity's Synplify Pro FPGA synthesis technology has been enhanced to take advantage of the capabilities provided by these new components.
6-Input lookup tables
The first FPGAs circa 1985 employed 3-input lookup tables (LUTs). Subsequent generations moved to 4-input LUTs, because these offered a better balance between logic utilization and the number of logic levels for the designs of that era.
However, there has been a fundamental shift in the nature of designs over recent years. Today's designs often feature wide data paths, especially in the case of digital signal processing (DSP) applications. Implementing these designs using 4-input LUTs can require many levels of logic, thereby impacting performance. In order to address this issue, the ExpressFabric employed by the Virtex-5 family features LUTs with six independent inputs (Fig 1). This can significantly reduce the number of logic levels and LUT area required to implement wide functions.
1. The Virtex-5 family has 6-input LUTs.
Each of these logical elements can be used as a true 6-input LUT or as two 5-input LUTs that share their five inputs. In addition to containing four of these 6-input LUTs, a Virtex-5 slice also includes faster flip-flops to speed pipelined designs and an improved carry chain architecture to speed arithmetic operations. Overall, the Virtex-5 family provides 65% more logic cells (up to 330,000 LCs) than its Virtex-4 counterpart.
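The two operating modes described above can be modeled in a few lines of Python. This is only a behavioral sketch: the helper names (`make_table`, `lut6`, `dual_lut5`), the bit ordering of the truth table, and the choice of which half of the table feeds which output are assumptions made for illustration, not a description of the actual Xilinx primitive.

```python
def make_table(fn, n_inputs):
    """Build an integer truth table: bit `index` holds fn(bits of index)."""
    table = 0
    for index in range(1 << n_inputs):
        bits = tuple((index >> i) & 1 for i in range(n_inputs))
        if fn(bits):
            table |= 1 << index
    return table


def lut6(table, inputs):
    """Single 6-input mode: six input bits select one of 64 table bits."""
    index = sum(bit << i for i, bit in enumerate(inputs))
    return (table >> index) & 1


def dual_lut5(table, inputs):
    """Dual mode (assumed layout): the same five inputs index both halves
    of the 64-bit table, giving two independent 5-input functions."""
    index = sum(bit << i for i, bit in enumerate(inputs))
    low = (table >> index) & 1           # function stored in bits 0..31
    high = (table >> (32 + index)) & 1   # function stored in bits 32..63
    return low, high


# Pack a 5-input AND (low half) and a 5-input XOR (high half) into one table.
and5 = make_table(lambda b: all(b), 5)
xor5 = make_table(lambda b: sum(b) % 2 == 1, 5)
packed = and5 | (xor5 << 32)
```

With this packing, `dual_lut5(packed, (1, 0, 0, 0, 0))` returns `(0, 1)`: the AND is false while the XOR of the same five shared inputs is true.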
The huge capacity of Virtex-5 FPGAs – coupled with architectural features such as 6-input LUTs – mandates the use of specialized synthesis tools with customized algorithms. To address these issues, engineers at Synplicity and Xilinx have been working side-by-side for almost a year to ensure that the industry-leading Synplify Pro synthesis engine takes full advantage of the capabilities provided by the Virtex-5 family.
As one simple example, consider a wide data path function such as an 18-bit comparator. If implemented in a Virtex-4 device using 4-input LUTs, this would require 13 LUTs and 3 levels of logic (Fig 2a). By comparison, when implemented in a Virtex-5 using 6-input LUTs, the same function requires only 7 LUTs and 2 levels of logic (Fig 2b).
2. Wide functions require fewer LUTs and logic levels.
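The LUT counts quoted for the 18-bit comparator can be reproduced with a short calculation. The sketch below assumes an equality comparator built as a simple tree: the first level compares as many bit pairs as fit in one LUT, and later levels AND the partial results together. That structure and the function name `comparator_cost` are assumptions for illustration; the article's figure may use a different arrangement.

```python
def comparator_cost(n_bits, k):
    """Return (lut_count, logic_levels) for an n_bits equality comparator
    built as a tree of k-input LUTs (assumed structure, see above)."""
    pairs_per_lut = k // 2                    # each bit pair uses two inputs
    signals = -(-n_bits // pairs_per_lut)     # ceiling division: first level
    luts, levels = signals, 1
    while signals > 1:                        # AND-reduce the partial results
        signals = -(-signals // k)
        luts += signals
        levels += 1
    return luts, levels


for k in (4, 6):
    luts, levels = comparator_cost(18, k)
    print(f"{k}-input LUTs: {luts} LUTs, {levels} levels")
```

Under these assumptions the 4-input case needs 9 + 3 + 1 = 13 LUTs in 3 levels, and the 6-input case needs 6 + 1 = 7 LUTs in 2 levels, matching the figures quoted above.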
One should not underestimate the complexity associated with mapping logic into LUTs with more inputs. If not handled correctly, the combinatorial explosion associated with mapping to 6-input LUTs can cause memory utilization and runtime problems. If the algorithms used for a fabric based on 4-input LUTs were applied to a fabric with 6-input LUTs without significant modification, synthesis runtimes could easily be orders of magnitude longer. Also, when attempting to find an optimal mapping, traditional algorithms run the risk of becoming trapped in local minima.
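One way to get a feel for this growth is to count the candidate "cuts" (sets of signals that could feed a single LUT) at a node as the LUT size rises from 4 to 6 inputs. The sketch below uses textbook-style cut enumeration on a small balanced tree of 2-input gates. It is a generic illustration, not the algorithm used inside Synplify Pro, and the names `enumerate_cuts` and `balanced_tree` are hypothetical.

```python
from itertools import product


def enumerate_cuts(fanins, k):
    """Map each node to its set of cuts with at most k leaves.
    `fanins` maps node -> tuple of fanin nodes, listed in topological order."""
    cuts = {}
    for node, ins in fanins.items():
        node_cuts = {frozenset([node])}       # trivial cut: the node itself
        if ins:
            # Merge one cut from every fanin; keep merges that fit in a LUT.
            for combo in product(*(cuts[i] for i in ins)):
                merged = frozenset().union(*combo)
                if len(merged) <= k:
                    node_cuts.add(merged)
        cuts[node] = node_cuts
    return cuts


def balanced_tree(n_leaves):
    """Build a balanced tree of 2-input gates (n_leaves a power of two)."""
    level = [f"x{i}" for i in range(n_leaves)]
    fanins = {name: () for name in level}
    depth = 0
    while len(level) > 1:
        depth += 1
        nxt = []
        for i in range(0, len(level), 2):
            name = f"n{depth}_{i // 2}"
            fanins[name] = (level[i], level[i + 1])
            nxt.append(name)
        level = nxt
    return fanins, level[0]


fanins, root = balanced_tree(8)
for k in (4, 6):
    print(k, len(enumerate_cuts(fanins, k)[root]))
```

Even on this 8-input tree the root has 9 cuts for 4-input LUTs and 21 for 6-input LUTs. Real netlists have reconvergent paths and many thousands of nodes, so the gap widens quickly unless the cut set is pruned.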
Furthermore, as opposed to being timing-driven, most conventional synthesis engines simply attempt to reduce the number of logic levels. This is problematic in the case of LUT architectures in which different input-to-output paths can have asymmetric delays. Also, the fact that these LUTs can be used in a 2 x 5-input configuration further increases the complexity of the mapping operations. A lot of research and development is required in order to use structures that share inputs but that represent different functions. In order to address these issues, the Synplify Pro software (which features a unique direct-mapping capability) has been equipped with a variety of sophisticated new heuristic algorithms that are tailored to minimize the number of cuts and to address these complex mapping and timing scenarios.
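To illustrate why asymmetric input delays matter, the sketch below compares a naive pin assignment with a timing-driven one for a single LUT. The pin names, the delay values (in picoseconds), and the function names are invented for illustration and do not describe real Virtex-5 timing or the Synplify Pro implementation. The greedy rule shown, pairing the latest-arriving signal with the fastest pin, is one simple way to minimize the output arrival time of one LUT.

```python
def output_arrival(assignment, arrivals, pin_delays):
    """Output time = worst case over inputs of (arrival + pin delay)."""
    return max(arrivals[sig] + pin_delays[pin]
               for sig, pin in assignment.items())


def timing_driven_assignment(arrivals, pin_delays):
    """Pair the latest-arriving signals with the fastest LUT pins."""
    late_first = sorted(arrivals, key=arrivals.get, reverse=True)
    fast_first = sorted(pin_delays, key=pin_delays.get)
    return dict(zip(late_first, fast_first))


# Hypothetical numbers, in picoseconds.
arrivals = {"a": 0, "b": 200, "c": 1000}          # signal arrival times
pin_delays = {"I0": 200, "I1": 300, "I2": 400}    # input-to-output delays

naive = dict(zip(arrivals, pin_delays))           # a->I0, b->I1, c->I2
tuned = timing_driven_assignment(arrivals, pin_delays)

print(output_arrival(naive, arrivals, pin_delays))   # 1400
print(output_arrival(tuned, arrivals, pin_delays))   # 1200
```

Both assignments use one LUT and one logic level, so a mapper that only counts levels cannot tell them apart, yet the timing-driven choice is 200 ps faster in this made-up example.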