Editor's Note: You would not believe the number of emails I receive asking which FPGA is "the best". Of course you have to define your metrics here: "best at what?" is the question that immediately comes to mind. Are we talking about the number and size of LUTs, multipliers, on-chip RAMs, etc., or performance (raw speed), or power consumption or... the list goes on.
Another consideration is how well does the underlying FPGA fabric (logic and interconnect) accommodate different flavors of designs. For example, you have to question the point of spending money on a humongous FPGA if you can only achieve 20% utilization of its core fabric. And how well does the interconnect architecture perform its role in life?
More questions are popping into my head as I pen these words; with regard to the previous points, for example, how well does the design software work with the FPGA architecture? How much time will be consumed taking your RTL design, synthesizing it, and running it through the place-and-route software?
In the case of components like microprocessors, there are industry-standard benchmarks that one can use to gain a reasonable level of understanding as to how the various devices from competing vendors compare against each other. Thus far, however, there really hasn't been much along these lines for designers interested in comparing different FPGAs. So I think all of us are going to find this "How To" article to be very, very interesting...
As a design engineer using FPGAs, how do you choose which FPGA is best for your application? Every engineer faces this problem and is inundated with endless, self-promoting marketing claims that are often taken with a "pinch of salt". Every FPGA vendor asserts that its part is this much faster, takes this much less time to compile, and consumes this much less power, along with numerous other technical claims. After all, any company can create artificially-tailored benchmarks that put its FPGA in a better light than the competition. So the question is: which benchmarking claims are accurate, and which are just marketing?
Altera evaluates and benchmarks its FPGAs in conjunction with its Quartus II software at every process node, using a suite of comprehensive and representative customer designs. These customer designs are collected from a variety of market segments – such as networking, telecommunications, wireless, and consumer applications – and a variety of implementation technologies, such as ASICs, gate arrays, and FPGAs from other vendors. These designs (which are maintained in a secure database) are used to understand how real customer designs perform on the current architecture and on that of the competition. Designs that do not optimize well on Altera's FPGA architecture are evaluated in over 150,000 experiments; enhancements to future incarnations of the FPGA architecture or design tools are then recommended based on these evaluations.
Based on the evaluation and benchmarking processes introduced above – and the fact that there is no standardized FPGA benchmarking process – Altera has created what we believe to be a "fair and unbiased" benchmarking methodology, which is based on a set of real customer designs and endorsed by industry experts. But there's a problem: since the existing designs are proprietary to our customers, we cannot provide these designs for evaluation of Altera FPGAs by new/other customers. Although everyone understands the reasoning behind this, it still serves to amplify end-user skepticism. If I were a potential customer of Altera and was told that Stratix III FPGAs are on average 35% faster than Virtex-5 FPGAs, compile three times faster, and provide 95% core utilization on average, then without the means to evaluate this myself, as an engineer, I would be very skeptical!
To overcome customer skepticism in benchmarking claims, we obtained readily available designs that can be shared with customers for evaluation purposes. These designs support the results offered below on how Stratix III FPGAs compare to Virtex-5 FPGAs in terms of core performance, utilization, and compile times as design size increases.
For this, we picked the seven largest and most popular designs from the OpenCores.org website, listed in Table 1. Designs were selected by density from the "most popular projects" section of the OpenCores website.
Table 1. OpenCore Designs.
(1) Designs are available for evaluation purposes on Altera's website by Clicking Here
For the purposes of these OpenCore-based benchmarks, the largest comparable parts from Altera and Xilinx were used. Table 2 shows the devices used for benchmarking along with the latest available software.
Table 2. Device, Software, and Speed Grade.
(1) Similar results are seen on smaller parts.
(2) Although Quartus II v8.0 improves over Quartus II v7.2 SP3, preliminary results show that ISE 10.1 offers no performance or utilization improvements on Virtex-5 FPGAs. Compile times have improved compared to ISE 9.2i SP4, but Quartus II v8.0 software is still, on average, 3 times faster than ISE 10.1.
(3) The medium speed grades are the fastest available in software for these target devices.
The standalone OpenCore designs are quite small when implemented in large FPGAs (i.e., Altera's EP3S340 and Xilinx's XC5VLX330). To fill up the FPGA and to simulate the effect of increasing design size on performance, utilization, and compile times, multiple instances of each OpenCore were instantiated in the FPGA (stamping the same core repeatedly until the device filled up and the software could no longer implement any more stamps of the OpenCore). Fig 1 shows four instantiations of the oc_aquarius design. Care was taken in the benchmarking and stamping methodology to ensure that:
- Each stamp was implemented in parallel.
- I/O wrapper logic was added to reduce the number of I/O pins required by the larger design.
- No timing-critical paths existed between the cores and the wrapper logic.
- The wrapper logic added very little overhead (< 3% of the total logic).
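To make the stamping idea above concrete, the wrapper-generation step can be sketched as a small script that emits HDL instantiating N stamps of a core behind shared, registered I/O, with the stamp outputs XOR-reduced so the wrapped design needs only a handful of device pins. All module and port names here are hypothetical; this is a minimal illustration of the approach, not Altera's actual stamping scripts.

```python
def make_wrapper(core_name: str, n_stamps: int) -> str:
    """Emit a (hypothetical) Verilog wrapper instantiating n_stamps copies of a core."""
    insts = "\n".join(
        f"  {core_name} u{i} (.clk(clk), .din(din_r[{i}]), .dout(dout_w[{i}]));"
        for i in range(n_stamps)
    )
    xor_terms = " ^ ".join(f"dout_w[{i}]" for i in range(n_stamps))
    return f"""module {core_name}_x{n_stamps} (
  input  wire        clk,
  input  wire [31:0] din,
  output reg  [31:0] dout
);
  reg  [31:0] din_r  [0:{n_stamps - 1}];
  wire [31:0] dout_w [0:{n_stamps - 1}];
  integer k;
  // Register the shared input into per-stamp copies so the synthesis
  // tool cannot merge the duplicate stamps into one.
  always @(posedge clk)
    for (k = 0; k < {n_stamps}; k = k + 1) din_r[k] <= din;
{insts}
  // XOR-reduce all stamp outputs into one bus to keep pin count low.
  always @(posedge clk) dout <= {xor_terms};
endmodule
"""

print(make_wrapper("oc_aquarius", 4))
```

Registering the wrapper's inputs and outputs also keeps any timing-critical paths out of the wrapper itself, matching the care points listed above.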
Figure 1. The oc_aquarius design instantiated four times in the FPGA.
For complete implementation details, Click Here to see the OpenCore Stamping and Benchmarking Methodology Technical Brief.
The OpenCores were instantiated as many times as the device and software would allow without compilation errors. The performance, utilization, and compile times were then compared at intervals of five stamps for each FPGA.
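The stamp sweep just described can be sketched as a simple loop: keep adding stamps until the design no longer fits, recording results at every fifth stamp count. The fits() predicate below is a stand-in for a real place-and-route run on the stamped design, and the capacity figure is made up for illustration; neither is taken from the actual benchmark flow.

```python
DEVICE_CAPACITY = 23  # assumed maximum number of stamps that will fit (illustrative)

def fits(n_stamps: int) -> bool:
    """Stand-in for 'compile the n-stamp design and check for fitter errors'."""
    return n_stamps <= DEVICE_CAPACITY

def sweep(measure_every: int = 5):
    """Return the largest stamp count that fits, plus the counts at which
    performance, utilization, and compile time would be recorded."""
    recorded = []
    n = 1
    while fits(n):
        if n % measure_every == 0:
            recorded.append(n)  # here you would log fmax, utilization, compile time
        n += 1
    max_stamps = n - 1  # last count that still compiled
    return max_stamps, recorded

print(sweep())  # → (23, [5, 10, 15, 20])
```

In a real flow, each iteration would regenerate the wrapper with one more stamp and rerun the vendor tools, so a linear sweep doubles as the data-collection pass.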