Although signal processing is usually associated with digital signal processors, it is becoming increasingly evident that FPGAs are taking over as the platform of choice in the implementation of high-performance, high-precision signal processing.
For many such applications, the choice generally boils down to a single FPGA, an FPGA paired with a DSP processor, or a farm of DSP processors.
While it is generally understood that DSP processors can be programmed in C, leading to a much simpler development flow, this advantage quickly dissipates when the design has to be partitioned across multiple DSP processors or between a DSP processor and an FPGA. The truth is that a single DSP processor lacks the performance to handle the signal processing required by most infrastructure systems.
System designers must then choose between using multiple DSP processors and using an FPGA. The latter almost always yields the lowest-cost, lowest-power system implementation.
Figure 1 shows some of these infrastructure systems, which share one thing: performance requirements that exceed the capabilities of a traditional programmable digital signal processor.
These systems also have different performance and precision requirements, as well as different design and development flows.
For example, video processing requires 9- to 12-bit precision, with some
high-end designs needing a 12-bit color depth. These designs are
generally created in an HDL design flow, with video- and image-processing
IP functions increasingly utilized to speed up the development flow.
On the other side of the spectrum, military radar designs require the
highest DSP performance and floating-point precision to get the highest
dynamic range. Many of these designs are modeled in MATLAB and Simulink
tools, along with floating-point functions that are optimized for the FPGA architecture.
Fig. 1: Different applications need different performance, precision, IP and tools.
When selecting an FPGA, system designers must look at the FPGA silicon architecture as well as the available design tools, IP, functional system blocks and reference designs to ensure that they can quickly and efficiently implement their algorithms.
This article explores some typical DSP solutions that speed up FPGA-DSP design implementation.
Set FPGA precision to match application
Choose an FPGA-DSP architecture that matches the precision requirements of the algorithm. Avoid having to tweak the algorithm to fit some arbitrary precision dictated by the FPGA vendor.
Traditionally, FPGAs have offered fixed-precision DSP blocks, which force designers either to waste precious silicon resources or to partition the design across multiple blocks, reducing system performance. For example, when implementing HD video processing applications, which typically use 9x9 multiply operations, a fixed-precision 18x25 DSP architecture is overkill: more than half of the DSP block is wasted.
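The silicon cost of a precision mismatch is easy to quantify from multiplier bit growth: a signed NxM multiply produces an (N+M)-bit full-precision product. A minimal Python sketch of the arithmetic (the function name is illustrative, not from any vendor tool):

```python
def product_bits(a_bits, b_bits):
    """Width of the full-precision product of a signed
    a_bits x b_bits two's-complement multiply."""
    return a_bits + b_bits

# A 9x9 video multiply only needs an 18-bit product...
video_bits = product_bits(9, 9)        # 18
# ...but a fixed 18x25 DSP block produces a 43-bit result.
block_bits = product_bits(18, 25)      # 43

# Less than half of the block's output precision is used,
# so more than half of the DSP block is wasted.
utilization = video_bits / block_bits
print(f"utilization: {utilization:.0%}")
```

Running the sketch shows a utilization of roughly 42 percent, which is the "more than half wasted" figure above in concrete terms.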
Or consider complex multipliers, a common building block for fast Fourier transform (FFT) functions implemented across the board in high-performance DSP systems. In many cases, DSP blocks must be cascaded to support 18x25 or 18x36 complex multiply operations. Make sure that the DSP blocks have a cascade chain and accumulator wide enough to implement the cascade using dedicated routing; using the FPGA's generic routing may impose a performance penalty. Also, if the cascade bus or accumulator is not wide enough, you may have to take a precision hit before routing the result to the next DSP block.
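To see why cascade and accumulator width matter, consider the bit growth of a fixed-point complex multiply: each real partial product grows to the sum of the input widths, and the final add or subtract adds one more bit. A sketch of the schoolbook four-multiplier form in Python (names are illustrative):

```python
def complex_mult(ar, ai, br, bi):
    """Schoolbook complex multiply, four real multiplies and two adds:
    (ar + j*ai) * (br + j*bi)"""
    re = ar * br - ai * bi
    im = ar * bi + ai * br
    return re, im

def complex_product_bits(a_bits, b_bits):
    """Each partial product is (a_bits + b_bits) wide; the final
    add/subtract can grow the result by one more bit."""
    return a_bits + b_bits + 1

# An 18x25 complex multiply produces a 44-bit result, so the cascade
# bus/accumulator must be at least that wide to avoid truncating
# precision before the result reaches the next DSP block.
print(complex_product_bits(18, 25))  # 44
```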
Some designers may be looking to implement floating-point datapaths for a portion of their design. These functions require 24-bit or higher multiplier precision to implement the mantissa multiplication of the single-precision format.
When selecting the FPGA-DSP architecture, make sure that it supports multiple precisions and has a cascade bus wide enough to meet system precision requirements (see Table 1).
Table 1: Precision modes supported by 28-nm DSP block architecture
One of the most commonly implemented functions within an FPGA is the finite impulse response (FIR) filter. It is critical that the FPGA's DSP block architecture efficiently supports the implementation of high-performance, multi-channel FIR filters. Key features of Altera's 28-nm DSP architecture that are clearly designed with FIR filters in mind are shown in Table 2.
Table 2: Features of the DSP architecture for FIR filter design.
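The behavior those DSP-block features accelerate can be captured in a few lines. The sketch below (plain Python, illustrative names) shows a direct-form FIR filter and a symmetric variant that pre-adds x[n-k] + x[n-(N-1-k)] before multiplying, which is what a DSP-block pre-adder does in hardware to halve the multiplier count for symmetric coefficients:

```python
def fir(x, coeffs):
    """Direct-form FIR: y[n] = sum_k coeffs[k] * x[n-k]."""
    taps = [0] * len(coeffs)
    out = []
    for sample in x:
        taps = [sample] + taps[:-1]          # shift the delay line
        out.append(sum(c * t for c, t in zip(coeffs, taps)))
    return out

def fir_symmetric(x, half_coeffs):
    """Same filter for symmetric coefficients, but each pair of taps is
    pre-added so only len(half_coeffs) multiplies are needed per sample
    (the role of the DSP block's pre-adder)."""
    n = 2 * len(half_coeffs)
    taps = [0] * n
    out = []
    for sample in x:
        taps = [sample] + taps[:-1]
        acc = 0
        for k, c in enumerate(half_coeffs):
            acc += c * (taps[k] + taps[n - 1 - k])  # pre-add, one multiply
        out.append(acc)
    return out
```

For a symmetric 4-tap filter with coefficients [1, 2, 2, 1], `fir_symmetric(x, [1, 2])` produces the same output as `fir(x, [1, 2, 2, 1])` with half the multiplies.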
Taking a MATLAB/Simulink design all the way to hardware with the push of a button has long been a marketing claim. What such claims neglect to mention is that the design then requires significant tweaking to make sure all the right timing constraints are met. What is needed is a 'timing-driven' Simulink synthesis engine.
Such a tool not only generates raw structural HDL, but is also intelligent enough to insert pipeline registers or apply time-division multiplexing so that the resulting HDL meets the fMAX or latency constraints.
Altera’s DSP Builder Advanced Blockset is designed around this premise. The tool analyzes the Simulink design description and generates both HDL and a bitstream for the target FPGA device that incorporates the timing constraints (fMAX or latency). It does this automatically, adding pipeline registers and the right amount of time-division multiplexing to meet or exceed the specified timing.
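Time-division multiplexing in this context means one physical multiply-accumulate unit, clocked faster than the sample rate, is shared across several channels. A behavioral Python sketch of the idea (the function and names are illustrative, not DSP Builder output):

```python
def tdm_fir(channels, coeffs):
    """Model of a time-multiplexed FIR: one (conceptual) MAC unit
    serves every channel by running at channels * sample_rate.
    `channels` is a list of equal-length per-channel sample lists."""
    n_ch = len(channels)
    states = [[0] * len(coeffs) for _ in range(n_ch)]
    outputs = [[] for _ in range(n_ch)]
    for t in range(len(channels[0])):
        # The shared MAC visits each channel once per sample period.
        for ch in range(n_ch):
            states[ch] = [channels[ch][t]] + states[ch][:-1]
            outputs[ch].append(sum(c * x for c, x in zip(coeffs, states[ch])))
    return outputs
```

Each channel sees an ordinary FIR response; the hardware cost is one MAC instead of one per channel, traded for a clock that runs n_ch times faster.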
Large, high-performance military radar designs developed using DSP Builder Advanced Blockset can close timing without having to manually tweak the HDL.
Figure 2 below shows an example in which a 50,000-logic-element (LE) FPGA-based design closes timing at over 350 MHz. While the design example shown in Figure 2 is the front end of a radar system, the functions implemented (a polyphase FIR filter, FFT and mixer) are commonly used in many high-performance DSP designs.
Using FPGAs for high-definition video