
Design Article

DSP options to accelerate your DSP+FPGA design

Suhel Dhanani, Altera Corporation

10/14/2010 2:56 PM EDT

Although signal processing is usually associated with digital signal processors, it is becoming increasingly evident that FPGAs are taking over as the platform of choice in the implementation of high-performance, high-precision signal processing.

For many such applications, the choice generally boils down to using a single FPGA, an FPGA with an associated DSP processor, or a farm of DSP processors.

DSP processors can be programmed in C, which makes for a much simpler development flow, but this advantage quickly dissipates when the design has to be partitioned across multiple DSP processors or between a DSP processor and an FPGA. The truth is that a single DSP processor lacks the performance to do the signal processing required by most infrastructure systems.

System designers must then choose between multiple DSP processors and an FPGA. The latter choice almost always results in the lowest system cost and power.

Figure 1 shows some of these infrastructure systems that share one thing in common – the performance requirements exceed the capabilities of a traditional programmable digital signal processor.

These systems also have different performance and precision requirements, as well as different design and development flows.

For example, video processing requires 9- to 12-bit precision, with some high-end designs needing a 12-bit color depth. These designs are generally created in an HDL design flow, with video- and image-processing IP functions increasingly used to speed up development.

On the other side of the spectrum, military radar designs require the highest DSP performance and floating-point precision to get the highest dynamic range. Many of these designs are modeled in MATLAB and Simulink tools, along with floating-point functions that are optimized for the FPGA architecture.
Fig. 1: Different applications need different performance, precision, IP and tools.

When selecting an FPGA, system designers must look at the FPGA silicon architecture as well as the design tools, IP, functional system blocks and reference designs available, to be sure they can complete the implementation of their algorithms quickly and efficiently.

This article explores some typical DSP solutions required to speed up FPGA-DSP design implementation.

Set FPGA precision to match application
Choose an FPGA-DSP architecture that matches the precision requirements of the algorithm. Try to avoid having to tweak the algorithm to fit some arbitrary precision provided by the FPGA vendor.

Traditionally, FPGAs have had fixed-precision DSP architectures, which force the designer either to waste precious silicon resources or to partition the design across multiple blocks, reducing system performance. For example, HD video processing systems typically use a 9x9 multiply operation, for which a fixed-precision 18x25 DSP architecture is overkill: more than half of the DSP block is wasted.
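To put a rough number on that waste, the short Python sketch below compares the bit-product area of the operation with that of the block. The function name `multiplier_utilization` is hypothetical, and treating multiplier area as proportional to the product of the operand widths is a simplifying assumption, not a vendor figure.

```python
def multiplier_utilization(op_a_bits, op_b_bits, blk_a_bits, blk_b_bits):
    """Fraction of a fixed-precision multiplier array that one operation uses.

    Assumes (as a crude proxy) that multiplier area scales with the
    product of the two operand widths.
    """
    if op_a_bits > blk_a_bits or op_b_bits > blk_b_bits:
        raise ValueError("operands do not fit in a single block")
    return (op_a_bits * op_b_bits) / (blk_a_bits * blk_b_bits)


# 9x9 video multiply mapped onto a fixed 18x25 multiplier
used = multiplier_utilization(9, 9, 18, 25)
print(f"9x9 in an 18x25 block: {used:.0%} used, {1 - used:.0%} idle")
```

Under this assumption the 9x9 multiply occupies 81 of 450 bit-products, which is consistent with the statement that more than half of the block goes unused.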

Complex multipliers are another example. They are a common building block for the fast Fourier transform (FFT) functions found across high-performance DSP systems, and in many cases DSP blocks have to be cascaded to support 18x25 or 18x36 complex multiply operations. Make sure that the DSP blocks have a wide enough cascade bus and accumulator to implement the cascade chain using dedicated routing, because generic routing within the FPGA may impose a performance penalty. Also, if the cascade bus or accumulator is not wide enough, you may have to take a precision hit before routing the result to the next DSP block.
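The width concern can be illustrated with a small Python sketch. It uses the textbook four-multiply form of a complex product and plain integers to stand in for signed 18-bit and 25-bit operands; the helper names are hypothetical and nothing here models a specific DSP block.

```python
def complex_multiply(a, b, c, d):
    """Return (real, imag) of (a + jb) * (c + jd) using four real multiplies."""
    real = a * c - b * d
    imag = a * d + b * c
    return real, imag


def signed_bits_needed(value):
    """Smallest two's-complement width that can hold an integer value."""
    if value >= 0:
        return value.bit_length() + 1
    return (-value - 1).bit_length() + 1


# Most negative 18-bit and 25-bit signed values drive the worst-case sum.
min18 = -(1 << 17)
min25 = -(1 << 24)

single_product = signed_bits_needed(min18 * min25)
real, imag = complex_multiply(min18, min18, min25, min25)
width = signed_bits_needed(imag)
print(f"one 18x25 product: {single_product} bits, sum of two: {width} bits")
```

Each 18x25 product fits in 43 bits, but the sum of two products needs 44, so a cascade bus or accumulator narrower than that would have to truncate or round before passing the result on.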

Some designers may be looking to implement floating-point datapaths for a portion of their design. These functions require 24-bit or higher precision to implement mantissa multiplication in single-precision format.
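The 24-bit figure comes from the IEEE 754 single-precision format, which stores 23 fraction bits plus an implicit leading 1. The Python sketch below shows the resulting 24x24 integer multiply; the function names are hypothetical, and the sketch handles only normal numbers and omits rounding and renormalization, which a real floating-point multiplier would need.

```python
import math
import struct


def decompose_float32(x):
    """Split a normal float32 into (sign, biased exponent, 24-bit significand)."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    if exponent in (0, 0xFF):
        raise ValueError("zero, subnormal, inf and NaN are not handled here")
    significand = fraction | (1 << 23)  # restore the implicit leading 1
    return sign, exponent, significand


def multiply_via_significands(x, y):
    """Multiply two float32 values using one 24x24 integer multiply."""
    sx, ex, mx = decompose_float32(x)
    sy, ey, my = decompose_float32(y)
    product = mx * my  # at most 48 bits wide
    # Each value equals significand * 2**(exponent - 127 - 23).
    value = math.ldexp(product, (ex - 150) + (ey - 150))
    return (-value if sx ^ sy else value), product.bit_length()


value, product_bits = multiply_via_significands(1.5, 2.5)
print(value, product_bits)
```

The integer product is up to 48 bits wide, which is why a DSP block limited to 18-bit operands has to split the multiply across several blocks.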

When selecting the FPGA-DSP architecture, make sure that it supports multiple precisions and has a wide enough cascade bus to meet system precision requirements (see Table 1).


Table 1: Precision modes supported by 28-nm DSP block architecture.

One of the most commonly implemented functions within an FPGA is the finite impulse response (FIR) filter. It is critical that the FPGA DSP block architecture efficiently supports high-performance, multi-channel FIR filters. Table 2 shows some of the key features in Altera’s 28-nm DSP architecture that are clearly designed with FIR filters in mind.

Table 2: Features of the DSP architecture for FIR filter design.
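For readers less familiar with the FIR structure, the Python sketch below shows the multiply-accumulate loop that a DSP block implements in hardware, together with a conservative estimate of accumulator width. It is a behavioral model with hypothetical function names, not a description of the features listed in Table 2.

```python
def fir_filter(samples, coeffs):
    """Direct-form FIR: y[n] = sum over k of coeffs[k] * x[n - k].

    Integer arithmetic, zero initial state.
    """
    delay_line = [0] * len(coeffs)
    outputs = []
    for x in samples:
        delay_line = [x] + delay_line[:-1]  # shift the new sample in
        acc = 0
        for c, d in zip(coeffs, delay_line):
            acc += c * d  # one multiply-accumulate per tap
        outputs.append(acc)
    return outputs


def accumulator_bits(data_bits, coeff_bits, taps):
    """Conservative accumulator width: product width plus log2(taps) growth."""
    return data_bits + coeff_bits + (taps - 1).bit_length()


print(fir_filter([1, 0, 0, 0], [3, 2, 1]))  # impulse in, coefficients out
print(accumulator_bits(18, 18, 64))
```

Feeding an impulse returns the coefficients, a quick sanity check for any FIR implementation. The width estimate assumes worst-case coefficients; an actual design can often use fewer guard bits once the coefficient set is known.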

Taking a MATLAB/Simulink design all the way to hardware with just the push of a button has long been a marketing claim. What such claims neglect to point out is that the generated design then requires significant tweaking to meet all of its timing constraints. What is needed is a ‘timing-driven’ Simulink synthesis engine.

Such a tool not only generates raw structural HDL, but is also intelligent enough to add pipeline registers or time-division multiplexing so that the resulting HDL meets the fMAX or latency constraints.

Altera’s DSP Builder Advanced Blockset is designed around that premise. It analyzes the Simulink design description and generates both HDL and a bitstream for the target FPGA device that honor the specified timing constraints (fMAX or latency). It does this automatically by adding pipeline registers and the right amount of time-division multiplexing to meet or exceed the specified timing.
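The arithmetic behind time-division multiplexing can be sketched in a few lines of Python. This illustrates the general resource-sharing trade-off only; it is not how DSP Builder Advanced Blockset computes its schedule, and the function name and example rates are assumptions chosen for illustration.

```python
def folding_plan(taps, channels, sample_rate_hz, clock_hz):
    """Estimate multiplier count when hardware is time-shared across samples.

    Returns (folding factor, multipliers needed). The folding factor is the
    number of clock cycles available per input sample, which is how many
    multiplies a single hardware multiplier can perform per sample.
    """
    fold = clock_hz // sample_rate_hz
    if fold < 1:
        raise ValueError("clock must be at least as fast as the sample rate")
    macs_per_sample = taps * channels
    multipliers = -(-macs_per_sample // fold)  # ceiling division
    return fold, multipliers


# Hypothetical example: 64-tap filter, 4 channels, 30.72 MSPS, 368.64 MHz clock
fold, multipliers = folding_plan(64, 4, 30_720_000, 368_640_000)
print(f"folding factor {fold}, multipliers needed {multipliers}")
```

In this example the filter needs 256 multiplies per sample period, but with 12 clock cycles per sample, 22 shared multipliers suffice. A higher achievable fMAX raises the folding factor and lowers the multiplier count, which is why the tool treats fMAX as an input constraint.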

Large, high-performance military radar designs developed using DSP Builder Advanced Blockset can close timing without manual tweaking of the HDL.

Figure 2 shows an example in which a 50,000 logic element (LE) FPGA-based design closes timing at over 350 MHz. Although the example is the front end of a radar system, the functions it implements (poly-phase FIR filter, FFT and mixer) are commonly used in many high-performance DSP designs.






Dr DSP

10/18/2010 1:25 PM EDT

One point left out of this overview is the importance of feeding data to high-performance DSP algorithms. In a high-performance FPGA-based DSP design, the bottleneck often turns out to be the off-chip memory interface. Sparse matrix operations in particular can be a challenge. How about addressing this point in a future article?




teddy_zhai

10/21/2010 10:35 AM EDT

The figures on the first page are too small and not readable.




patrick.mannion

10/21/2010 12:02 PM EDT

Hi Teddy: Thanks for pointing that out! I've updated the file so that when you click on those images they'll enlarge for you. Best regards,




arun5500

10/28/2010 10:38 PM EDT

Hi, this is Arunraj, a master's graduate in VLSI design. I am doing my academic project on designing a DSP IP. Thanks for this valuable post. Can you update me with some more valuable details on DSP in FPGA design? My mail ID is mbrsai.me@gmail.com. Thank you in advance.




Abrahim

10/26/2010 3:43 AM EDT

One important aspect of a DSP-FPGA combination that was left out of this article, but can be very important, is the use of the FPGA as a 'merging' or 'connecting' pool for data and communication between DSPs/processors.
Often in critical applications it is desirable to avoid the task-scheduling complications, glitches and overheads associated with an RTOS framework. In such a scenario, small processors dedicated to critical tasks, using the FPGA to exchange data and communicate with other processors, help keep the architecture simple and robust. In addition to serving as a data-sharing location, the FPGA can also take care of arbitration and data-aging operations. I also found it very helpful to interface ADCs (even those with complex interfaces) to DSPs through FPGAs, as the FPGA even took care of some pre-processing and filtering needs.
As a fundamental rule of thumb, I normally shift 'static' logic into the FPGA and 'configurable/dynamic' logic into the processors, and on many occasions this has helped me use smaller DSPs with better results.



