
Design Article

Using FPGAs to solve tough DSP design challenges

Reg Zatrepalek, Hardent Inc.

7/23/2012 8:37 AM EDT

The implementation illustrated in Figure 3 is known as a multiply-and-accumulate, or MAC-type, implementation. This is almost certainly the way a filter would be implemented in a classical DSP processor. For a 31-tap FIR filter implemented in this fashion in a typical DSP processor with a core clock rate of 1.2 GHz, the maximum performance is about 9.68 MHz, that is, a maximum incoming data rate of 9.68 Msamples/s.


Figure 3 – MAC implementation in a classical DSP

An FPGA, on the other hand, offers many different implementation and optimization options. If a very resource-efficient implementation is desired, the MAC engine technique may prove ideal. Using a 31-tap filter as an example illustrates the impact of filter specifications on required logic resources. A block diagram of the implementation is shown in Figure 4.


Figure 4 – MAC engine FIR filter in an FPGA

Memory is required for data and coefficient storage. This may be a mixture of RAM and ROM internal to the FPGA. RAM is used for the data samples and is implemented using a cyclic RAM buffer. The number of words is equal to the number of filter taps and the bit width is set by sample size. ROM is required for the coefficients. In the worst case, the number of words will be the same as the number of filter taps, but if symmetry exists, this may be reduced. The bit width must be large enough to support the largest coefficient. A full multiplier is required since both the data sample and coefficient data change on every cycle. The accumulator adds the results as they are produced. The capture register is needed because the accumulator output changes on every clock cycle as the filter is sampling data. Once a full set of N samples has been accumulated, the output register captures the final result.
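The data flow just described can be sketched in software. The following is a minimal behavioral model in Python, not the author's implementation: the function name `mac_fir` and its variable names are illustrative assumptions, and each pass through the inner loop stands in for one clock cycle of the MAC engine.

```python
def mac_fir(samples, coeffs):
    """Behavioral model of a sequential MAC-engine FIR filter.

    samples: input sample stream; coeffs: filter taps h[0..N-1].
    Returns one output per input sample, y[n] = sum_k h[k] * x[n-k].
    """
    n_taps = len(coeffs)
    ram = [0] * n_taps            # cyclic sample buffer (data RAM)
    write_ptr = 0                 # address where the newest sample is written
    outputs = []
    for x in samples:
        ram[write_ptr] = x        # newest sample overwrites the oldest one
        acc = 0                   # accumulator is cleared for each output
        for k in range(n_taps):   # one multiply-accumulate per clock cycle
            acc += coeffs[k] * ram[(write_ptr - k) % n_taps]
        outputs.append(acc)       # capture register latches the finished sum
        write_ptr = (write_ptr + 1) % n_taps
    return outputs


# Impulse response check: the output reproduces the coefficients.
print(mac_fir([1, 0, 0, 0], [1, 2, 3]))   # [1, 2, 3, 0]
```

The inner loop makes the cost visible: each output sample needs N multiply-accumulate cycles, which is why the sample rate of this structure falls as the number of taps grows.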

When used in MAC mode, the DSP48 is a perfect fit. The input registers, output registers and adder unit are present in the DSP48 slice. The resources required for this 31-tap MAC engine implementation are one DSP48, one 18-kbit block RAM and nine logic slices. There are a few additional slices required for sample and coefficient address generation and control. If a 600-MHz clock were available in the FPGA, this filter could run at an input sample rate of 19.35 MHz, or 19.35 Msamples/s in a -3 speed grade Xilinx® 7 series device.
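The 19.35-Msamples/s figure follows from dividing the clock rate by the number of taps, since a single MAC unit spends one clock per tap on every output. A small sketch of that arithmetic, in which the function name and the `macs_per_clock` parameter are illustrative assumptions:

```python
def mac_sample_rate_hz(clock_hz, n_taps, macs_per_clock=1):
    """Maximum input sample rate of a sequential MAC-engine FIR filter.

    Assumes every output needs n_taps multiply-accumulates and that the
    engine completes macs_per_clock of them on each clock cycle.
    """
    return clock_hz * macs_per_clock / n_taps


fpga_rate = mac_sample_rate_hz(600e6, 31)
print(f"{fpga_rate / 1e6:.2f} Msamples/s")   # 19.35 Msamples/s
```

By the same arithmetic, the 9.68 Msamples/s quoted earlier for the 1.2-GHz DSP processor works out to roughly 124 core clocks per output, or about four clocks per tap. The article does not break down where those cycles go.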

If the system specification required a higher-performance FIR filter, a parallel structure could be implemented. Figure 5 shows a block diagram of a Direct Form I implementation.


Figure 5 – Direct Form I FIR filter in an FPGA

The Direct Form I filter structure provides the highest-performance implementation within an FPGA. This structure, which is also commonly referred to as a systolic FIR filter, uses pipelining and adder chains to extract maximum performance from the DSP48 slice. The input is fed into a cascade of registers that acts as the data sample buffer. Each register delivers a sample to a DSP48 slice, where it is multiplied by the respective coefficient. The adder chain holds the partial sums, which are successively combined from slice to slice to form the final result.

No external logic is required to support the filter and the structure is extendable to support any number of coefficients. This is the structure that can achieve maximum performance, because there is no high-fanout input signal. The resources required to implement a 31-tap FIR filter are only 31 DSP48 slices. If a 600-MHz clock were available in the FPGA, this filter could perform at an input sample rate of 600 MHz, or 600 Msamples/s, in a -3 speed grade 7 series device.
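To make the register cascade and adder chain concrete, here is a simplified cycle-by-cycle model in Python. It is a sketch under stated assumptions, not a description of the DSP48 internals: it uses two sample registers per tap and one registered partial sum per tap, whereas a real slice has additional pipeline registers, and the name `systolic_fir` is illustrative.

```python
def systolic_fir(samples, coeffs):
    """Simplified cycle-accurate model of a systolic FIR filter.

    Each loop iteration is one clock. Every tap reads only from a
    neighbouring register, so no signal fans out to all taps.
    The output equals the ideal FIR response delayed by N-1 clocks.
    """
    n_taps = len(coeffs)
    data_regs = [0] * (2 * n_taps)   # sample cascade, two registers per tap
    sum_regs = [0] * n_taps          # one registered partial sum per tap
    outputs = []
    for x in samples:
        # Tap 0 sees the input directly; tap k sees it delayed by 2k clocks.
        tap_in = [x] + [data_regs[2 * k - 1] for k in range(1, n_taps)]
        # Each tap adds its product to the previous tap's registered sum.
        next_sums = [
            (sum_regs[k - 1] if k > 0 else 0) + coeffs[k] * tap_in[k]
            for k in range(n_taps)
        ]
        # Clock edge: all registers update together.
        data_regs = [x] + data_regs[:-1]
        sum_regs = next_sums
        outputs.append(sum_regs[-1])
    return outputs


# Impulse response: the coefficients appear after a latency of N-1 = 2 clocks.
print(systolic_fir([1, 0, 0, 0, 0, 0], [1, 2, 3]))   # [0, 0, 1, 2, 3, 0]
```

Because all N multiplications happen on every clock, one output emerges per clock once the pipeline has filled, which is where the 600-Msamples/s figure at a 600-MHz clock comes from. The price is latency and one multiplier per tap.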

From this example, you can clearly see that the FPGA not only significantly outperforms a classic digital signal processor, but it does so with much lower clock rates (and therefore lower power consumption).

This example illustrates only a couple of implementation techniques for FIR filters in an FPGA. The device may be further tailored to take advantage of data sample rate specifications that fall between the extremes of sequential MAC operation and fully parallel operation. You may also consider additional trade-offs between performance and resource utilization involving symmetric coefficients, interpolation, decimation, multiple channels or multirate operation. The Xilinx CORE Generator™ or System Generator utilities will help you exploit all of these design variables and techniques.
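As one illustration of these trade-offs, symmetric coefficients allow the two samples that share a coefficient to be added before the multiply, roughly halving the multiplier count. The sketch below assumes a symmetric (linear-phase) coefficient set; the function name and argument layout are illustrative, and the article gives no resource figures for this case.

```python
def symmetric_fir_output(window, half_coeffs):
    """One output of a symmetric FIR filter using pre-addition.

    window: the N most recent samples.
    half_coeffs: the first ceil(N/2) coefficients; the remaining ones
    are assumed to mirror them, h[k] == h[N-1-k].
    """
    n = len(window)
    acc = 0
    for k in range(n // 2):
        # Pre-add the two samples that share h[k], then multiply once.
        acc += half_coeffs[k] * (window[k] + window[n - 1 - k])
    if n % 2:
        # An odd-length filter has a centre tap with no partner.
        acc += half_coeffs[n // 2] * window[n // 2]
    return acc


# Full coefficient set [1, 2, 3, 2, 1] needs only three multiplies.
print(symmetric_fir_output([5, 4, 3, 2, 1], [1, 2, 3]))   # 27
```

For the 31-tap example this folding would require 16 multiplies per output instead of 31: 15 coefficient pairs plus the centre tap.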




anne-francoise.pele

7/23/2012 10:30 AM EDT

Another piece, from the first-quarter 2012 edition of the Xcell Journal, is "Embedded Vision: FPGAs’ Next Notable Technology Opportunity".

To access the article, click here: http://www.eetimes.com/design/military-aerospace-design/4376567/Embedded-vision--FPGAs--next-technology-opportunity




Dr DSP

7/23/2012 1:48 PM EDT

There are some very useful concepts covered here and the summary is generally spot on; however, it is important to consider the DSP function in the system context.

If other functions are required in addition to the DSP it may push the solution of choice into an FPGA. For example, a low performance DSP function that is part of a sensor or motor control system need not be implemented in a stand alone DSP device. An FPGA might be the right solution in this case.




ReneCardenas

7/26/2012 11:44 AM EDT

Dr. DSP,
Couldn't the same be said in the other direction? If other considerations are better suited to a DSP processor, the balance can just as well tilt the other way.
In my opinion, it has to be a case-by-case decision by the designer, given a set of resources and time constraints.

Just another point of view




Greg.Dee

7/26/2012 5:02 PM EDT

No offence, but "For example, a low performance DSP function that is part of a sensor or motor control system need not be implemented in a stand alone DSP device. An FPGA might be the right solution in this case." doesn't make sense to me. If it's low performance, you go down the performance chain, not up it. So one would consider a general-purpose microcontroller, with its obvious advantages.




glen.herrmannsfeldt

7/26/2012 3:10 AM EDT

The FIR equation is wrong.




ReneCardenas

7/26/2012 11:32 AM EDT

Glen,

It is too easy to criticize and rush to judgment when it is just as easy to offer the correction for such a typo, which appears in many publications that are transcribed by non-technical people.
Simply stating the transgression, in this case that the indices are transposed between the constant term and the discrete variable term, would have accomplished more and been more informative to others who may not have seen this simple transgression. The article has good merits otherwise, in my opinion.




nicolas.mokhoff

7/26/2012 11:51 AM EDT

ReneCardenas: your comment on striving toward positive criticism is welcome.




ReneCardenas

7/27/2012 4:29 PM EDT

Thanks Nic, that is my motto: be wise enough to know that in no way can we master the universe alone, but each of us should attempt to make the universe a much friendlier place for everyone, especially newcomers to engineering.

There are lots of complexities and tough problems in the world, and nothing is gained by being destructive.




Medina

7/26/2012 11:42 AM EDT

Glen, would you care to point out the anomaly in the equation? It wasn't very evident when I looked at it.

Thanks




EricC

7/26/2012 8:52 AM EDT

Further information on how Xilinx System Generator and MathWorks HDL Coder enable Model-Based Design for targeting Xilinx FPGAs is available at http://www.mathworks.com/xilinx.




EricC

7/26/2012 8:56 AM EDT

Further information on how Xilinx System Generator and other HDL code generation tools may be used with MATLAB and Simulink, including examples, demos, and videos, is available from http://www.mathworks.com/fpga.




EricC

7/26/2012 10:09 AM EDT

Corrected link is http://www.mathorks.com/fpga




Krutsch

7/26/2012 4:05 PM EDT

If only one had to do FIR filters only... The fact is that the software stack is more complicated, and it is really a pain to do it on an FPGA. For radar and some high-end medical applications it might be a good choice. For many, many applications it is a pain; try to get a solution certified for some automotive and avionics applications and you will see.




Alxx123

7/27/2012 12:48 AM EDT

That's all well and good, but there is no mention or comparison of the power used in a DSP vs. an FPGA.




agk

7/29/2012 6:56 AM EDT

With FPGAs we can create massively parallel processing so that DSP algorithms can bring useful results.




kinnar

7/29/2012 2:37 PM EDT

Actually, what we are trying to implement using the FPGA is already there in the DSP processor. What matters is the portability and the size reduction of the final product gained by implementing some DSP functionality in the FPGA; this way one will be able to reduce the use of DSPs in many designs. The real disadvantage of this method is that it is totally hardware dependent.



