Design Article
Using FPGAs to improve your wireless subsystem's performance
Dave Nicklin and Tom Hill
8/25/2008 9:05 AM EDT
Common examples of operations found in wireless applications include FIR filtering, Fast Fourier Transforms (FFTs), digital down- and up-conversion, and Forward Error Correction (FEC) blocks.
By offloading operations that require high-speed parallel processing onto the FPGA and leaving operations that require high-speed serial processing on the processor, overall system performance and cost can be optimized while lowering system requirements.
Partitioning
The FPGA can be used with a digital signal processor (DSP), serving either as an independent pre-processor (or sometimes post-processor) device, or as a co-processor. In a pre-processing architecture, the FPGA sits directly in the data path and is responsible for processing the signals to the point where they can be efficiently and cost-effectively handed off to a DSP for further lower-rate processing.
Figure 1: In co-processing architectures, the FPGA sits alongside the DSP, which offloads specific algorithmic functions to the FPGA to be processed at significantly higher speeds than what is possible in a DSP processor alone.
In co-processing architectures, the FPGA sits alongside the DSP, which offloads specific algorithmic functions to the FPGA to be processed at significantly higher speeds than is possible in a DSP alone. The results are passed back to the DSP or sent to other devices for further processing, transmission or storage (Figure 1 above).
Timing margins
The choice of pre-processing, post-processing or co-processing is often governed by the timing margins needed to move data between the processor and FPGA and how that impinges on the overall latency.
Although a co-processing solution is the topology most often considered by designers, primarily because the DSP is in more direct control of the data hand-off process, it may not always be the best overall strategy.
Figure 2: Shown is an LTE example of co-processing data-transfer latency issues.
Consider, for example, the latest specifications for 3GPP Long Term Evolution, in which the transmission time interval has been reduced to 1ms, down from 2ms for HSDPA and 10ms for W-CDMA. This essentially requires that data be processed from the receiver and through to the output of the media access control (MAC) layer in less than 1,000 microseconds.
Figure 2 above shows that using a serial RapidIO port on the DSP running at 3.125Gbit/s, with 8b/10b encoding and a 200-bit overhead for the Turbo decode function, results in a DSP-to-FPGA transfer delay of 230µs. Taking into account other expected delays, the Turbo codec performance required to meet these system timings is a very demanding 75.8Mbit/s for 50 users.
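The arithmetic behind these figures can be sketched as follows. This is a minimal illustration, not the authors' calculation: the function names and the `other_delays_us` parameter are hypothetical, and the article does not itemize the "other expected delays", so they default to zero here. The only facts used are the 3.125Gbit/s line rate, the 8b/10b encoding (8 payload bits carried in every 10 line bits), the 1ms transmission time interval and the 230µs transfer delay.

```python
# Figures quoted in the article.
LINE_RATE_GBPS = 3.125      # serial RapidIO line rate
TTI_US = 1000.0             # LTE transmission time interval (1 ms)
TRANSFER_DELAY_US = 230.0   # DSP-to-FPGA transfer delay from Figure 2


def payload_rate_gbps(line_rate_gbps):
    """Usable data rate of an 8b/10b-encoded link.

    8b/10b encoding sends 10 line bits for every 8 payload bits,
    so only 80% of the line rate carries data.
    """
    return line_rate_gbps * 8 / 10


def remaining_budget_us(tti_us, transfer_us, other_delays_us=0.0):
    """Time left in one TTI after the link transfer.

    other_delays_us is a hypothetical placeholder: the article mentions
    other expected delays but does not list their values.
    """
    return tti_us - transfer_us - other_delays_us


if __name__ == "__main__":
    print("payload rate (Gbit/s):", payload_rate_gbps(LINE_RATE_GBPS))
    print("budget left (us):", remaining_budget_us(TTI_US, TRANSFER_DELAY_US))
```

Even before any other delay is counted, the transfer alone uses close to a quarter of the 1ms interval, which is why the decoder has to run so much faster in the co-processing case.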
Using an FPGA to process the Turbo codecs as a largely independent post-processor not only removes DSP latency but saves time because there's no need to transfer the data at a high bandwidth between the DSP and FPGA.
This reduces the required throughput of the Turbo decoder to 47Mbit/s, a decrease that allows the use of more cost-effective devices and reduces system power dissipation.
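To put the two throughput figures side by side, the short sketch below compares them. It assumes, as a simplification not stated in the article, that the same number of bits must be decoded per transmission time interval in both architectures, so the required throughput is inversely proportional to the time available for decoding. The function names are hypothetical.

```python
# Required Turbo decoder throughput quoted in the article (Mbit/s).
CO_PROCESSOR_MBPS = 75.8    # FPGA as co-processor, data shuttled via the DSP
POST_PROCESSOR_MBPS = 47.0  # FPGA as largely independent post-processor


def throughput_reduction_pct(before_mbps, after_mbps):
    """Percentage drop in required decoder throughput."""
    return (1.0 - after_mbps / before_mbps) * 100.0


def decode_window_ratio(before_mbps, after_mbps):
    """How much longer the decode window becomes.

    Assumes the same payload per interval in both cases, so
    throughput = bits / window and the window ratio is the
    inverse of the throughput ratio.
    """
    return before_mbps / after_mbps


if __name__ == "__main__":
    drop = throughput_reduction_pct(CO_PROCESSOR_MBPS, POST_PROCESSOR_MBPS)
    ratio = decode_window_ratio(CO_PROCESSOR_MBPS, POST_PROCESSOR_MBPS)
    print(f"required throughput drops by {drop:.1f}%")
    print(f"decode window is {ratio:.2f}x longer")
```

Under that assumption, the post-processing arrangement relaxes the decoder requirement by a little over a third, which is the margin that opens the door to slower, cheaper and lower-power devices.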
Another consideration is whether to use soft or hard embedded processor intellectual property (IP) on the FPGA to offload some of the system processing tasks, which in turn offers the possibility of additional cost, power and footprint reductions.





