# FPGAs boost wideband receivers

Hardware multipliers have enabled FPGAs to invade such DSP territory as software radio, where they are now challenging both ASICs and programmable DSPs. Having first competed only in specialized digital-receiver architectures, the latest FPGAs can now outperform ASICs in meeting the data processing demands of the new wideband communication standards. Still, coaxing these new devices to handle higher sampling rates requires careful allocation and deployment of FPGA resources.

To achieve lower decimation factors, wideband receivers rely on the classical FIR filter implementation. But wideband receivers require substantially more hardware than their narrowband counterparts because they cannot rely on cascaded integrator/comb (CIC) filters for decimation in the first stages, where sampling rates are the highest. The desired filter response can be achieved only by adding enough filter taps for undecimated input samples, and each tap of the FIR filter requires a multiply and an add. Since hardware multipliers consume a significant portion of silicon, they must be deployed judiciously.

Because wideband digital receivers are so multiplier-intensive, the new generation of FPGA devices, featuring dozens of dedicated hardware multipliers, has become an attractive platform for them. This motivated an attempt to design a general-purpose wideband receiver similar to the Graychip GC1012B, a popular ASIC wideband receiver, but with enhanced performance.

**Custom coefficients**

The FPGA design targets three improvements over the GC1012B. First, it should accept data from a new monolithic 12-bit analog-to-digital converter operating at sampling rates up to 200 MHz, twice the maximum input rate supported by the GC1012B. Second, dynamic range performance for the 80 percent filter should increase from 75 dB to 100 dB. Third, it adds the capability of downloading custom FIR filter coefficients to meet some of the new, tougher wideband-frequency templates.

The Xilinx Virtex-II family of FPGAs was chosen for the devices' mix of block memory, system gates and 18 x 18 hardware multipliers. Intellectual-property cores are available for all the basic building blocks. These include a complete direct digital synthesizer (DDS) for the local oscillator and configurable FIR filter designs. The mixer is nothing more than two hardware multipliers.

The first problem was the 200-MHz input clock requirement, since FPGAs in the available speed grades offered a maximum clock of 125 MHz for the multipliers and the DDS section. The solution was to divide the DDS and mixer into two identical sections, each running at 100 MHz. The output of the A/D converter is then demultiplexed into two streams to match this rate.

Each DDS must deliver output sine and cosine samples at 100 MHz, advancing by the same phase step each clock cycle. However, the output phase of one DDS must be offset by one-half of this phase step to match the alternating sample sequence from the A/D converter. To do this, an extra adder stage is required ahead of the sine/cosine lookup table for one DDS engine. The net result is that together, the two DDS engines generate alternate samples of an idealized 200-MHz DDS local oscillator. This arrangement preserves phase-continuous frequency switching for complex FSK or sweep sequences.
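The half-step offset can be checked with a few lines of arithmetic on phase accumulators. The sketch below is illustrative only: the 32-bit accumulator width and the function names are assumptions, not details of the actual design. It expresses the offset as the full-rate phase step, which is exactly half of the step each 100-MHz engine uses.

```python
ACC_BITS = 32                      # assumed accumulator width (not given in the article)
MASK = (1 << ACC_BITS) - 1


def reference_phases(step_200, n):
    """Phase words of an idealized single DDS clocked at the full 200-MHz rate."""
    return [(k * step_200) & MASK for k in range(n)]


def dual_dds_phases(step_200, n):
    """Interleave the phase words of two half-rate DDS engines.

    Both engines advance by the same step each 100-MHz clock, so one
    accumulator variable models both. Engine B adds a fixed offset of half
    that step in an extra adder ahead of its lookup table.
    """
    step_100 = (2 * step_200) & MASK   # phase step per 100-MHz clock
    offset = step_200                  # one-half of step_100
    acc = 0
    phases = []
    for _ in range((n + 1) // 2):
        phases.append(acc)                     # engine A: even-numbered A/D samples
        phases.append((acc + offset) & MASK)   # engine B: odd-numbered A/D samples
        acc = (acc + step_100) & MASK
    return phases[:n]
```

Because the offset is applied after the accumulator, a change of tuning word moves both engines together, which is consistent with the phase-continuous switching described above.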

The FIR filter is also divided into two complex FIR filters, one for each mixer output. Each filter section receives half of the coefficients and calculates the taps assigned to the alternate sample stream it receives. The two filter outputs are added in an output-combining stage to produce the final complex output.
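The split described above is a two-branch polyphase decomposition. As a minimal sketch, assuming real integer samples and an even-length input (the complex case applies the same arithmetic to the I and Q rails), the following compares a direct decimate-by-two FIR with two half-rate filters whose outputs are summed. All function names are hypothetical.

```python
def fir(x, h):
    """Plain FIR convolution, truncated to len(x) outputs, zero initial state."""
    return [sum(h[j] * x[n - j] for j in range(len(h)) if 0 <= n - j < len(x))
            for n in range(len(x))]


def direct_decimate_by_2(x, h):
    """Reference: filter at the full input rate, keep every second output."""
    def sample(i):
        return x[i] if 0 <= i < len(x) else 0
    return [sum(h[k] * sample(2 * m - k) for k in range(len(h)))
            for m in range(len(x) // 2)]


def split_decimate_by_2(x, h):
    """Two half-rate filters, each with half the coefficients, then a combiner."""
    x_even, x_odd = x[0::2], x[1::2]       # demultiplexed sample streams
    h_even, h_odd = h[0::2], h[1::2]       # coefficients assigned to each stream
    y_even = fir(x_even, h_even)
    # The odd stream is aligned by delaying it one half-rate sample.
    y_odd = fir([0] + x_odd[:-1], h_odd)
    return [a + b for a, b in zip(y_even, y_odd)]
```

Each branch runs at half the input rate, which is what lets the 100-MHz sections keep up with a 200-MHz A/D converter.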

The 100-dB out-of-band rejection specification of the filter requires 56 taps for a decimate-by-two design. If all of these were implemented with dedicated multipliers, 112 multipliers would be required to handle the complex signals. But two strategies bring this number down to a more reasonable count.

**Reducing multipliers**

By taking advantage of symmetrical filter coefficients, two input samples can be added before the multiplication, saving a factor of two. Also, since the output rate is half the input rate and since the multiplier operates at the input clock rate, one multiplier can be shared to calculate two taps. This further reduces the total number of dedicated hardware multipliers, to 28.
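The counting in this paragraph, and the pre-add that symmetry makes possible, can be written out as a short sketch. The function names and parameters are illustrative assumptions; only the tap count, the complex (two-rail) signal and the resulting totals come from the text.

```python
def multiplier_count(taps=56, rails=2, symmetric=True, taps_per_multiplier=2):
    """Dedicated multipliers needed for the decimate-by-two complex filter."""
    products = taps * rails            # one multiply per tap on each of I and Q
    if symmetric:
        products //= 2                 # pre-add the two samples sharing a coefficient
    return products // taps_per_multiplier


def folded_output(window, half_coeffs):
    """One output of an even-length symmetric FIR using pre-adders.

    `window` holds the samples under the filter; `half_coeffs` holds only the
    first half of the coefficients, since the second half mirrors it.
    """
    n = len(window)
    return sum(c * (window[i] + window[n - 1 - i])
               for i, c in enumerate(half_coeffs))
```

With the defaults, `multiplier_count()` gives 28, and turning off both savings (`symmetric=False, taps_per_multiplier=1`) gives the 112 quoted earlier.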

To handle the other required decimation factors of 4, 8, 16, 32 and 64 with equivalent filter performance, the number of filter taps approximately doubles with each step. Since the output rate is also reduced by decimation, the extra time between output samples allows the multipliers to be time-shared to compute these additional taps. In this way, the 28 hardware multipliers can handle all the decimation factors.
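A rough budget shows why the multiplier count stays fixed. The sketch below assumes the tap count doubles exactly with each step (the text says approximately) and that each multiplier performs one product per input-rate clock; both the assumptions and the function names are illustrative.

```python
import math

BASE_TAPS = 56        # taps for the decimate-by-two filter, from the text
DECIMATIONS = (2, 4, 8, 16, 32, 64)


def taps_for(decimation):
    """Filter length, assuming exact doubling per decimation step."""
    return BASE_TAPS * decimation // 2


def multipliers_needed(decimation, rails=2):
    """Multipliers required when products are time-shared between outputs."""
    products_per_output = taps_for(decimation) // 2    # symmetric pre-add
    cycles_per_output = decimation                     # input clocks per output sample
    per_rail = math.ceil(products_per_output / cycles_per_output)
    return per_rail * rails


budget = {d: (taps_for(d), multipliers_needed(d)) for d in DECIMATIONS}
```

Under these assumptions every entry in `budget` reports 28 multipliers, because the tap count and the number of clock cycles available per output grow at the same rate.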

One conventional approach for implementing the delay line for the FIR is to use registers within the logic slices. For the decimate-by-64 mode, however, there are 1,792 filter taps, and a register-based delay line of that length would use the slices extremely inefficiently. Instead, the delay line is constructed from block RAM plus suitable addressing engines. As input samples enter the RAM, they are stored in a circular block with the newest sample replacing the oldest one. The size of the block is adjusted to the number of taps for each decimation factor. Since this RAM is dual-ported, an output-addressing engine can efficiently pick the pairs of samples required to take advantage of the symmetrical filter coefficients.
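The addressing scheme can be modeled in software as a circular buffer with one write pointer and a paired read pattern. This is a behavioral sketch under assumed names, not a description of the actual addressing engines: a Python list stands in for the block RAM, and the two values in each yielded pair stand in for the two read ports.

```python
class CircularDelayLine:
    """Behavioral model of a RAM-based FIR delay line."""

    def __init__(self, num_taps):
        self.num_taps = num_taps        # block size, set per decimation factor
        self.ram = [0] * num_taps
        self.write_addr = 0             # always points at the oldest sample

    def push(self, sample):
        """Store a new sample over the oldest one and advance the pointer."""
        self.ram[self.write_addr] = sample
        self.write_addr = (self.write_addr + 1) % self.num_taps

    def symmetric_pairs(self):
        """Yield the sample pairs that share a coefficient.

        Pair i combines the sample of age i with the sample of age
        num_taps - 1 - i, one read per port.
        """
        n = self.num_taps
        newest = (self.write_addr - 1) % n
        oldest = self.write_addr
        for i in range(n // 2):
            yield self.ram[(newest - i) % n], self.ram[(oldest + i) % n]

    def output(self, half_coeffs):
        """One filter output: pre-add each pair, then multiply-accumulate."""
        return sum(c * (a + b)
                   for c, (a, b) in zip(half_coeffs, self.symmetric_pairs()))
```

For example, after pushing the values 1 through 10 into an 8-tap line, the pairs come out as (10, 3), (9, 4), (8, 5) and (7, 6): the newest sample is matched with the oldest, and the two read addresses walk toward each other.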

**Expanding options**

Since all math is performed with fixed-point engines, great care must be taken in scaling, rounding and defining word lengths.

Although the device is designed to work with a 12-bit A/D converter, provisions are made for 16-bit input samples to support other sources that can take full advantage of the dynamic range of the receiver. The mixer multipliers also accept 18-bit sine/cosine samples from the DDS and the outputs are rounded to 17 bits using a bias-free algorithm. When two of these 17-bit samples from the delay RAM are added, the 18-bit result matches the input of the tap multiplier. The filter accumulators are 42 bits wide to avoid overflow for intermediate results even though the final sum of products requires far fewer bits.
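The word-length bookkeeping can be sanity-checked in a few lines. The article does not say which bias-free rounding algorithm is used; the sketch below assumes round-half-to-even (convergent rounding), a common choice, and the helper names are hypothetical.

```python
def round_half_even(value, drop_bits):
    """Drop `drop_bits` LSBs of a signed integer, rounding ties to even."""
    if drop_bits <= 0:
        return value
    half = 1 << (drop_bits - 1)
    quotient = value >> drop_bits                  # floor division by 2**drop_bits
    remainder = value & ((1 << drop_bits) - 1)     # always non-negative
    if remainder > half or (remainder == half and quotient & 1):
        quotient += 1
    return quotient


def fits_signed(value, bits):
    """True if `value` is representable as a two's-complement `bits`-bit word."""
    return -(1 << (bits - 1)) <= value < (1 << (bits - 1))


# Worst-case growth through the filter front end, using widths from the text.
MIN_17, MAX_17 = -(1 << 16), (1 << 16) - 1
pair_sum_ok = fits_signed(MIN_17 + MIN_17, 18) and fits_signed(MAX_17 + MAX_17, 18)
product_ok = fits_signed((-(1 << 17)) ** 2, 36)    # largest 18 x 18 product
guard_bits = 42 - 36                               # accumulator headroom above one product
```

Ties round up or down depending on the neighboring even value, so the rounding error averages to zero over many samples. The 42-bit accumulator leaves 6 bits of headroom above a single 36-bit product.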

This FPGA wideband receiver fits in the Xilinx XC2V3000 device, utilizing approximately 39 percent of the logic-slice flip-flops, 38 percent of the LUT memory, 77 percent of the block RAM and 67 percent of the multipliers. For heavily dedicated applications with only one decimation factor and fixed filter coefficients, many of the programmable features of this general-purpose design can be eliminated to free resources for extra functions or additional channels.

In general, FPGA-based digital receivers offer unprecedented flexibility in filter characteristics, dynamic range, sampling rates and frequency-switching features to support the demands of new wideband communications standards emerging now and in the future.