Design Article
FPGA-Based FIR Filters using Distributed Arithmetic
M. Martinez-Peiro et al
2/4/2002 12:00 AM EST
![]() |
|
ABOUT THE AUTHORS
M.
Martinez-Peir, R. Colom, F. Ballester, and R.
Gadea are teachers with a focus on digital applications
at the Technical University of Valencia, Spain. They
are currently working on video compression, Custom DSP,
HDL simulators, digital filtering, and image lifting.
|
||


DA trades memory for combinatory elements,
resulting in ideal-to-implement custom DSPs (CDSPs) in
LUT-based FPGAs.
In addition to a DA implementation, the
designer also can select from a bit-serial to a full-parallel
implementation to trade bandwidth for resource utilization.
Cascade and lattice structures present several interesting
properties such as low quantification error and high-stability
in the filter coefficients. Moreover, you can expand lattice
cells without a full redesign.
The goal of this article is to implement
FPGA-based direct-form, cascade, and lattice high-order FIR
filters using bit-serial DA. We start by comparing the
resultant topologies in both area and speed. The designs use an
HDL to include pipeline techniques and scalable parameters. We
also describe DA error models of the three structures. The next
section reviews DA fundamentals and proposed architectures for
each kind of filter. This article also presents the results of
the FPGA implementation of the structures and discusses an
error model for the filter structures.
(1)
(2)
You can pre-calculate the terms in brackets in Equation 2, save the results in memory, and address these terms by xt,n in Table 1. Considering that each xtn can only take two values (0 or 1), each product term reaches one of the 2(N-1) possible values.
|
|
|
||||
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Table 1:Distributed Arithmetic pre-calculated terms
DA Direct Form Implementation
A DA bit-serial implementation of a FIR filter addresses each
product term once per bit (the MSB bit is the sign bit). After
obtaining the last product-term, this term is added, with its
appropriate shift, to the rest of the product term previously
added.
Figure 1: Bit-serial DA direct-form FIR filter
Figure 1 shows the structure representing the
direct-form FIR filter. Considering that the FPGA this paper
discusses has four-inputs LUTs, the product-terms larger than
four need to be divided into r parts such that 4<T/r,
where T is the number of taps of the filter. In other words,
the adders in the tree structure add the r LUT outputs.
Eventually, you need a shift accumulator to add and shift each
product term. Figure 1 represents a bit-serial
implementation of a filter with samples of 8 bits. The output
of the filter occurs each eight clock cycles. When the sign-bit
arrives, a subtraction instead of an addition in the
shift-accumulator is done. By using carry save adders before
the LUT, you can implement a symmetrical filter"detailed
information of this operation is found in Croisier and
Proakis.


Additionally, you can extend the range of processing speed by pipelining the structure. Equation 3 expresses the operation frequency (fs), where L is the latency and n the number of the bits of each input sample.
(3)
Despite the increment of registers in the DA pipeline version, the final area resources increase slightly, due to the FPGA structure.
DA Cascade-Filter Implementation
You can factor a linear-phase FIR filter into several 4th order
sections to obtain an area reduction. The structure of these
sections can be DA adapted with the symmetry equation
(Equation 4) that represents the kth section of a
T-order filter"Equation 5 shows the expansion in DA
product terms of Equation 4. Equation 5
represents the basic cell of a cascade structure you can design
using a bit-serial approach (Figure 2).
(4)
(5)
Figure 2: Bit-serial DA cascade 4th-order cell
As a result of the latency generated by the pipeline, the cascade cell has extra registers (one per stage of pipeline) to synchronize the operation of the filter. Equation 6 represents the real-time frequency operation of the cascade structure.
(6)
DA Lattice-Filter Implementation
You use the recursive equations that describe the lattice cell
structures (Equation 7) to obtain cascade
implementations of M cells.
Both the f and g terms represent the
forward and the backward predictions respectively in a linear-predictive
filter structure.
(7)
Using the DA equations of Equation 8, we can reproduce the f and g terms with two LUTs, where g' represents the g(n-1) term.
(8)
Figure 3: Bit-serial DA lattice cell
Figure 4: Bit-serial improved lattice cell
Figure 3 represents the formal DA implementation of Equation 8. However, in this article we also propose an improved structured in Figure 4 that reduces the memory requirements of the DA lattice cell. With the structure of Figure 4, you only need to save the coefficient (km). You can get both the km+1 and 1 values from the carry input of the scaling accumulator.
Figure 5: Operational frequency characteristics of DA bit-serial filter implementations
Figure 6: Area characteristics of DA bit-serial filter implementations
The bit-serial implementation of the lattice structure achieves a real-time operation of 7.5 MHz. In the cascade and direct-form structure, filter operational frequency continuously decreases to 4 MHz as filter order increases.
In DA bit-serial cascade structures (Figure 2), the error is modeled by Equation 9, where em and ea (shaded red in Figure 2) are the LUT and shift-accumulator rounding errors.
(9)
The variance of this error is expressed in Equation 10, where pm and pa are the number of bits in both memory and the shift-accumulator.
(10)
Equation 11 represents a direct-form DA error model, where r is the number of data-sample partitions or memory partitions.
(11)
Furthermore, as a result of the four-input LUT structures, the partition of the memories in the FPGA case is limited by r<T/4 (T is the order of the filter).
Equation 12 shows the variance of the error in the direct-form structure.
(12)
Figure 4 shows the improved lattice cell with both error sources em and ea shaded in red. Equation 13 shows the error and the variance in this cell:
(13)
As an example, we used a T-order FIR filter with p=pm=pa=8 bits to compare the three models. The results in the direct-form, cascade and lattice implementations are T4.2384e-07+1.6953e-06, T3.3907e-06, and T2.5868e-11, respectively. The lattice filter has the lowest error, while the cascade form has the highest error. Finally, the direct-form structure also has a high error compared with the lattice-cell structure.
- We can implement a bit-serial 40th-order lattice filter
in a 10K50 device with a real-time frequency operation of 7.5
MHz. The pipelined cascade and direct-form bit-serial
implementations reach 4.5 MHz for a 60th-order structure in
their symmetrical implementations.
- We have been presented a new improved lattice cell
reduces memory usage by using the input carry in the
shift-accumulator.
- We presented a DA error model that shows that the lattice
structure represents the lowest rounding error while cascade
structure has the highest error.




