How to write an optimized FIR filter
Robert Oshana, Texas Instruments
4/23/2007 3:00 AM EDT
Today's DSP architectures are designed specifically to maximize the throughput of DSP algorithms, such as digital filters. Some of the features of a DSP include:
- On-chip memory – Internal memory allows the DSP fast access to algorithm data such as input values, coefficients and intermediate values.
- Special MAC instruction – For performing a multiply and accumulate, the crux of a digital filter, in one cycle.
- Separate program and data buses – Allows the DSP to fetch code without affecting the performance of the calculations.
- Multiple read buses – For fetching all the data to feed the MAC instruction in one cycle.
- Separate write buses – For writing the results of the MAC instruction.
- Parallel architecture – DSPs have multiple instruction units so that more than one instruction can be executed per cycle.
- Pipelined architecture – DSPs execute instructions in stages so more than one instruction can be executed at a time. For example, while one instruction is doing a multiply another instruction can be fetching data with other resources on the DSP chip.
- Circular buffers – To make pointer addressing easier when cycling through coefficients and maintaining past inputs.
- Zero overhead looping – Special hardware to take care of counters and branching in loops.
- Bit-reversed addressing – For calculating FFTs.
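To make the MAC and circular-buffer features above concrete, here is a minimal sketch in portable C of one FIR output computation. It is not code for any particular DSP: the names `fir_state`, `fir_step` and `NUM_TAPS` are hypothetical, the data are assumed to be in 16-bit fractional (Q15) format, and a 64-bit integer stands in for the wide hardware accumulator. On a real DSP, the modulo pointer arithmetic and the loop counter would typically be handled by circular addressing and zero-overhead looping hardware, and each loop iteration would map to a single MAC instruction.

```c
#include <stdint.h>
#include <stddef.h>

#define NUM_TAPS 4  /* hypothetical filter length, chosen for illustration */

typedef struct {
    int16_t history[NUM_TAPS]; /* past inputs, used as a circular buffer */
    size_t  head;              /* index of the most recent sample */
} fir_state;

/* Process one Q15 input sample and return one Q15 output sample. */
int16_t fir_step(fir_state *s, const int16_t coeffs[NUM_TAPS], int16_t input)
{
    /* Advance the write position, wrapping at the end of the buffer. */
    s->head = (s->head + 1) % NUM_TAPS;
    s->history[s->head] = input;

    int64_t acc = 0;       /* wide accumulator (assumption: stands in for DSP hardware) */
    size_t idx = s->head;  /* start at the newest sample and walk backward */
    for (size_t k = 0; k < NUM_TAPS; ++k) {
        /* Multiply and accumulate: coeffs[k] * x[n - k] */
        acc += (int32_t)coeffs[k] * (int32_t)s->history[idx];
        idx = (idx == 0) ? NUM_TAPS - 1 : idx - 1;
    }

    /* Q15 x Q15 products are Q30; shift right by 15 to return to Q15.
       Assumes an arithmetic right shift for negative values, which is
       implementation-defined in C but typical of mainstream compilers. */
    acc >>= 15;

    /* Clamp to the 16-bit range instead of wrapping around. */
    if (acc > INT16_MAX) acc = INT16_MAX;
    if (acc < INT16_MIN) acc = INT16_MIN;
    return (int16_t)acc;
}
```

With a zero-initialized `fir_state` and four coefficients of 0.25 (8192 in Q15), feeding a constant input of 0.5 (16384) makes the output ramp up over four samples and then settle at 0.5, as expected for a moving average.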
When converting an analog signal to digital format, the signal has to be truncated to a finite word length because of the limited precision of a DSP. DSPs come in fixed-point and floating-point varieties. With a floating-point format, this truncation usually is not much of a factor, because the format offers a good mix of precision and dynamic range. However, hardware for floating-point formats is harder and more expensive to implement, so most DSPs on the market today are fixed-point. When working with a fixed-point format, a number of considerations have to be taken into account. For example, when two 16-bit numbers are multiplied, the result is a 32-bit number. Since we ultimately want to store the final result in 16-bit format, we need to handle this loss of data.
Clearly, by simply truncating the number we could lose a significant portion of it. To deal with this issue we work with a fractional format called Q format. For example, in Q15 (or 1.15) format, the most significant bit represents the sign and the remaining 15 bits represent the fractional part of the data. This gives a range from –1 to just less than 1. Because every operand has a magnitude of at most one, the magnitude of a product never exceeds one either. So, if the lower 16 bits of the 32-bit result are dropped, only a very insignificant portion of the result is lost. One nuance of the multiply is that the product has two sign bits, so the result has to be shifted left by one bit to eliminate the redundant one. Most processors can take care of this shift automatically, so the designer doesn't have to waste cycles when doing many multiplications in a row.
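The Q15 multiply described above can be sketched in plain C as follows. The function name `q15_mul` is hypothetical, and the code truncates rather than rounds. Shifting the 32-bit product left by one bit and keeping the upper 16 bits is equivalent to shifting it right by 15 bits, which is what the sketch does. It also handles the one corner case where the product does not fit: –1 times –1 is +1, which Q15 cannot represent, so the result is clamped to the largest positive value.

```c
#include <stdint.h>

/* Multiply two Q15 values and return a truncated Q15 result. */
int16_t q15_mul(int16_t a, int16_t b)
{
    /* -1 * -1 = +1 is not representable in Q15; clamp to just under 1. */
    if (a == INT16_MIN && b == INT16_MIN) {
        return INT16_MAX;
    }

    /* The 16x16 product is a 32-bit Q30 value with two sign bits. */
    int32_t product = (int32_t)a * (int32_t)b;

    /* Drop the redundant sign bit and the low-order fraction bits.
       Assumes an arithmetic right shift for negative values, which is
       implementation-defined in C but typical of mainstream compilers. */
    return (int16_t)(product >> 15);
}
```

For example, 0.5 is 16384 in Q15, so `q15_mul(16384, 16384)` yields 8192, which is 0.25.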
Overflow and saturation
Two other issues that arise when using fixed-point arithmetic are overflow and saturation. DSPs help the programmer deal with both. One way a DSP does this is by providing guard bits in the accumulator. In a typical 16-bit processor, the accumulator may be 40 bits wide: 32 bits for the result (keep in mind that the product of a 16x16-bit multiplication can be up to 32 bits) and an extra 8 bits to guard against overflow when many products are accumulated in a repeat block.
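As a rough illustration of what the guard bits buy, the sketch below sums copies of the largest-magnitude 16x16 product and checks whether the total still fits in a signed accumulator of a given width. The helper names `fits_in_bits` and `worst_case_sum` are hypothetical, and the code assumes plain integer products with no fractional left shift applied; with that shift each product is twice as large, so the headroom is halved.

```c
#include <stdint.h>
#include <stdbool.h>

/* True if v fits in a signed accumulator that is `bits` wide (bits < 64). */
bool fits_in_bits(int64_t v, int bits)
{
    int64_t max = ((int64_t)1 << (bits - 1)) - 1;
    int64_t min = -max - 1;
    return v >= min && v <= max;
}

/* Sum of n copies of the largest 16x16 product, (-32768) * (-32768) = 2^30. */
int64_t worst_case_sum(int n)
{
    const int64_t largest_product = (int64_t)INT16_MIN * INT16_MIN;
    return largest_product * n;
}
```

Under these assumptions, two worst-case products already exceed a 32-bit accumulator, while a 40-bit accumulator holds 511 of them and first overflows on the 512th.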
Even with the extra guard bits, multiply-accumulate operations can still produce overflow, where the result contains more bits than the accumulator can hold. This situation is signaled with a flag called the overflow bit. The processor sets this flag automatically when a result overflows the accumulator.
When an overflow occurs, the results in the accumulator usually become invalid. So what can be done? Another feature of DSPs can be used: saturation. When the saturate instruction on a DSP is executed, the processor sets the value in the accumulator to the largest positive or negative value the accumulator can handle. That way, instead of possibly wrapping around from a large positive number to a negative number, the result becomes the largest positive number the processor can handle.
DSPs also have a mode that automatically saturates a result whenever the overflow flag is set. This saves the code from having to check the flag and saturate the result manually.
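The overflow flag and saturation behavior can be emulated in portable C. This is a minimal sketch: the function name `sat_add32` and the `overflow` out-parameter are hypothetical, and a 32-bit accumulator is assumed so that the arithmetic can be checked in a 64-bit intermediate. A DSP performs the equivalent operation in hardware. As with a sticky hardware flag, the sketch only ever sets the flag and leaves clearing it to the caller.

```c
#include <stdint.h>

/* Add b to a with saturation; set *overflow to 1 if the true sum
   does not fit in 32 bits. The flag is never cleared here. */
int32_t sat_add32(int32_t a, int32_t b, int *overflow)
{
    /* Compute the exact sum in a wider type so it cannot wrap. */
    int64_t sum = (int64_t)a + (int64_t)b;

    if (sum > INT32_MAX) {
        *overflow = 1;
        return INT32_MAX;   /* clamp to the largest positive value */
    }
    if (sum < INT32_MIN) {
        *overflow = 1;
        return INT32_MIN;   /* clamp to the most negative value */
    }
    return (int32_t)sum;
}
```

Without the clamp, adding 1 to the largest positive 32-bit value would wrap around to the most negative value on typical two's-complement hardware; with it, the result stays pinned at the largest positive value.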
For more on this topic, see Fixed-Point DSP and Algorithm Implementation.



