# How many bits do you need - Part 2

**Round-off and Quantization Errors in Filter Calculations**

Quantization also impacts system performance when it occurs in the filter response calculation. Quantization occurs in the filter response due to the physical constraints of the processing architecture. An example of this is illustrated in the second order Direct Form 1 filter architecture shown in Figure 9.

**Figure 9. Implementation of a Digital Filter**

In this figure, the input and output signals are represented in A bits, and the a and b coefficients are represented in B bits. Registers of size A bits hold the delayed input and output signals. The multiplication of the input and output signals with the b and a coefficients is performed by an A bit by B bit multiplier that produces an A+B bit result. An A+B bit wide adder is then used to sum the products of the multiplier and compute the unquantized filter response.

In this example, quantization occurs at the output of the adder, the highlighted "Q" block, where the A+B bit result is quantized into A bits. This quantization step produces a quantization error.

The quantization error is the noise source of the digital filter. The noise characteristics are determined by the characteristics of the quantization error: its magnitude and the type of quantization that is performed (round-off, two's complement truncation, or signed-magnitude truncation).
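The structure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation behind Figure 9: the names `quantize` and `biquad_df1`, the fixed-point coefficient scaling `frac_bits`, and the sign convention (feedback terms subtracted) are hypothetical choices made here, and overflow handling is left out.

```python
def quantize(acc, shift, mode):
    """Drop `shift` low-order bits (shift >= 1) from the integer accumulator."""
    if mode == "round":
        # Round-off: add half an output LSB, then discard the low bits.
        return (acc + (1 << (shift - 1))) >> shift
    if mode == "twos_truncate":
        # Two's complement truncation: always rounds toward minus infinity.
        return acc >> shift
    if mode == "magnitude_truncate":
        # Signed-magnitude truncation: always rounds toward zero.
        return -((-acc) >> shift) if acc < 0 else acc >> shift
    raise ValueError(f"unknown mode: {mode}")


def biquad_df1(x, b, a, frac_bits, mode):
    """Second-order Direct Form 1 section on integer samples.

    b = (b0, b1, b2) and a = (a1, a2) are integers scaled by 2**frac_bits
    (an assumed coefficient format). Products are summed at full precision;
    the single quantizer sits at the adder output, like the "Q" block.
    """
    x1 = x2 = y1 = y2 = 0
    out = []
    for xn in x:
        acc = b[0] * xn + b[1] * x1 + b[2] * x2 - a[0] * y1 - a[1] * y2
        yn = quantize(acc, frac_bits, mode)
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        out.append(yn)
    return out


# The same accumulator value quantized three ways (7/4 = 1.75, -7/4 = -1.75).
for mode in ("round", "twos_truncate", "magnitude_truncate"):
    print(mode, quantize(7, 2, mode), quantize(-7, 2, mode))
```

The three modes differ only in how the discarded bits are treated, which is what gives each one a different error distribution.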

The noise output, f(n), from the filter is a function of the noise source, e(n), and the a coefficients of the filter, as shown in Equation 3 and in Figure 10.

**Equation 3. Filter Noise**

**Figure 10. Digital Filter Noise**

To preserve as much of the input signal SNR as possible, we would like to have the noise amplitude below the smallest signal amplitude of interest, as shown in Figure 11.

Because the filter noise is proportional to the quantization error and the quantization error is reduced by increasing the calculation precision, the filter noise is reduced by increasing the calculation precision. This is achieved by increasing the precision of the signal representation above the number of bits that are necessary to represent the input signal. These additional bits are called noise bits.
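As a rough worked example of this relationship: each additional bit halves the quantization step, so the quantization error amplitude drops by 20·log10(2), or about 6.02 dB, per noise bit. The helper name `noise_floor_shift_db` below is hypothetical, and the figure it returns describes only the noise source, before any shaping by the filter.

```python
import math

DB_PER_BIT = 20 * math.log10(2)  # about 6.02 dB per bit


def noise_floor_shift_db(noise_bits):
    """Change in quantization noise amplitude, in dB, from adding noise bits."""
    return -DB_PER_BIT * noise_bits


for bits in (8, 16):
    print(f"{bits} noise bits: {noise_floor_shift_db(bits):.1f} dB")
```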

**Figure 11. Impact of Additional Precision for Noise**

**How many noise bits are enough?**

To illustrate the characteristics of truncation noise in IIR filters, a parametric equalization filter is evaluated at several frequencies. Figure 12 shows the Signal Transfer Function of a parametric equalization filter with a gain of 12 dB and a Q of 6.667 for a full-scale sine wave across the spectrum. To show the impact of frequency, the filter is evaluated at 1000 Hz, 100 Hz, and 30 Hz.

**Figure 12. Equalization Filter Response for 16 bit 48 kHz Data**

As we will see, in the case of parametric equalization filters, the magnitude of the noise response increases when:

- Filter Q increases
- Sample rate increases
- Filter frequency decreases

The noise transfer function is used to show the amplitude of filter noise with respect to frequency. The noise transfer function is computed by collecting the terms of the noise difference equation shown in Equation 3 and taking the z-transform. The Noise Transfer Function (NTF) has the form:

**Equation 4. Noise Transfer Function**

For magnitude truncation, the characteristics of the noise source are: uniform distribution over +/- 2^-2b, where b is the number of noise bits.

The variance is:

The variance of the filter output noise is:

where hq are samples of the impulse response function of the NTF.
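The dependence of the output noise on the filter can be illustrated by summing the squared impulse-response samples of the NTF, which is the factor that scales the source variance. The sketch below rests on several assumptions that are not in the article: the peaking filter is designed with the widely used Audio EQ Cookbook formulas, the denominator is written as 1 + a1·z^-1 + a2·z^-2, the NTF is taken to be the reciprocal of that denominator, and the infinite sum is truncated at `n_samples`. The function names are hypothetical, and the numbers are not expected to reproduce Figure 13.

```python
import math


def peaking_eq_denominator(f0, fs, gain_db, q):
    """Normalized (a1, a2) of a peaking EQ (Audio EQ Cookbook form, assumed)."""
    amp = 10 ** (gain_db / 40)
    w0 = 2 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2 * q)
    a0 = 1 + alpha / amp
    return (-2 * math.cos(w0) / a0, (1 - alpha / amp) / a0)


def noise_gain_db(a1, a2, n_samples=200_000):
    """10*log10 of the sum of squared impulse-response samples of
    NTF(z) = 1 / (1 + a1*z^-1 + a2*z^-2), truncated at n_samples."""
    h, h1, h2 = 1.0, 0.0, 0.0  # h is the response to a unit impulse at n = 0
    total = 0.0
    for _ in range(n_samples):
        total += h * h
        h2, h1 = h1, h
        h = -a1 * h1 - a2 * h2
    return 10 * math.log10(total)


# Same gain and Q as the example filters, evaluated at 48 kHz.
gains = {
    f0: noise_gain_db(*peaking_eq_denominator(f0, 48_000, 12.0, 6.667))
    for f0 in (1000, 100, 30)
}
for f0, g in gains.items():
    print(f"{f0} Hz: noise gain {g:.1f} dB")
```

Under these assumptions the noise gain grows as the center frequency drops, and repeating the calculation with a higher `fs` or `q` raises it further, in line with the trends listed above.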

Figure 13 shows the Noise Transfer Function of the three parametric equalization filters using 8 and 16 noise bits.

The 0 dB line of the graph represents the lowest signal amplitude of interest, which is the lowest amplitude and the noise floor of the ideal input signal. In this figure, we can see that 8 noise bits are sufficient to keep the 1000 Hz filter noise below the 0 dB threshold, but are not sufficient for the 100 or 30 Hz filters. Similarly, 16 noise bits are sufficient to keep the 1000 and 100 Hz filters below the 0 dB threshold. However, at 30 Hz, the noise cuts into the signal response by 11 dB.

**Figure 13. Equalization Filter Noise for 48 kHz Data**

It is important to note that, because the noise level is specified with respect to the number of noise bits, Figure 13 is applicable for any signal precision.

Figure 14 shows the difference of the signal and noise transfer functions for 16 bit data with 8 and 16 noise bits. This graph shows the signal-to-noise ratio of each filter at each frequency.

**Figure 14. Difference Transfer Function for 16 bit 48 kHz Data**

When the data precision is increased to 24 bits, the noise does not increase proportionately. It continues to occupy the same relative amplitude with respect to the minimum signal level as shown in Figure 15.

**Figure 15. Difference Transfer Function for 24 bit 48 kHz Data**

Figures 16 and 17 show the increase in the noise amplitude when the sample rate is increased to 96 kHz and 192 kHz.

These examples emphasize the importance of precision on digital performance.

For these particular filters, 16 or more noise bits are sufficient to preserve all of the input signal SNR for the 1000 and 100 Hz filters at sample rates of 48 kHz and 96 kHz. However, even with 16 noise bits there is some degradation of the input signal SNR at 30 Hz. Preserving all of the input SNR at 30 Hz would take 20 noise bits. Similarly, for these filters at a sample rate of 192 kHz, 20 noise bits are sufficient to preserve the input SNR for the 100 and 1000 Hz filters. At 192 kHz, even with 20 noise bits, there is some degradation of the input SNR at 30 Hz.

**Figure 16. 96 kHz Noise Transfer Function**

**Figure 17. 192 kHz Noise Transfer Function**

It is important to note that while these examples are representative of equalization filters at these frequencies with Qs of 6.7, they do not represent the best or worst cases of signal or noise. Differences in filter parameters, center frequencies, or filter types will produce signal and noise characteristics that can be better or worse.

**Zero Input Limit Cycles**

The linear modeling that we have used for assessing the impact of quantization (or round-off) noise is sufficiently accurate for most analyses, with one notable exception: zero-input limit cycles, which are a non-linear phenomenon. Zero-input limit cycles produce periodic or "tone" components in the output in response to zero and small amplitude sinusoidal inputs. This behavior is best controlled by careful system design. The most effective means to prevent limit cycles relies upon using cascaded sections of second order Direct Form I filters and magnitude truncation quantization.
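The effect is easiest to see in a deliberately simplified case. The sketch below uses a first-order recursion y(n) = Q[0.9·y(n-1)] with zero input, which is an assumption chosen for brevity rather than the second order Direct Form I section discussed in this article; the function name `run_zero_input` is hypothetical. Values are integers in units of one output LSB.

```python
def run_zero_input(y0, steps, mode):
    """Iterate y(n) = Q[0.9 * y(n-1)] with zero input, in integer LSB units."""
    y = y0
    history = [y]
    for _ in range(steps):
        p = 9 * y  # the product 0.9 * y, scaled by 10 to stay in integers
        if mode == "round":
            y = (p + 5) // 10  # round-off (half rounds up)
        elif mode == "magnitude_truncate":
            y = -((-p) // 10) if p < 0 else p // 10  # round toward zero
        else:
            raise ValueError(f"unknown mode: {mode}")
        history.append(y)
    return history


print("round-off:           ", run_zero_input(10, 12, "round"))
print("magnitude truncation:", run_zero_input(10, 12, "magnitude_truncate"))
```

With round-off, the output stops decaying and holds at a nonzero value even though the input is zero. With magnitude truncation, every quantization step moves the state toward zero, so the output in this example decays all the way to zero.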

**Overflow, Underflow, and Scaling**

Overflow occurs during the calculation of the digital filter response when the result exceeds the largest number that can be represented. There are two principal instances where this occurs. Overflow can occur when the gain of one or more cascaded filters amplifies the signal so that it exceeds the largest number that can be represented. An example of this case is when two cascaded filters are used to produce a twin peak response by subtraction, shown in Figure 18. If the positive gain filter precedes the negative gain filter, a full-scale signal input would be amplified by 20 dB. This could produce an overflow condition.

**Figure 18. Summation of Filters**

Overflow can also occur within the computation of a single filter stage for those filter types that have large coefficient values even though the filter has only a modest overall signal gain. In the second case, overflow occurs during the intermediate calculations as a result of the signal multiplication with one or more large coefficients.
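To make the first case concrete, the following sketch applies a 20 dB gain (a factor of 10) to a full-scale 16-bit sample and stores the result in a two's complement word. The helper `wrap_twos_complement` is a hypothetical name, and wraparound is only one possible overflow behavior; hardware that saturates would clip instead.

```python
def wrap_twos_complement(value, bits):
    """Keep only the low `bits` bits of value, read as a two's complement word."""
    half = 1 << (bits - 1)
    return ((value + half) & ((1 << bits) - 1)) - half


full_scale = 32767            # largest positive 16-bit sample
amplified = full_scale * 10   # 20 dB of gain is a factor of 10

print("16-bit word:", wrap_twos_complement(amplified, 16))
print("24-bit word:", wrap_twos_complement(amplified, 24))
```

In the 16-bit word the amplified sample wraps around to a small negative number, while the wider word holds it unchanged.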

Several approaches can be employed to prevent overflow:

**Table 2**

The preferred solution is to have additional precision in the processing architecture to accommodate the expected range of amplitudes without degrading the SNR performance. This can be accomplished either by adding this precision as headroom bits to extend the internal maximum signal amplitude or by adding noise bits to reduce the noise floor. The difference between the two solutions is how frequently and when scaling is performed. Figure 19 shows how additional headroom bits permit filters with positive gains to be used to create a desired filter response.

**Figure 19. Importance of Added Precision for Overflow Conditions**

The number of additional bits that are necessary to prevent overflow is dependent upon the permitted gains, filter types and filter parameters for a given sample rate. As previously discussed in the case of cascaded filters, intermediate gains of 18 to 24 dB are common. In the case of large coefficient values, although most filters have coefficient magnitudes that are between 0 and 2, there are a few frequently used audio filters that can have relatively large coefficients. The Treble Shelf is a commonly used audio filter that has relatively high coefficient magnitudes for relatively modest gains and frequencies. The transfer function of a 1 kHz Treble Shelf with a gain of 12 dB is shown in Figure 20.

**Figure 20. Treble Shelf Transfer Function**

The coefficients for the Treble Shelf filter are:

From observation, we can see that accommodating the signal gain produced by the largest coefficient magnitude will require approximately four additional bits of precision. Although not shown here, subsequent dynamic analyses of the filter behavior indicate that one additional bit is required to prevent overflow for signals with high transient characteristics.

To accommodate both cascaded filters and large coefficient gains, 8 additional bits of precision appear to be sufficient to prevent overflow for most applications. These will be added to the most significant bit positions, as headroom bits. The advantage of adding headroom bits in comparison to noise bits is that the system is able to represent intermediate signal levels that are greater than the maximum input signal magnitude for typical cases. As a result, the headroom bits can eliminate the need to reduce the magnitude of input signal prior to filter processing, and then increase the magnitude of the result after filter processing, in many cases. The input to a filter or cascaded series of filters is reduced in exceptional cases where very large gains are used. The output signal magnitude is reduced when the total gain is greater than one.
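A rough way to relate a gain figure to headroom is to count how many doublings it represents, since each headroom bit doubles the largest representable magnitude. The helper `headroom_bits` below is a hypothetical name, and it covers only a steady-state gain in dB; it does not account for large intermediate coefficient products or transient peaks, which the text above treats separately.

```python
import math


def headroom_bits(gain_db):
    """Smallest whole number of extra MSBs that covers a gain given in dB."""
    return max(0, math.ceil(gain_db / (20 * math.log10(2))))


for gain_db in (18, 20, 24):
    print(f"{gain_db} dB of gain: {headroom_bits(gain_db)} headroom bits")
```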

Underflow occurs when the result becomes so small that some signal information is irrecoverably lost. As in the case of overflow, this can occur for specific filters that have one or more small coefficients although the overall filter loss is modest. In these cases, the multiplication of the input and small coefficient values can produce an intermediate signal magnitude that is less than the smallest signal magnitude that can be represented with full input signal precision. Two potential approaches may be employed to prevent underflow:

**Table 3**

The preferred solution is to include sufficient precision in the processing architecture to ensure that the minimum signal level can be preserved. The proposed solution of adding 16 noise bits, discussed in the section "Finite Precision Arithmetic in Filter Calculations", is sufficient to preserve SNR and prevent underflow for most applications. As we can see, the term "noise bits" is a bit of a misnomer because these bits not only reduce the noise floor, but also preserve signal information.
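A small numeric sketch shows how noise bits preserve a low-level product. The coefficient format (a numerator over 2^15), the sample value, and the name `scaled_product` are all assumptions made for illustration; the result is returned in units of 2^-noise_bits of one input LSB.

```python
def scaled_product(sample, coeff_num, coeff_shift, noise_bits):
    """Multiply sample by coeff_num / 2**coeff_shift and truncate the result
    so that it keeps `noise_bits` bits below the input LSB."""
    product = sample * coeff_num      # carries coeff_shift fractional bits
    drop = coeff_shift - noise_bits   # fractional bits to discard
    return product >> drop if drop >= 0 else product << -drop


sample = 3        # a signal only three LSBs in amplitude
coeff_num = 328   # about 0.01 when divided by 2**15

print("no noise bits:", scaled_product(sample, coeff_num, 15, 0))
print("16 noise bits:", scaled_product(sample, coeff_num, 15, 16))
```

Without noise bits the product truncates to zero and the signal is lost. With 16 noise bits the same product survives as a nonzero value below the input LSB, where later stages can still use it.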

**This is the end of Part 2. Next week: Part 3, Practical Filter Implementations**