[Part 1 offers an introduction and overview of lossless audio data compression, and a discussion of some of its key principles.]
12.2.3 Multiple-Channel Redundancy
Common audio data formats such as the Compact Disc generally store a two-channel (stereo) audio signal. The left and right channel data streams are stored entirely independently. Various new data formats for DVD and other distribution systems will allow four, or six, or perhaps more separate audio channels for use in multichannel surround loudspeaker playback systems.
In any case, it is common to find that there is at least some correlation between the two channels in a stereo audio signal or among the various channels in a multichannel recording, and therefore it may be possible to obtain better compression by operating on two or more channels together [3, 6]. This is referred to as joint stereo coding or interchannel coding.
Joint coding is particularly useful for audio signals in which two or more of the channels contain the same signal, such as a dual-mono program, or when one or more of the channels is completely silent for all or most of the recording. Some systems take advantage of interchannel correlation by using a so-called matrixing operation to encode L and (L - R), or perhaps (L + R) and (L - R), which is similar to FM broadcast stereo multiplexing. If the L and R channels are identical or nearly identical, the difference signal (L - R) will be small and relatively easy to encode.
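The matrixing idea can be illustrated with a minimal sketch. It assumes integer PCM samples held in Python lists and uses a halved sum together with the difference; the function names `ms_encode` and `ms_decode` are hypothetical and are not taken from any particular codec. The point is that the matrixing can be inverted exactly in integer arithmetic, because (L + R) and (L - R) always have the same parity.

```python
def ms_encode(left, right):
    """Matrix integer L/R samples into (mid, side) = ((L + R) >> 1, L - R)."""
    mid = [(l + r) >> 1 for l, r in zip(left, right)]
    side = [l - r for l, r in zip(left, right)]
    return mid, side


def ms_decode(mid, side):
    """Invert ms_encode exactly, recovering the original L/R samples."""
    left, right = [], []
    for m, s in zip(mid, side):
        # The shift in the encoder dropped the low bit of (L + R).
        # That bit equals the low bit of (L - R), so it can be restored.
        total = (m << 1) | (s & 1)
        left.append((total + s) >> 1)   # (L + R + L - R) / 2 = L
        right.append((total - s) >> 1)  # (L + R - L + R) / 2 = R
    return left, right
```

For a dual-mono program the side signal is identically zero, so essentially all of the coding effort goes into a single channel.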
In many cases of practical interest audio signals exhibit a useful degree of sample-to-sample correlation. That is, we may be able to predict the value of the next audio sample based on knowledge of one or more preceding samples [7-9].
Another way of stating this is that if we can develop a probability density function (PDF) for the next sample based on our observation of previous samples, and if this PDF is non-uniform and concentrated around a mean value, we will benefit by storing only (a) the minimum information required for the decoder to re-create the same signal estimate and (b) the error signal, or residual, giving the sample-by-sample discrepancy between the predicted signal and the actual signal value. If the prediction is very close to the actual value, fewer bits will be required to encode the residual than to encode the original PCM representation. In practice the signal estimate is obtained using some sort of adaptive linear prediction.
A general model of a signal predictor is shown in Fig. 12.3a [3, 7]. The input sequence of audio samples, x[n], serves as the input to the prediction filter, P(z), creating the signal estimate x̂[n]. The prediction is then subtracted from the input, yielding the error residual signal, e[n] = x[n] - x̂[n].
In a practical prediction system it is necessary to consider the numerical precision of the filter P(z), since the coefficient multiplications within the filter can result in more significant bits in x̂[n] than in the input signal x[n]. This is undesirable, of course, since our interest is in minimizing the output bit rate, not adding more bits of precision.
The typical solution is to quantize the predicted value to the same bit width as the input signal, usually via simple truncation. This is indicated in Fig. 12.3b. As will be mentioned later in the section on practical design issues, special care must be taken to ensure that the compression and decompression calculations occur identically on whatever hardware platform is used.
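The following is a minimal sketch of a predictor with explicit quantization, under several assumptions that are not from the text: the coefficients are stored as integers scaled by 2**shift, the fractional bits are discarded with an arithmetic right shift (which rounds toward negative infinity rather than toward zero), and samples before the start of the block are treated as zero. The names `fir_predict`, `fir_encode`, and `fir_decode` are hypothetical.

```python
def fir_predict(history, coeffs, shift):
    """Integer prediction of the next sample; history[-1] is x[n-1]."""
    acc = 0
    for k, c in enumerate(coeffs, start=1):
        if k <= len(history):          # samples before the block count as zero
            acc += c * history[-k]
    # Discard the fractional bits so the estimate has integer precision.
    # Assumption: floor via right shift stands in for "simple truncation".
    return acc >> shift


def fir_encode(x, coeffs, shift):
    """Return the residual e[n] = x[n] - xhat[n] for a block of samples."""
    return [sample - fir_predict(x[:n], coeffs, shift)
            for n, sample in enumerate(x)]


def fir_decode(residual, coeffs, shift):
    """Rebuild x[n] = e[n] + xhat[n] from previously decoded samples."""
    x = []
    for e in residual:
        x.append(e + fir_predict(x, coeffs, shift))
    return x
```

Because the decoder forms its estimate from already-reconstructed samples using the same integer operations as the encoder, the two estimates agree exactly and the original samples are recovered bit for bit.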
Figure 12.3: (a) Basic predictor structure for lossless encoder/decoder. (b) Structure with explicit quantization (truncation) to original input bit width.
Figure 12.4: Alternative predictor structure.
An alternative predictor structure that is popular for use in lossless audio compression is shown in Fig. 12.4. This structure incorporates two finite impulse response (FIR) filters, A(z) and B(z), in a feed-forward and feed-back arrangement similar to an infinite impulse response (IIR) Direct Form I digital filter structure, but with an explicit quantization prior to the summing node. Note also that filter B(z) can be designated as a null filter, leaving the straightforward FIR predictor of Fig. 12.3b.
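One possible reading of this two-filter arrangement is sketched below. Since the figure is not reproduced here, the details are assumptions: A(z) is taken to operate on past input samples, B(z) on past residual samples, their outputs are added, and the sum is quantized by a right shift before being subtracted from the input. The sign convention, the integer scaling by 2**shift, and the names `ab_predict`, `ab_encode`, and `ab_decode` are all hypothetical.

```python
def ab_predict(past_x, past_e, a, b, shift):
    """Quantized estimate from past inputs (via a) and past residuals (via b)."""
    acc = sum(c * past_x[-k] for k, c in enumerate(a, 1) if k <= len(past_x))
    acc += sum(c * past_e[-k] for k, c in enumerate(b, 1) if k <= len(past_e))
    return acc >> shift  # explicit quantization ahead of the summing node


def ab_encode(x, a, b, shift):
    """Produce the residual; an empty b gives a plain FIR predictor."""
    e = []
    for n, sample in enumerate(x):
        e.append(sample - ab_predict(x[:n], e, a, b, shift))
    return e


def ab_decode(e, a, b, shift):
    """Recover the input from the residual using the same quantized estimate."""
    x = []
    for n, err in enumerate(e):
        x.append(err + ab_predict(x, e[:n], a, b, shift))
    return x
```

In this sketch the decoder has access to exactly the same past inputs and past residuals as the encoder, so the recursion remains lossless for any integer coefficient choice.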
The advantage of selecting A(z) and B(z) to be FIR filters is that the coefficients can be quantized easily to integers with short word lengths, thereby making an integer implementation possible on essentially any hardware. Because this structure is intended for signal prediction and not to approximate a specific IIR filter transfer function, the details of the coefficients in A(z) and B(z) can be defined largely for numerical convenience rather than transfer function precision.
The use of an FIR linear predictor (B(z) = 0) is quite common for speech and audio coding, and the filter coefficients for A(z) are determined in order to minimize the mean-square value of the residual e[n] using a standard linear predictive coding (LPC) algorithm [7, 8]. No such convenient coefficient determination algorithm is available for the IIR predictor (B(z) ≠ 0), which limits the widespread use of the adaptive IIR version.
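As an illustration of how such coefficients might be computed, the sketch below uses the autocorrelation method with the Levinson-Durbin recursion, which is one standard LPC approach; the text does not say which variant a given codec uses. It works in floating point, so a practical encoder would still need to quantize the resulting coefficients. The function names are hypothetical.

```python
def autocorrelation(x, max_lag):
    """Return r[0..max_lag], where r[lag] = sum over n of x[n] * x[n - lag]."""
    return [sum(x[n] * x[n - lag] for n in range(lag, len(x)))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the normal equations for xhat[n] = sum_k a[k] * x[n-k].

    Returns (coefficients a[1..order], final mean-square prediction error).
    """
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        if err == 0:
            break  # signal is already perfectly predicted (or silent)
        # Reflection coefficient for stage i.
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= (1.0 - k * k)  # error shrinks (or stays equal) at each stage
    return a[1:], err
```

For a decaying exponential such as x[n] = 0.9**n, the order-1 solution comes out very close to a single coefficient of 0.9, as expected for a first-order recursive signal.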
In lossless compression algorithms that utilize IIR predictors it is common to have multiple sets of fixed coefficients from which the encoder chooses the set that provides the best results (minimum mean-square error) on the current block of audio samples [3, 5]. In fact, several popular lossless compression algorithms using FIR prediction filters also include sets of fixed FIR coefficients in order to avoid the computational cost of calculating the optimum LPC results.
Once the encoder has determined the prediction filter coefficients to use on the current block, this information must be conveyed to the decoder so that the signal can be recovered losslessly. If an LPC algorithm is used in the encoder, the coefficients themselves must be sent to the decoder. On the other hand, if the encoder chooses the coefficients from among a fixed set of filters, the encoder needs only to send an index value indicating which coefficient set was used.
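The fixed-set approach can be sketched as follows. The coefficient table is an assumption chosen for illustration: it holds the integer polynomial predictors of orders 0 to 3, a family that appears in some lossless coders, and is not a table given in the text. The names `FIXED_PREDICTORS`, `block_residual`, `choose_predictor`, and `reconstruct` are hypothetical.

```python
# Illustrative table: polynomial predictors of order 0..3 (an assumption).
FIXED_PREDICTORS = [
    [],           # order 0: xhat[n] = 0
    [1],          # order 1: xhat[n] = x[n-1]
    [2, -1],      # order 2: xhat[n] = 2x[n-1] - x[n-2]
    [3, -3, 1],   # order 3: xhat[n] = 3x[n-1] - 3x[n-2] + x[n-3]
]


def block_residual(block, coeffs):
    """Residual of one block; samples before the block are taken as zero."""
    out = []
    for n, sample in enumerate(block):
        pred = sum(c * block[n - k]
                   for k, c in enumerate(coeffs, 1) if n - k >= 0)
        out.append(sample - pred)
    return out


def choose_predictor(block):
    """Pick the table index with the smallest sum of squared residuals."""
    def cost(i):
        return sum(e * e for e in block_residual(block, FIXED_PREDICTORS[i]))
    index = min(range(len(FIXED_PREDICTORS)), key=cost)
    return index, block_residual(block, FIXED_PREDICTORS[index])


def reconstruct(index, residual):
    """Decoder side: only the index and the residual are needed."""
    coeffs = FIXED_PREDICTORS[index]
    x = []
    for n, e in enumerate(residual):
        pred = sum(c * x[n - k] for k, c in enumerate(coeffs, 1) if n - k >= 0)
        x.append(e + pred)
    return x
```

Only the small integer index travels with each block, in contrast to the LPC case, where the coefficient values themselves must be transmitted.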
The choice of predictor type (e.g., FIR vs. IIR), predictor order, and adaptation strategy has been studied rather extensively in the literature. Several of the lossless compression packages use a low-order linear predictor (order 3 or 4), while some others use predictors up to order 10. It is interesting to discover that there generally appears to be little additional benefit to the high-order predictors, and in some cases the low-order predictor actually performs better. This may seem counterintuitive, but keep in mind that there often is no reason to expect that an arbitrary audio signal should fit a predictable pattern, especially if the signal is a complex combination of sources such as a recording of a musical ensemble.