Design Article
Lossless Compression of Audio Data - Part 1
Robert C. Maher
11/21/2007 2:35 PM EST
Lossless data compression of digital audio signals is useful when it is necessary to minimize the storage space or transmission bandwidth of audio data while still maintaining archival quality. Available techniques for lossless audio compression, or lossless audio packing, generally employ an adaptive waveform predictor with variable-rate entropy coding of the residual, such as Huffman or Golomb-Rice coding. The amount of data compression can vary considerably from one audio waveform to another, but compression ratios of less than 3:1 are typical. Several freeware, shareware, and proprietary commercial lossless audio packing programs are available.
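As a rough illustration of the predictor-plus-entropy-coder structure described above, the following Python sketch pairs a fixed first-order predictor (each sample is predicted by the previous one) with a simple Rice code for the residual. It is a minimal sketch under stated assumptions, not the method of any particular packing program: the function names, the fixed Rice parameter k, and the use of a bit string in place of a packed bitstream are all choices made here for readability. Practical coders adapt both the predictor and k to the signal.

```python
def predict_residuals(samples):
    """First-order prediction: residual = sample minus previous sample."""
    residuals, prev = [], 0
    for s in samples:
        residuals.append(s - prev)
        prev = s
    return residuals

def reconstruct(residuals):
    """Invert predict_residuals exactly by running a cumulative sum."""
    samples, prev = [], 0
    for r in residuals:
        prev += r
        samples.append(prev)
    return samples

def zigzag(v):
    """Map signed integers to unsigned: 0, -1, 1, -2, ... -> 0, 1, 2, 3, ..."""
    return 2 * v if v >= 0 else -2 * v - 1

def unzigzag(u):
    """Inverse of zigzag."""
    return u >> 1 if u % 2 == 0 else -((u + 1) >> 1)

def rice_encode(residuals, k):
    """Rice-code each residual: unary quotient, a '0' stop bit, k remainder bits."""
    parts = []
    for r in residuals:
        u = zigzag(r)
        parts.append("1" * (u >> k) + "0")
        if k:
            parts.append(format(u & ((1 << k) - 1), "0%db" % k))
    return "".join(parts)

def rice_decode(bits, k, count):
    """Decode `count` residuals from a bit string produced by rice_encode."""
    out, pos = [], 0
    for _ in range(count):
        q = 0
        while bits[pos] == "1":   # read the unary quotient
            q += 1
            pos += 1
        pos += 1                  # skip the '0' stop bit
        rem = int(bits[pos:pos + k], 2) if k else 0
        pos += k
        out.append(unzigzag((q << k) | rem))
    return out

# Illustrative samples (made up for this sketch) and an assumed parameter k = 2.
samples = [0, 3, 7, 12, 15, 16, 14, 9, 3, -2]
residuals = predict_residuals(samples)
bits = rice_encode(residuals, 2)
decoded = reconstruct(rice_decode(bits, 2, len(residuals)))
print(decoded == samples, len(bits), "bits vs", 16 * len(samples), "bits raw")
```

The round trip reproduces the input exactly, which is the defining property of lossless packing. The saving comes from the residuals being smaller in magnitude than the samples themselves, so short code words suffice.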
12.1 INTRODUCTION
The Internet is increasingly being used as a means to deliver audio content to end-users for entertainment, education, and commerce. It is clearly advantageous to minimize the time required to download an audio data file and the storage capacity required to hold it. Moreover, the expectations of end-users with regard to signal quality, number of audio channels, meta-data such as song lyrics, and similar additional features provide incentives to compress the audio data.
12.1.1 Background
In the past decade there have been significant breakthroughs in audio data compression using lossy perceptual coding [1]. These techniques lower the bit rate required to represent the signal by establishing perceptual error criteria, meaning that a model of human hearing perception is used to guide the elimination of excess bits that can be either reconstructed (redundancy in the signal) or ignored (inaudible components in the signal). Such systems have been demonstrated with "perceptually lossless" performance, i.e., trained listeners cannot distinguish the reconstructed signal from the original with better than chance probability, but the reconstructed waveform may be significantly different from the original signal.
Perceptually lossless or near-lossless coding is appropriate for many practical applications and is widely deployed in commercial products and international standards. For example, as of this writing millions of people are regularly sharing audio data files compressed with the MPEG 1 Layer 3 (MP3) standard, and most DVD video discs carry their soundtracks encoded with Dolby Digital multichannel lossy audio compression.
There are a variety of applications, however, in which lossy audio compression is inappropriate or unacceptable. For example, audio recordings that are to be stored for archival purposes must be recoverable bit-for-bit without any degradation, and so lossless compression is required. Similarly, consumers of audio content may choose to purchase and download a losslessly compressed file that provides quality identical to the same music purchased on a conventional audio CD.
Furthermore, lossy compression techniques are generally not amenable to situations in which the signal must pass through a series of several encode/decode operations, known as tandem encode/decode cycles. This can occur in professional studio applications where multiple audio tracks are additively mixed together or passed through audio effects devices such as EQ filters or reverberators and then re-encoded. Tandeming can also occur in broadcasting or content distribution when the signal must be changed from one data format to another or sent through several stages of intermediate storage. Audible degradations due to lossy compression will accumulate with each encode/decode sequence, and this may be undesirable.
12.1.2 Expectations
Lossy audio compression such as MP3 is appropriate for situations in which it is necessary to specify the best perceived audio quality at a specific, guaranteed bit rate. Lossless compression, on the other hand, is required to obtain the lowest possible bit rate while maintaining perfect signal reconstruction.
It is important to be aware that the bit rate required to represent an audio signal losslessly will vary significantly from one waveform to another depending on the amount of redundancy present in the signal. For example, a trivial file containing all "zero" samples (perfect silence) would compress down to an integer representing the number of samples in the file, while an audio signal consisting of white noise would thwart any attempt at redundancy removal. Thus, we must be prepared to accept results in which the bit rate of the losslessly compressed data is not significantly reduced compared to the original rate.
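The two extremes mentioned above can be checked with a short experiment. The sketch below uses Python's general-purpose zlib compressor as a stand-in for a lossless packer (an assumption made for convenience; zlib is not an audio-specific coder) and models white noise as uniformly distributed full-scale 16-bit samples. The one-second duration and the 44.1 kHz rate are likewise illustrative choices.

```python
import random
import zlib
from array import array

def compression_ratio(samples):
    """Return raw size divided by compressed size for 16-bit PCM samples."""
    raw = array("h", samples).tobytes()   # 2 bytes per sample
    packed = zlib.compress(raw, 9)        # zlib used only as a stand-in coder
    return len(raw) / len(packed)

n = 44100                                 # one second of mono audio at 44.1 kHz
silence = [0] * n
rng = random.Random(0)                    # fixed seed so the run is repeatable
noise = [rng.randint(-32768, 32767) for _ in range(n)]

silence_ratio = compression_ratio(silence)
noise_ratio = compression_ratio(noise)
print("silence ratio: %.1f" % silence_ratio)
print("noise ratio:   %.3f" % noise_ratio)
```

The silent file shrinks to a tiny fraction of its original size, while the noise file stays at essentially its original size. Real audio signals fall somewhere between these two cases.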
Because most audio signals of interest have temporal and spectral properties that vary with time, it is expected that the lossless compression technique will need to adapt to the short-term signal characteristics. The time-varying signal behavior implies that the instantaneous bit rate required to represent the compressed signal will vary with time, too. In some applications, such as storing an audio data file on a hard disk, the major concern is the average bit rate since the size of the resulting file is to be minimized. In other applications, most notably in systems requiring real-time transmission of the audio data or data retrieval from a fixed-bandwidth storage device such as DVD, there may also be concern about the peak bit rate of the compressed data.
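The distinction between average and peak bit rate can be made concrete with a small estimate. The sketch below splits a test signal into frames, forms first-order prediction residuals, picks the best Rice parameter for each frame by exhaustive search, and reports the resulting bit rate per frame. Everything specific here is an assumption for illustration: the 1024-sample frame length, the 5-bit side-information cost, the test signal (a quiet 440 Hz tone followed by loud uniform noise), and the first-order predictor itself.

```python
import math
import random

FS = 44100          # sample rate in Hz (assumed)
FRAME = 1024        # frame length in samples (assumed)
SIDE_INFO_BITS = 5  # hypothetical per-frame cost of signalling the Rice parameter

def rice_bits(residual, k):
    """Bits needed to Rice-code one signed residual with parameter k."""
    u = 2 * residual if residual >= 0 else -2 * residual - 1  # signed -> unsigned
    return (u >> k) + 1 + k   # unary quotient + stop bit + k remainder bits

def frame_bit_rates(samples):
    """Estimated bit rate (bits per second) of each complete frame."""
    rates, prev = [], 0
    for start in range(0, len(samples) - FRAME + 1, FRAME):
        residuals = []
        for s in samples[start:start + FRAME]:
            residuals.append(s - prev)   # first-order prediction error
            prev = s
        # Adapt to the frame: try every parameter and keep the cheapest.
        bits = min(sum(rice_bits(r, k) for r in residuals) for k in range(18))
        rates.append((bits + SIDE_INFO_BITS) * FS / FRAME)
    return rates

rng = random.Random(1)
quiet = [round(1000 * math.sin(2 * math.pi * 440 * n / FS)) for n in range(2 * FRAME)]
loud = [rng.randint(-20000, 20000) for _ in range(2 * FRAME)]

rates = frame_bit_rates(quiet + loud)
avg_rate = sum(rates) / len(rates)
peak_rate = max(rates)
print("per-frame kbit/s:", [round(r / 1000) for r in rates])
print("average kbit/s: %.0f, peak kbit/s: %.0f" % (avg_rate / 1000, peak_rate / 1000))
```

The average rate determines the size of a stored file, whereas the peak rate is what a fixed-bandwidth channel or storage device must be able to sustain. In this sketch the noisy frames set the peak well above the average.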
A plethora of bit resolutions, sample rates, and multiple channel formats are in use or have been proposed for recording and distribution of audio data. This means that any technique proposed for lossless compression should be designed to handle pulse code modulation (PCM) audio samples with 8- to 32-bit resolution, sample rates up to 192 kHz, and perhaps six or more audio channels. In fact, many of the available commercial lossless compression methods include special features to optimize their performance to the particular format details of the audio data [5].
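To give a sense of the range of input rates involved, the uncompressed PCM bit rate is simply the product of the bit resolution, the sample rate, and the number of channels. The sketch below evaluates it for the conventional audio CD format (16-bit, 44.1 kHz, stereo) and for one hypothetical high-resolution configuration chosen from within the ranges mentioned above; the function name is illustrative.

```python
def pcm_bit_rate(bits_per_sample, sample_rate_hz, channels):
    """Uncompressed PCM bit rate in bits per second."""
    return bits_per_sample * sample_rate_hz * channels

cd_rate = pcm_bit_rate(16, 44100, 2)        # conventional audio CD
hi_res_rate = pcm_bit_rate(24, 192000, 6)   # hypothetical 6-channel, 24-bit, 192 kHz format

print(cd_rate)       # 1411200
print(hi_res_rate)   # 27648000
```

The second configuration requires roughly 20 times the raw bit rate of the first, which is one reason a lossless packer may need format-specific handling.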