Design Article
Data compression for high-speed DSP, part 2
Al Wegener
7/20/2008 12:00 PM EDT
Comparison: Lempel-Ziv for high-speed sampled data
The best-known lossless compression algorithms, invented by Abraham Lempel and Jacob Ziv (Lempel-Ziv, or LZ) in 1977 and 1978, were designed to compress character-based computer files. In such applications, even a one-bit error is unacceptable.
Many readers will already be familiar with LZ's dictionary-based compression algorithm. A later compression algorithm by Burrows and Wheeler (BW) and refinements by Welch (LZW) incrementally improved the compression ratio as measured on sets of test files such as the well-known Calgary and Canterbury corpora. Two LZ-based file compression programs, WinZIP and PKzip, are among the most downloaded shareware programs on the Internet. LZ-based lossless compression algorithms achieve about 2:1 compression on the wide variety of computer files for which LZ was designed.
All lossless compression algorithms for computer files exploit character redundancy and locality. LZ-based algorithms build a dictionary of frequently used letters, words, and phrases, and then replace each new instance of such an item with a pointer to the corresponding dictionary entry. Two fundamental problems occur when trying to apply dictionary-based techniques to sampled data. First, sampled data comes in many bit widths, not just 8-bit characters. ADCs and DACs offer sample widths between 4 and 24 bits per sample. Thus the fundamental unit of analysis for dictionary-based schemes (the 8-bit character) maps as intended only to 8-bit ADC and DAC samples. Second, all sampled data is imperfect: ADCs digitize not only the signal or signals of interest, but also the noise present in the system. While the 4-sample data stream {-483, -92, 120, 588} is similar to a second stream containing the samples {-477, -91, 117, 589}, dictionary-based lossless algorithms would treat these streams as two completely independent dictionary entries, not as two numerically similar sequences.
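The two example streams can be checked directly. The short Python sketch below is only an illustration, not part of any LZ implementation: it assumes the samples are stored as 16-bit little-endian integers (the article does not state a storage format), and the helper names are hypothetical. It counts exact sample matches, measures the largest numerical difference, and finds the longest run of bytes the two streams have in common.

```python
import struct

# The two 4-sample streams quoted in the text.
stream_a = [-483, -92, 120, 588]
stream_b = [-477, -91, 117, 589]

def pack16(samples):
    """Pack samples as 16-bit little-endian integers (assumed storage format)."""
    return struct.pack("<%dh" % len(samples), *samples)

def longest_common_run(a, b):
    """Length of the longest byte run that appears in both a and b."""
    best = 0
    for i in range(len(a)):
        for j in range(len(b)):
            k = 0
            while i + k < len(a) and j + k < len(b) and a[i + k] == b[j + k]:
                k += 1
            best = max(best, k)
    return best

bytes_a = pack16(stream_a)
bytes_b = pack16(stream_b)

# Samples that are bit-for-bit identical at the same position.
exact_matches = sum(1 for x, y in zip(stream_a, stream_b) if x == y)
# Largest numerical difference between corresponding samples.
largest_gap = max(abs(x - y) for x, y in zip(stream_a, stream_b))
# Longest byte string a dictionary coder could reuse from one stream in the other.
common_run = longest_common_run(bytes_a, bytes_b)

print(exact_matches, largest_gap, common_run)
```

Under these assumptions no sample repeats exactly and no byte run longer than one byte is shared, even though corresponding samples differ by at most 6 counts. A dictionary coder sees nothing to reuse, while a numerical comparison sees two nearly identical waveforms.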
As illustrated in Table 1, LZ-based algorithms don't compress high-speed sampled data very well (we used WinZIP to generate the LZ results in Table 1). High-speed sampled data contains little character-based repetition for dictionary-based compression algorithms to exploit, because signal phases and amplitudes are intentionally modulated and because signal SNR varies over time. As expected, WinZIP performed best on Table 1's 8-bit signals (Gamma ray and SerDes).
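The effect is easy to reproduce informally. The following Python sketch does not use the Table 1 signals or WinZIP. As a stand-in it uses the standard-library zlib module (DEFLATE, an LZ77-based coder) on two synthetic inputs: a repetitive text string and a noisy 16-bit sine wave. The signal parameters (amplitude, frequency, noise level, seed) are arbitrary assumptions chosen only for illustration.

```python
import math
import random
import struct
import zlib

def compression_ratio(raw):
    """Uncompressed size divided by DEFLATE-compressed size (zlib level 9)."""
    return len(raw) / len(zlib.compress(raw, 9))

def noisy_sine_int16(n, seed=0):
    """Synthetic 16-bit samples: a sine wave plus Gaussian noise (assumed values)."""
    rng = random.Random(seed)
    samples = []
    for i in range(n):
        value = 20000.0 * math.sin(2.0 * math.pi * 0.0371 * i) + rng.gauss(0.0, 200.0)
        samples.append(int(round(value)))
    return struct.pack("<%dh" % n, *samples)

# Highly repetitive character data, the kind of input LZ was designed for.
text_bytes = b"the quick brown fox jumps over the lazy dog. " * 400
# Noisy sampled data with no exact repetition.
sample_bytes = noisy_sine_int16(8192)

text_ratio = compression_ratio(text_bytes)
sample_ratio = compression_ratio(sample_bytes)

print("text   : %.1f:1" % text_ratio)
print("samples: %.2f:1" % sample_ratio)
```

The repeated text is an extreme best case for a dictionary coder, so its ratio should not be read as typical. The point of the comparison is only that the noisy samples offer the dictionary almost nothing to match.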
When estimating LZ algorithm complexity, MB/sec is the relevant metric, since lossless compression for computer files is now performed almost exclusively in software. On today's CPUs, WinZIP operates at just a few MB/sec. Using the same Intel CPUs, Samplify operates about 20x faster than WinZIP.
A final drawback of LZ-based algorithms for high-speed sampled data is their lack of support for lossy compression. With Samplify, lossy compression is integrated into the algorithm. To summarize, ASCII character-based lossless compression algorithms are intended to compress 8-bit character strings, do not compress sampled data very well or nearly fast enough, and do not support lossy compression.

Table 1: Lossless compression algorithms designed for computer files do not offer very good compression on sampled data, nor can they be scaled to the I/O sample rates of high-speed ADCs, DACs, FPGAs, and ASICs used in signal processing applications.
Comparison: Consumer compression algorithms for sampled data
A key differentiator between Samplify compression and other popular compression algorithms such as MP3, CELP, JPEG, and H.264 is predictable and controllable spectral distortion. During lossy operation, Samplify's distortion is white (spectrally flat). All consumer-oriented compression algorithms use psychoacoustic properties of human hearing or exploit known weaknesses of human vision to achieve their "perceptually lossless" results. Because "perceptually lossless" is good enough for audio and video applications, none of these algorithms includes a lossless compression mode. However, properties of human hearing and vision cannot be exploited for the signals that Samplify aims to compress. Wireless signals (CDMA, OFDM) do not have three color planes, so JPEG and H.264 cannot be applied in the intended manner. Similarly, ultrasound and raw CT signals will never be listened to, so CELP formants and MP3 critical bands can neither be identified nor allocated.
The most popular and well-known compression algorithms for consumer applications are designed to operate at specific sample rates. For instance, since human hearing extends only to about 20 kHz, there is little reason to sample audio faster than 44.1 ksamp/sec, or 192 ksamp/sec for high-end audiophile equipment. It is difficult to imagine how MP3 and related audio compression algorithms could be scaled to operate at even 10 Msamp/sec (a speed-up of roughly 200x), let alone 1 Gsamp/sec, the sample rates needed for high-speed DSP systems. And even if such a scale-up were possible, the unwelcome non-linear frequency response of speech and audio compression algorithms would still burden their results. Similar comments apply to the complexity of scaling image compression algorithms such as JPEG and H.264: pixels and color space mappings simply do not apply to 3G wireless, WiMAX, ultrasound, or CT signals.
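The size of the gap is simple arithmetic. The sketch below works out the speed-up quoted above, assuming the standard 44.1 ksamp/sec CD audio rate as the baseline. The function and constant names are hypothetical.

```python
def speedup(target_rate_hz, audio_rate_hz):
    """How many times faster the target sample rate is than the audio rate."""
    return target_rate_hz / audio_rate_hz

CD_RATE_HZ = 44_100.0          # standard CD audio sample rate
HIGH_END_RATE_HZ = 192_000.0   # high-end audio rate mentioned in the text
DSP_LOW_HZ = 10e6              # 10 Msamp/sec
DSP_HIGH_HZ = 1e9              # 1 Gsamp/sec

cd_to_10m = speedup(DSP_LOW_HZ, CD_RATE_HZ)
hi_to_10m = speedup(DSP_LOW_HZ, HIGH_END_RATE_HZ)
cd_to_1g = speedup(DSP_HIGH_HZ, CD_RATE_HZ)

print("44.1 ksamp/sec -> 10 Msamp/sec: %.0fx" % cd_to_10m)
print("192 ksamp/sec  -> 10 Msamp/sec: %.0fx" % hi_to_10m)
print("44.1 ksamp/sec -> 1 Gsamp/sec : %.0fx" % cd_to_1g)
```

From the CD rate, 10 Msamp/sec is a factor of about 227, consistent with the order of magnitude quoted in the text, and 1 Gsamp/sec is a factor of more than 22,000.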
To summarize, existing consumer-oriented compression algorithms do not offer lossless compression, introduce unacceptable (non-flat) frequency distortions, and cannot scale to the high sample rates required for high-speed DSP systems operating at 10+ Msamp/sec.



