Given the long history of audio and video data transmission over various networking topologies, it would seem that these two technologies were made to go together. Indeed, given ultimate freedom to choose (bandwidth constraints aside), consumers would prefer to have all of their audio/video content delivered to them on the wire. Why buy shiny metal disks (CDs and DVDs) and spin them on mechanical devices resembling turntables from 50 years ago, still suffering from skips and scratches, when you could have the same content delivered to you directly?
But anyone who has actually tried to build such systems, especially when they involve wide-area networks such as the Internet, knows acutely that delivering on that scenario is a very difficult challenge. Networks fast enough and with the needed quality to deliver uncompressed broadcast-quality video together with CD-quality audio are simply too expensive to deliver en masse. After all, we are dealing with data rates approaching 200 Mbits/second, which is beyond the capabilities of most LANs, let alone the general Internet.
With the advances of audio and video compression and broadband data transmission, there is little reason to store and transmit the source material in its entirety. But a quick look at the compression rates necessary for mainstream Internet delivery shows what at first seems to be a hopeless situation. For example, delivering a video signal to the typical "28k-modem" user means dealing with about 22 kbits/s of total data for both audio and video (the rest of the bandwidth from 22k to 28k is usually reserved to accommodate network overhead and the need for extra headroom when recovering from congestion). Allocating 5 kbits/s to audio then leaves a paltry 17 kbits/s for video.
This requires an amazingly high compression ratio of nearly 8,700:1. Of course, the video image can be shrunk (subsampled) and frame rates can be modified to reduce this ratio, but the magnitude of the problem remains nevertheless.
Even though audio is thought to be an easier problem to solve, it too is subject to a high compression ratio: 63:1 at 22 kbits/s. This may not seem like a high ratio, but the ear is a far less forgiving instrument than the eye, making it impossible to pass off such audio as "CD quality."
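The ratios quoted above come from dividing the uncompressed source rate by the target rate. The sketch below reproduces that arithmetic under assumed source formats that the article does not spell out: 640 x 480 video at 30 frames/s with 16 bits per pixel, and CD audio at 44.1 kHz, 16 bits, two channels. With those assumptions the video ratio lands just under 8,700:1 and the audio ratio at roughly 64:1, close to the figures given in the text.

```python
# Assumed source formats (not stated in the article); the modem budget
# figures (22, 5 and 17 kbits/s) are taken from the text.

def video_bit_rate(width, height, fps, bits_per_pixel):
    """Uncompressed video rate in bits per second."""
    return width * height * fps * bits_per_pixel

def audio_bit_rate(sample_rate, bits_per_sample, channels):
    """Uncompressed PCM audio rate in bits per second."""
    return sample_rate * bits_per_sample * channels

def compression_ratio(source_bps, target_bps):
    """How many times smaller the target stream must be than the source."""
    return source_bps / target_bps

MODEM_BUDGET_BPS = 22_000                    # usable part of a 28k modem link
AUDIO_BPS = 5_000                            # share given to audio
VIDEO_BPS = MODEM_BUDGET_BPS - AUDIO_BPS     # what is left for video

video_source = video_bit_rate(640, 480, 30, 16)   # assumed video format
audio_source = audio_bit_rate(44_100, 16, 2)      # CD audio

video_ratio = compression_ratio(video_source, VIDEO_BPS)
audio_ratio = compression_ratio(audio_source, MODEM_BUDGET_BPS)

print(f"video: {video_ratio:,.0f}:1  audio: {audio_ratio:.0f}:1")
```

Shrinking the image or lowering the frame rate reduces `video_source` directly, which is why subsampling lowers the required ratio without changing the target rate.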
Even after you get past the pure compression issues, there are additional challenges when the compressed data is transmitted over the general Internet. With no built-in Quality of Service today, there is no assurance that the connected modem rate of 22 kbits/s will remain constant during, say, a five-minute music video, let alone a full-length movie. Bandwidth drops can result in annoying interruptions of audio and video. While prebuffering can help, it increases latency of the delivered content, which is undesirable.
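The trade-off between prebuffering and latency can be illustrated with a toy simulation. This is a minimal sketch, not a model of any real player: the one-second tick, the function name `simulate_playback`, and the sample bandwidth trace are all assumptions made for illustration.

```python
def simulate_playback(bandwidth_bps, media_bps, prebuffer_s):
    """Play a constant-bit-rate stream over a link whose rate varies.

    bandwidth_bps: bits delivered during each one-second tick.
    media_bps:     bit rate of the encoded content.
    prebuffer_s:   seconds of content to collect before playback starts.
    Returns (startup_delay_s, stalled_s); startup_delay_s is None if the
    prebuffer never fills.
    """
    buffered_bits = 0
    startup_delay = None
    stalled = 0
    for tick, delivered in enumerate(bandwidth_bps):
        buffered_bits += delivered
        if startup_delay is None:
            # Still filling the prebuffer; playback begins on the next tick.
            if buffered_bits >= prebuffer_s * media_bps:
                startup_delay = tick + 1
            continue
        if buffered_bits >= media_bps:
            buffered_bits -= media_bps   # one second of content played
        else:
            stalled += 1                 # buffer ran dry: an interruption
    return startup_delay, stalled

# Hypothetical trace: a 22 kbits/s link that halves for four seconds.
link = [22_000] * 5 + [11_000] * 4 + [22_000] * 5

short_buffer = simulate_playback(link, 22_000, prebuffer_s=1)
long_buffer = simulate_playback(link, 22_000, prebuffer_s=3)
```

In this trace the one-second prebuffer starts quickly but stalls once, while the three-second prebuffer rides out the drop at the price of a longer wait before playback begins.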
Although the problems are daunting, there are solutions. One is to use high-performance compression algorithms to increase the quality at a given bit rate; a second is to build an intelligent full-duplex system to deal with network throughput fluctuations; a third is to avoid real-time transmission and deliver the data as a file to the client to be played back later.
To get the best quality, it is imperative to use a compression system designed to deliver good quality at the required bit rates. For example, MPEG-2 is an excellent compression scheme that can deliver good-quality video and audio, but it does so only at very high bit rates, as evidenced by the 10-Mbit/s transfer rate of DVD and the 19-Mbit/s rate of HDTV. Try to use it at the "broadband" speed of 300 kbits/s over a digital subscriber line (DSL) and you will get extremely low quality. The same can be said of audio-compression algorithms such as MPEG-1 Layer 3 (MP3), which was designed for good performance at rates exceeding 128 kbits/s. At POTS modem rates, one would be lucky to get even "AM radio quality," which dashes any hope of having users abandon their radios for the Internet.
Fortunately, the need to deliver excellent quality at low bit rates has resulted in a number of new and innovative algorithms, some of which are developed by standards groups while others use advanced proprietary techniques.
On the video front, MPEG-4 is leading the way by producing excellent quality at astonishingly low bit rates. For example, the enhanced implementation of MPEG-4 in Microsoft Windows Media is able to reproduce as many as 10 to 12 frames/s using 160 x 120 resolution at 17 kbits/s. The same technology can easily deliver near-VHS quality of 320 x 240 at 30 frames/s at just 300 kbits/s, encroaching on the domain of MPEG-1, which generally requires bit rates of 1.1 Mbits/s and higher. With the rapidly growing base of broadband connections at those rates, one can start looking at entertainment applications that were once the domain of leased-line, cable or satellite-based systems.
On the audio side, standards-compliant compression systems that perform well at low bit rates simply don't exist. While work is being done in the MPEG-4 committee and elsewhere, no commercially available standards-compliant audio codec can produce high-fidelity music at modem rates.
Once you have high-quality compressed content, the challenge becomes managing the transmission link. With an average of seven routers between the source and the destination on the Internet, it is simply not realistic to assume fixed, guaranteed bandwidth even on a "digital" DSL or cable modem connection. Needless to say, the situation is even worse over an analog modem. And as mentioned before, buffering the data before playback, while useful, cannot be done aggressively as it increases the initial latency.
The solution then is to use a dedicated "streaming" server. Using an end-to-end control-feedback system, the client and the server can produce the optimal experience for the user given the network bandwidth at the moment. For example, the Microsoft Windows Media Player can instruct the server to reduce its video transmission rate to deal with a bandwidth drop. Going from 300 kbits/s to 100 kbits/s, for example, will drop the frame rate from 30 to 15 frames/s, but without any interruption of the stream. Extreme bandwidth drops result in the system pausing the video while keeping the audio flowing. This is important because the audio usually carries far more information than the video, and users tend to consider audio breaks a far more serious degradation of service than video ones.
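The control-feedback idea can be sketched as a client that reports its measured bandwidth and a server that picks the richest stream that fits. The code below is an illustration only and does not represent the Windows Media protocol or API: `StreamProfile`, `LADDER` and `select_profile` are hypothetical names, and the audio/video split within each profile is assumed. Only the 300-to-100 kbits/s step with its 30-to-15 frames/s change comes from the article.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class StreamProfile:
    video_kbps: int
    audio_kbps: int
    frames_per_s: int

    @property
    def total_kbps(self):
        return self.video_kbps + self.audio_kbps

# Hypothetical ladder, richest profile first. The audio share of each
# profile is an assumption; the last entry is the audio-only fallback.
LADDER = (
    StreamProfile(video_kbps=268, audio_kbps=32, frames_per_s=30),  # 300 total
    StreamProfile(video_kbps=84, audio_kbps=16, frames_per_s=15),   # 100 total
    StreamProfile(video_kbps=0, audio_kbps=16, frames_per_s=0),     # audio only
)

def select_profile(measured_kbps, ladder=LADDER):
    """Return the richest profile that fits the measured bandwidth.

    If nothing fits, fall back to the audio-only profile so that the
    audio stream is the last thing to be given up.
    """
    for profile in ladder:
        if profile.total_kbps <= measured_kbps:
            return profile
    return ladder[-1]

# The client would report a fresh measurement periodically; the server
# switches profiles without tearing down the stream.
for measured in (320, 120, 40):
    chosen = select_profile(measured)
    print(measured, "kbits/s ->", chosen.frames_per_s, "frames/s")
```

Placing the audio-only profile at the bottom of the ladder reflects the priority described above: video is sacrificed first because listeners tolerate a frozen picture better than a break in the sound.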
When the efficiency of the video and audio compression is combined with the sophisticated network management features of the streaming servers, the result is a remarkably capable system for delivering entertainment-quality content to users at well below the cost of a dedicated network system. Use of traditional computer systems for the client and server further reduces the cost structure of these systems, bringing the capability to most content and service providers.
Given audio's lower bandwidth requirement and smaller clip size (a four-minute song vs. a two-hour movie), it is also reasonable to look at transferring files to the user's PC or music device for later playback instead of streaming. Indeed, people tend to want to listen to their music more than once, making it a natural fit for some form of caching or local storage.
Fortunately, the excellent audio-compression engines necessary for low-bit-rate transmission can also be used in this application. The result is a smaller file, which reduces the download time and increases the number of songs that fit on the user's PC or playback device. For example, a four-minute song encoded at 64 kbits/s produces a file of 1.9 Mbytes vs. 42 Mbytes for the uncompressed version on the CD. To put this in context, downloading the 64k version over a modem will take about 11 minutes vs. more than four hours for the uncompressed one. Yet to most consumers the quality will be very close to the original. This is made possible by psychoacoustic coding: the encoder keeps only those parts of the signal that the average listener is able to register and discern, and discards the rest.
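The file sizes and download times above can be checked with a few lines of arithmetic. The sketch assumes CD audio at 44.1 kHz, 16 bits, two channels, and an effective modem throughput of 22 kbits/s (the usable rate cited earlier for a 28k modem); the article does not say which link rate its download figures are based on, so the times here are approximate.

```python
def file_size_bytes(bit_rate_bps, duration_s):
    """Size of a constant-bit-rate recording."""
    return bit_rate_bps * duration_s / 8

def download_time_s(size_bytes, link_bps):
    """Transfer time over a link with the given sustained throughput."""
    return size_bytes * 8 / link_bps

SONG_S = 4 * 60               # four-minute song
CD_BPS = 44_100 * 16 * 2      # uncompressed CD audio
ENCODED_BPS = 64_000          # compressed rate from the text
LINK_BPS = 22_000             # assumed effective modem throughput

encoded_size = file_size_bytes(ENCODED_BPS, SONG_S)
cd_size = file_size_bytes(CD_BPS, SONG_S)

encoded_minutes = download_time_s(encoded_size, LINK_BPS) / 60
cd_hours = download_time_s(cd_size, LINK_BPS) / 3600

print(f"{encoded_size / 1e6:.1f} Mbytes in {encoded_minutes:.1f} min")
print(f"{cd_size / 1e6:.1f} Mbytes in {cd_hours:.1f} h")
```

Under these assumptions the sizes come out at about 1.9 and 42 Mbytes, with download times between 11 and 12 minutes for the encoded file and a little over four hours for the uncompressed one.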
Once you allow content to be downloaded to the user's system, you get into many complex matters such as copyright management. Fortunately, systems such as Microsoft's Windows Digital Rights Management let content owners optionally encrypt and "lock" downloaded content (audio, video or both) to the user's PC so that additional copies cannot be made without authorization.
Though such systems are nothing new in the consumer electronics world (witness the encrypted nature of DVD or Macrovision copy protection on VHS tapes), they are a relatively new development in the PC multimedia arena. Although some consumers may balk at any idea of copy protection, it is a necessary component of any modern digital music and video distribution to protect the authors. Copy-protection technology also allows new business models on the Web, such as the Windows Media Pay-Per-View solutions system, in which users pay for access to video streams such as live concerts or sporting events.
See related chart