Rendering 5 Channels over 2: Mixdown
Many practical setups are not able to make use of 5 discrete loudspeaker channels. For instance, computer-based monitoring on the desktop most naturally uses two loudspeakers with one on either side of the video monitor. Surround loudspeakers would get in the way in an office environment. For such systems, it is convenient to produce a sound field with the two loudspeakers that approximates 5-channel sound.
This may be accomplished starting with a process called crosstalk cancellation. A signal can be applied to the right loudspeaker that cancels the sound arriving from the left loudspeaker at the right ear, and vice versa. One step earlier in the electronic processing, before crosstalk cancellation, the signals represent the inputs to just the left ear and just the right ear. At this point in the system it is possible to synthesize the correct HRTFs for the two ears, theoretically for sound arriving from any direction. For instance, if sound is supposed to come from the far left, applying the correct HRTFs to the sound for the left ear (earlier and brighter) and to the sound for the right ear (later and duller) makes the sound appear to come from the left.
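The idea can be sketched at a single frequency as the inversion of a 2 x 2 matrix of acoustic paths. The following is a minimal illustration only, not a description of any particular product: the head model is a toy (a symmetric listener, a same-side path of unity, and a cross-head path that is simply attenuated and delayed), and the names `acoustic_paths`, `canceller`, and `ear_signals`, along with the values of `itd_s` and `shadow_gain`, are assumptions made for the example.

```python
import cmath
import math


def acoustic_paths(freq_hz, itd_s=0.00025, shadow_gain=0.7):
    """Toy symmetric head model (assumed values, not measured HRTFs).

    The same-side loudspeaker-to-ear path is unity; the cross-head path
    is attenuated by shadow_gain and delayed by itd_s seconds.
    """
    h_same = 1.0 + 0.0j
    h_cross = shadow_gain * cmath.exp(-2j * math.pi * freq_hz * itd_s)
    return h_same, h_cross


def canceller(h_same, h_cross):
    """Invert the symmetric matrix [[h_same, h_cross], [h_cross, h_same]].

    Returns the direct filter and the cross-feed filter at one frequency.
    """
    det = h_same * h_same - h_cross * h_cross
    return h_same / det, -h_cross / det


def ear_signals(ear_left, ear_right, freq_hz):
    """Desired ear signals -> canceller -> loudspeakers -> actual ear signals."""
    h_same, h_cross = acoustic_paths(freq_hz)
    c_direct, c_cross = canceller(h_same, h_cross)
    # Loudspeaker feeds: each ear signal goes mainly to its own side, plus a
    # cross-feed term that cancels the leakage to the opposite ear.
    spk_left = c_direct * ear_left + c_cross * ear_right
    spk_right = c_cross * ear_left + c_direct * ear_right
    # What actually arrives at the ears after the acoustic paths.
    at_left = h_same * spk_left + h_cross * spk_right
    at_right = h_cross * spk_left + h_same * spk_right
    return at_left, at_right
```

With this model, a signal intended only for the left ear arrives at the left ear unchanged and is nulled at the right ear, provided the canceller's assumed paths match the real ones exactly. A practical system would do this across the whole audio band with filters derived from HRTF data.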
This process is limited. Cancellation requires good subtraction of two sound fields, and subtraction or "nulling" is very sensitive to any errors in level, spectrum, or timing. Thus, such systems normally are very sensitive to listener position; they are said to be "sweet spot dependent." Research aims at reducing this sensitivity by finding out just how much of the HRTF detail is audible, and simplifying the HRTF data accordingly. Still, there are complications because the head is usually not fixed in location, but can move around, generating dynamic cues as well as static ones.
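How quickly a null degrades can be estimated with a few lines of arithmetic. The sketch below assumes the simplest possible case, a single sine wave cancelled by a copy of itself that is slightly wrong in level or timing; the function name `residual_db` and its parameters are made up for the illustration.

```python
import cmath
import math


def residual_db(level_error_db=0.0, timing_error_s=0.0, freq_hz=1000.0):
    """Level of what is left after imperfect cancellation, re: the original.

    The cancelling signal is modeled as the original scaled by the level
    error and shifted by the timing error; the residual is their difference.
    """
    gain = 10.0 ** (level_error_db / 20.0)
    phase = -2.0 * math.pi * freq_hz * timing_error_s
    residual = abs(1.0 - gain * cmath.exp(1j * phase))
    if residual == 0.0:
        return float("-inf")  # perfect null
    return 20.0 * math.log10(residual)
```

Under these assumptions, a 1 dB level error alone limits the null to roughly 18 dB, and a timing error of 10 microseconds (a path length difference of about 3.4 mm) limits it to roughly 10 dB at 5 kHz. Small head movements therefore matter most at high frequencies.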
One way around sweet spot sensitivity, which also eliminates the need for crosstalk cancellation, is headphone listening. The problem with headphone listening is that the world rotates as we move our heads left and right, instead of staying fixed as it should. This has been overcome by using head tracking systems to measure the angle of the head relative to a reference 3-D representation, and updating the HRTFs on the fly.
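The bookkeeping behind such an update is simple, and can be sketched as follows. This is an illustration under stated assumptions rather than any tracker's actual interface: angles are in degrees, positive to the listener's right, only yaw is tracked, and the HRTF set is assumed to have been measured at a fixed list of azimuths. Both function names are hypothetical.

```python
def relative_azimuth_deg(source_azimuth_deg, head_yaw_deg):
    """Source direction relative to the nose, wrapped to [-180, 180)."""
    return (source_azimuth_deg - head_yaw_deg + 180.0) % 360.0 - 180.0


def nearest_measured_azimuth(target_deg, measured_deg):
    """Pick the measured HRTF azimuth closest to the target, around the circle."""
    def angular_distance(azimuth):
        return abs((azimuth - target_deg + 180.0) % 360.0 - 180.0)
    return min(measured_deg, key=angular_distance)
```

If the head turns 30 degrees to the right, a source fixed straight ahead in the room must now be rendered 30 degrees to the left of the nose, so that the sound stage stays put rather than rotating with the head. A real system would interpolate between measured HRTFs rather than switch abruptly between them.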
A major problem of headphone listening is that we have come to learn the world through our own set of HRTFs, and listening through those made as averages may not work. In particular, this can lead to in-head localization, and front-back confusion. It is thought that by using individualized HRTFs these problems could be overcome. Still, headphone listening is clumsy and uncomfortable for the hours that people put in professionally, so this is not an ultimate solution.
In our lab at USC, in the Integrated Media Systems Center, we have used video-based head tracking to alter dynamically the insertion of a phantom stereo image into the two front channels, thus keeping the image centered as the listener moves left and right, solving one of the major problems of phantom images. This idea is being extended to crosstalk cancellers, and to the generation of sound images outside the area between the loudspeakers, in a way that is less sweet spot dependent than other systems. A remarkable finding comes from demonstrating this to naive and to professional listeners: naive listeners accept that the phantom center should stay centered as they move left and right, and find that it does, whereas professional listeners, having years of experience in hearing such things, are often disturbed by what they hear: they cannot reconcile it with their experience.
Direct mixdown from 5 channels to 2 is performed in simpler equipment, such as digital television receivers equipped to capture 5-channel sound off the air but play it over two loudspeakers. In such a case, it is desirable to make use of the mixdown features of the transmission system, which include mixdown level parameters for center into the 2 channels, and for left surround into the left and right surround into the right channel.
The available settings at the production end are: center at -3, -4.5, or -6 dB into each of left and right; left surround at -3 dB, -6 dB, or off into left, and right surround likewise into right. For center insertion, -3 dB corresponds to what is needed for a power addition, and it applies completely in the reverberant field of a room; -6 dB on the other hand is a phasor addition that might occur if one were completely dominated by direct sound. It must be said that -4.5 dB is the "right" value, ±1.5 dB!
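The mixdown and the reasoning behind the center settings can be sketched in a few lines. This is a minimal illustration, assuming channels are plain lists of samples of equal length; it ignores any LFE channel and any overload protection, and the names `downmix` and `summed_center_level_db` are made up for the example.

```python
import math

# Settings named in the text; None stands for "off".
CENTER_LEVELS_DB = (-3.0, -4.5, -6.0)
SURROUND_LEVELS_DB = (-3.0, -6.0, None)


def db_to_gain(level_db):
    """Convert a level in dB to a linear amplitude gain (None means off)."""
    return 0.0 if level_db is None else 10.0 ** (level_db / 20.0)


def downmix(left, right, center, left_sur, right_sur,
            center_db=-4.5, surround_db=-3.0):
    """Fold 5 channels into 2: center into both, each surround into its side."""
    if center_db not in CENTER_LEVELS_DB:
        raise ValueError("center level must be -3, -4.5, or -6 dB")
    if surround_db not in SURROUND_LEVELS_DB:
        raise ValueError("surround level must be -3 dB, -6 dB, or None (off)")
    cg = db_to_gain(center_db)
    sg = db_to_gain(surround_db)
    left_out = [l + cg * c + sg * s for l, c, s in zip(left, center, left_sur)]
    right_out = [r + cg * c + sg * s for r, c, s in zip(right, center, right_sur)]
    return left_out, right_out


def summed_center_level_db(center_db, coherent):
    """Level of the center signal after the two loudspeakers sum at the listener.

    coherent=True  : amplitudes add (phasor addition, direct-sound dominated).
    coherent=False : powers add (reverberant-field dominated).
    """
    g = db_to_gain(center_db)
    if coherent:
        return 20.0 * math.log10(2.0 * g)
    return 10.0 * math.log10(2.0 * g * g)
```

Evaluating `summed_center_level_db` shows where the settings come from: -3 dB restores the original center level under power addition, -6 dB restores it under phasor addition, and -4.5 dB lands about 1.5 dB low in the first case and about 1.5 dB high in the second, which is the sense of "±1.5 dB" above.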