[Part 1 discusses the principal localization mechanisms we use to locate a sound at a point in space.]
Localization, Spaciousness, and Envelopment
The discussion of localization so far has centered on locating a sound at a point in space. Real-world sources may be larger than point sources, and reverberation and diffuse ambiences are meant to be the opposite of localized, that is, diffuse. Some ideas of how to make a simple monaural or 2-channel stereo source into 5 channels are given in Chapter 4.
There are two components to describe spatial sound: spaciousness and envelopment. These terms are often used somewhat interchangeably, but there is a difference. Spaciousness applies to the extent of the space being portrayed, and can be heard over a 2-channel or a 5-channel system. It is controlled principally by the ratio of direct sound to reflections and reverberation. On a 2-channel system, the sound field is usually constrained to being between the loudspeakers, and spaciousness applies to the sense that there is a physical space portrayed between the loudspeakers. The depth dimension is included, but the depth extends only to the area between the loudspeakers.
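As a minimal sketch of this control, assuming digital audio held as NumPy sample arrays, the direct-to-reverb ratio can be set with a single gain on the reverb return. The function name and dB convention here are illustrative, not from the text:

```python
import numpy as np

def mix_with_reverb(direct, reverb_return, direct_to_reverb_db):
    """Mix a dry signal with its reverb return at a given D/R ratio in dB.

    Lowering direct_to_reverb_db raises the reverb level relative to the
    direct sound, which reads as a larger, more spacious acoustic.
    """
    reverb_gain = 10.0 ** (-direct_to_reverb_db / 20.0)
    return direct + reverb_gain * reverb_return
```

A 20 dB direct-to-reverb ratio, for example, adds the reverb return at one-tenth the direct amplitude.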
Envelopment, on the other hand, applies to the sensation of being surrounded by sound, and thus being incorporated into the space of the recording, and it requires a multichannel sound system to reproduce. Two-channel stereo can produce the sensation of looking into a space beyond the loudspeakers; multichannel stereo can produce the sensation of being there.
Lessons from Concert Hall Acoustics
A number of factors have been identified in concert hall acoustics that are useful in understanding the reproduction of sound over multichannel systems:
- The amount of reverberation, and its settings such as reverberation time, time delay before the onset of reflections, amount of diffusion, and so forth are very important to the perception of envelopment, which is generally a desirable property of a sound field.
- Early reflections from the front sides centered on ±55° from straight ahead (with a large tolerance) add to auditory source width (ASW) and are heard as desirable.
- All directions are helpful in the production of the feeling of envelopment, so reverberation returns and multichannel ambiences should apply to all of the channels, with uncorrelated sources for each channel, although some work shows that a difference at the two ears is preferred for contributions to envelopment. Thus the most important 2 channels for reverberation are LS and RS, the next most important are L and R, and C has little importance.
- Research has shown that 5 channels of direct sound are the minimum needed to produce the feeling of envelopment in a diffuse sound field, but the angles for such a feeling do not correspond to the normal setup. They are ±36°, ±108°, and +180° referenced to center at 0°. Of these, the ±36° corresponds perceptually with ±30° and ±108° with ±110°, but there is no back channel in the standard setup. Thus the sensation of complete diffuse envelopment with the standard 5.1 channel setup is problematic.
- Dipole surrounds are useful to improve the sensation of envelopment in reproduction, and are especially suitable for direct/ambient style recording/reproduction.
- Person-to-person differences in sound field preferences are strong. Separable effects include listening level, the initial time delay gap between the direct sound and the first reflection, the subsequent reverberation time, and the difference in the sound field at the two ears.
Rendering 5 Channels over 2: Mixdown
Many practical setups are not able to make use of 5 discrete loudspeaker channels. For instance, computer-based monitoring on the desktop most naturally uses two loudspeakers with one on either side of the video monitor. Surround loudspeakers would get in the way in an office environment. For such systems, it is convenient to produce a sound field with the two loudspeakers that approximates 5-channel sound.
This may be accomplished starting with a process called crosstalk cancellation. A signal can be applied to the right loudspeaker that cancels the sound arriving from the left loudspeaker at the right ear, and vice versa. The stage of electronic processing before crosstalk cancellation produces signals that represent the inputs to just the left ear and just the right ear. At this point in the system it is possible to synthesize the correct HRTFs for the two ears, theoretically for sound arriving from any direction. For instance, if sound is supposed to come from the far left, applying the correct HRTF for the left ear (earlier and brighter) compared to the right ear (later and duller) makes the sound appear to come from the left.
This process is limited. Cancellation requires good subtraction of two sound fields, and subtraction or "nulling" is very sensitive to any errors in level, spectrum, or timing. Thus, such systems normally are very sensitive to listener position; they are said to be "sweet spot dependent." Research aims at reducing this sensitivity by finding out just how much of the HRTFs are audible, and working with the data to reduce this effect. Still, there are complications because the head is usually not fixed in location, but can move around, generating dynamic cues as well as static ones.
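The matrix view of crosstalk cancellation can be sketched at a single frequency. Assuming an idealized symmetric head, where both same-side (ipsilateral) speaker-to-ear paths share one response and both crossed (contralateral) paths another, the canceller is simply the inverse of the 2x2 acoustic transfer matrix. All numeric values below are invented for illustration:

```python
import numpy as np

# Toy single-frequency sketch of crosstalk cancellation.
H_ipsi = 1.0 + 0.0j                  # same-side path, taken as the reference
H_contra = 0.6 * np.exp(-1j * 0.8)   # crossed path: quieter and delayed

# Acoustic transfer matrix from (left spk, right spk) to (left ear, right ear).
H = np.array([[H_ipsi, H_contra],
              [H_contra, H_ipsi]])

# The canceller pre-filters the binaural signals with H's inverse, so the
# combined system H @ C is (ideally) the identity: each ear then receives
# only its intended signal.
C = np.linalg.inv(H)

ears = H @ C @ np.array([1.0, 0.0])  # "left ear only" binaural input
```

The sensitivity described above shows up here directly: any mismatch between the assumed and actual H leaves a residual at the opposite ear.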
One way around sweet spot sensitivity, and eliminating crosstalk cancellation, is headphone listening. The problem with headphone listening is that the world rotates as we move our heads left and right, instead of staying fixed as it should. This has been overcome by using head tracking systems to generate information about the angle of the head compared to a reference 3-D representation, and update the HRTFs on the fly.
A major problem of headphone listening is that we have each come to learn the world through our own set of HRTFs, and listening through a set made as an average of many people may not work. In particular, this can lead to in-head localization and front-back confusion. It is thought that by using individualized HRTFs these problems could be overcome. Still, headphone listening is clumsy and uncomfortable for the hours that people put in professionally, so this is not an ultimate solution.
In our lab at USC, in the Integrated Media Systems Center, we have used video-based head tracking to dynamically alter the insertion of a phantom stereo image into the two front channels, keeping the image centered even as the listener moves left and right, thus solving one of the major problems of phantoms. This idea is being extended to crosstalk cancellers, and to the generation of sound images outside the area between the loudspeakers, in a way that is not as sweet-spot dependent as other systems. A remarkable finding comes from demonstrating this to naive and to professional listeners: the naive listener accepts that the phantom center should stay centered as he moves left and right and finds it true, whereas the professional listener, having years of experience in hearing such things, is often disturbed by what he hears: he cannot reconcile it with his experience.
Direct mixdown from 5 channels to 2 is performed in simpler equipment, such as digital television receivers equipped to capture 5-channel sound off the air, but play it over two loudspeakers. In such a case, it is desirable to make use of the mixdown features of the transmission system that includes mixdown level parameters for center to the 2 channels, and left surround to left and right surround to right channels.
The available settings at the production end are: center at -3, -4.5, or -6 dB into each of left and right; left surround at -3 dB, -6 dB, or off into left, and likewise right surround into right. For center insertion, -3 dB corresponds to a power addition, which applies when listening is completely in the reverberant field of a room; -6 dB corresponds to a phasor addition, which would occur if listening were completely dominated by direct sound. It must be said that -4.5 dB is the "right" value, ±1.5 dB!
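Using the mixdown coefficients above, a direct 5-to-2 fold-down can be sketched as follows, assuming each channel is a NumPy array of samples. The function and parameter names are illustrative:

```python
import numpy as np

def db_to_gain(db):
    """Convert a level in dB to a linear amplitude gain."""
    return 10.0 ** (db / 20.0)

def downmix_5_to_2(L, R, C, LS, RS, center_db=-4.5, surround_db=-3.0):
    """Fold a 5-channel mix down to 2 channels.

    center_db may be -3, -4.5, or -6 per the transmission parameters;
    surround_db may be -3, -6, or None (off). Each surround folds only
    into its same-side channel.
    """
    gc = db_to_gain(center_db)
    gs = 0.0 if surround_db is None else db_to_gain(surround_db)
    Lo = L + gc * C + gs * LS
    Ro = R + gc * C + gs * RS
    return Lo, Ro
```

A real receiver would also need to guard against overload on the summed outputs, which this sketch omits.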
Auralization and Auditory Virtual Reality
Auralization is a process of developing the sounds of rooms to be played back over a sound system, so that expensive architecture does not need to be built before hearing potential problems in room acoustics. A computer model of a sound source interacts with a computer model of a room, and then is played, usually over a 2-channel system with crosstalk cancellation described above. In the future, auralization systems may include 5.1-channel reproduction, as in some ways that could lighten the burden on the computer, since the multichannel system renders sound spatially in a more complete way than does a 2-channel system.
The process can also be carried out successfully using scale models, usually about 1:10 scale, using the correct loudspeaker, atmosphere, and scaled miniature head. The frequency translates by the scale factor. Recordings can be made at the resulting ultrasonic frequency and slowed down by 10:1 for playback over headphones.
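The scale arithmetic works out as follows; the specific numbers are illustrative, not from the text:

```python
# Worked example of 1:10 scale-model measurement: frequencies scale up by
# the model factor, and slowing the recording back down by the same factor
# restores the original pitch and timing.
scale = 10                        # 1:10 model
full_scale_hz = 1000.0            # tone of interest in the real room
model_hz = full_scale_hz * scale  # excite the model at 10 kHz (ultrasonic)
restored_hz = model_hz / scale    # 10:1 slowed playback restores 1 kHz

# Time quantities scale the other way: a 2.0 s full-scale reverberation
# time measures 0.2 s in the model before the 10:1 slowdown.
full_scale_rt60 = 2.0
model_rt60 = full_scale_rt60 / scale
```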
Auditory Virtual Reality applies to systems that attempt to render the sound of a space wherein the user can move, albeit in a limited way, around the space in a realistic manner. Auralization techniques apply, along with head tracking, to produce a complete auditory experience. Such systems often are biased towards the visual experience, due to projection requirements all around, with little or no space for loudspeakers that do not interrupt the visual experience.
In these cases, multichannel sound from the corners of a cube is sometimes used, although this method is by no means psychoacoustically correct. In theory it would be possible, with loudspeakers in the corners and with crosstalk cancellation and HRTF synthesis customized to an individual, to get a complete experience of "being there." On the other hand, English mathematician Michael Gerzon has said that it would require one million channels to be able to move around a receiving space and have the sound identical to that of moving around a sending space.
The 5.1-channel systems are about three decades old in specialized theater usage, over two decades old in broad application to films in theaters and just a little less in homes, and expanding in broadcasting, supported by millions of home installations that include center and surround loudspeakers, which came about due to the home theater phenomenon.
Perceptually, we know that everyone equipped with normal hearing can hear the difference between mono and stereo, and it is a large difference. Much less studied, but evident under correct conditions, is the fact that virtually everyone can hear the difference between 2-channel stereo and 5.1-channel sound, and hears it as a significant improvement. The end of this process is not in sight: improvement to the perception of the spatial dimension is expected to grow with the number of channels and then reach an asymptote, beyond which further improvement comes at larger and larger expense.
We do not appear to be approaching this asymptote yet with 5.1 channels: we routinely compare 1-, 2-, 5.1-, and 10.2-channel systems in our laboratory, and while most listeners find the largest difference to be between 2 and 5.1, all can hear the difference between 5.1 and 10.2, so we think that the limit of perception has not yet been reached. NHK in Japan has experimented with a 22.2-channel system as well.
One way to look at the next major step beyond 5.1 is that it should be a significant improvement psychoacoustically along the way towards ultimate transparency. Looking at what the 5.1-channel system does well, and what it does poorly, indicates how to deploy additional channels:
- Wide-front channels to reproduce the direction, level, and timing of early reflections in good-sounding rooms are useful. An unexpected benefit of such channels was the finding that sources panned from front left, through left wide, to left surround, with the intermediate phantom images, worked well to create the sound of speech moving smoothly around one, something that a 5.1-channel system cannot do.
- A center back channel is useful to "fill in the gap" between surrounds at ±110°, and permits rear imaging as well as improvements to envelopment.
- After the above 3 channels have been added to the standard 5, the horizontal plane should probably be broken in favor of the height sensation, which has been missing from stereo and most multichannel systems. Two channels widely spaced in front of and above the horizontal plane are useful.
- The 0.1-channel may be extended to reproduction from many locations with separate subwoofers to smooth the response. In a large cinema, I conducted a test of six large subs centered under the screen versus four in the corners of the theater, and the four outperformed the six in smoothness of response, and response extension towards both low and high frequencies. One electrical channel on the medium drove all the subwoofers, but the response was improved by using multiple ones spaced apart.
Adding these together makes a 10.2-channel system the logical next candidate. Blauert says in his book Spatial Hearing that 30 channels may be needed for a fixed listening position to give the impression that one is really there, so 5.1 is a huge step on the road above 2 channels, and 10.2 is still a large step along the same road, to auditory nirvana.
Beyond 5.1 (cont.)
The great debate on any new digital media boils down to the size of the bit bucket and the rate at which information can be taken out of the bucket. We have seen the capabilities of emerging media in Chapter 5.
Sample rate, word length, and number of audio channels, plus any compression in use, all affect the information rate. Among these, sample rate and word length have been made flexible in new media so that producers can push the limits to ridiculous levels: if 96 kHz is better than 48, then isn't 192 even better? The dynamic range of 24 bits is about 144 dB, which more than covers the range from the threshold of hearing to the loudest sound found in a survey of up-close, live-sound experiences, and reaches the OSHA standard for a one-time instantaneous noise exposure that causes hearing damage.
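The word-length figure follows from the familiar rule of thumb of roughly 6.02 dB of theoretical dynamic range per bit of linear PCM:

```python
def dynamic_range_db(bits):
    """Theoretical dynamic range of N-bit linear PCM: ~6.02 dB per bit."""
    return 6.02 * bits

cd = dynamic_range_db(16)        # the CD: about 96 dB
high_res = dynamic_range_db(24)  # 24-bit media: about 144 dB
```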
However, on most emerging media the number of channels is fixed at 5.1 because that is the number that forms the marketplace today. Still, there is upwards pressure on the number of channels already evident.
For instance, Dolby Surround EX and DTS ES provide a 6.1-channel approach so that surround sound, ordinarily restricted perceptually to the sides in theaters, can seem to come from behind as well as the sides. 10.2-channel demonstrations have been held and have received high praise from expert listeners. Momentum is growing in favor of a larger number of channels, and the techniques learned for rendering 5.1-channel sound over 2-channel systems can be brought into play for 10.2-channel sound delivered over 5.1-channel systems.
The future is expected to be more flexible than the past. When the CD was introduced, standards had to be fixed since there was no conception of computers capable of carrying information about the program along with it, and then combining that information with information about the sound system, for optimum reproduction.
One idea that may gain favor in time as the capacity of media grows is to transmit a fairly high number of channels, such as 10.2, and along with it metadata about its optimum presentation over a variety of systems, that could range from a home theater, to car stereo, to headphone listening. Even two different presentations of 10.2 could be envisaged: why not have both the best-seat-in-the-house and the middle-of-the-band perspective available from one program file, merely by changing the metadata? We have done this experimentally with a recording of Messiah, producing both the best seat approach for primary listening and the middle of the chorus approach for participation by singing along.
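Such presentation metadata might be sketched as below. Every field name here is invented for illustration; none comes from the text or from any existing standard:

```python
# Hypothetical metadata carried alongside one 10.2-channel program,
# describing two alternative renderings of the same channel set.
program = {
    "channel_layout": "10.2",
    "presentations": {
        "best_seat": {"listener": "audience", "surround_trim_db": -3.0},
        "middle_of_band": {"listener": "performer", "surround_trim_db": 0.0},
    },
}

def select_presentation(program, name):
    """Pick one rendering intent from the program's metadata."""
    return program["presentations"][name]
```

The point of the design is that switching perspectives changes only the metadata lookup, never the delivered audio channels themselves.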
Tips from This Chapter
- Localization of a source by a listener depends on three major effects: the difference in level between the two ears, the difference in time between the two ears, and the complex frequency response caused by the interaction of the sound field with the head and especially the outer ears (head-related transfer functions). Both static and dynamic cues are used in localization.
- The effects of head-related transfer functions of sound incident on the head from different angles call for different equalization when sound sources are panned to the surrounds than when they are panned to the front, if the timbre is to be maintained. Thus, direct sounds panned to the surrounds will probably need a different equalization than if they were panned to the front.
- The minimum audible angle varies around a sphere encompassing our heads, and is best in front and in the horizontal plane, becoming progressively worse to the sides, rear, and above and below.
- Localization is poor at low frequencies and thus common bass subwoofer systems are perceptually valid.
- Low-frequency enhancement (LFE) (the 0.1 channel) is psycho-acoustically based, delivering greater headroom in a frequency region where hearing is less sensitive.
- Listeners perceive the location of sound from the first arriving direction typically, but this is modified by a variety of effects due to non-delayed or delayed sound from any particular direction. These effects include timbre changes, localization changes, and spaciousness changes.
- Phantom image stereo is fragile with respect to listening position, and has frequency response anomalies.
- Phantom imaging, despite its problems, works more or less in the quadrants in front and behind the listener, but poorly at the sides in 5-channel situations.
- Localization, spaciousness, and envelopment are defined. Methods to produce such sensations are given in Chapter 4. Lessons from concert hall acoustics are given for reverberation, discrete reflections, directional properties of these, and how they relate to multichannel sound.
- Instruments panned partway between front and surround channels in 5.1-channel sound are subject to image instability and sounding split in two spectrally, so this is not generally a good position to use for primary sources.
Printed with permission from Focal Press, a division of Elsevier. Copyright 2007. "Surround Sound: Up and Running" by Tomlinson Holman. For more information about this title, please visit www.focalpress.com.