Psychoacoustics is the field concerned with the perception of sound by human beings. It covers the physical interactions between sound fields and the human head, outer ears, and ear canals, as well as the internal mechanisms by which the inner ear transduces the mechanical energy of sound into electrical nerve impulses and the brain interprets the signals from the inner ears.
The perceptual hearing mechanisms are quite astonishing: they can tell the difference when the sound input to the two ears is shifted by just 10 µs, and they operate across a frequency range of 10 octaves (visible light covers a range of less than one octave) and over a tremendous dynamic range, on the order of 10 million to one in pressure.
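To put these figures on familiar scales, the short sketch below converts them to octaves and decibels. The 20 Hz and 20 kHz hearing limits and the 380 nm to 750 nm limits for visible light are assumptions made for illustration; the text itself states only the resulting ranges.

```python
import math

def octaves(low, high):
    """Number of octaves (doublings) between two frequencies or wavelengths."""
    return math.log2(high / low)

def pressure_ratio_to_db(ratio):
    """Express a sound pressure ratio in decibels."""
    return 20.0 * math.log10(ratio)

# Assumed nominal limits of human hearing, in Hz.
hearing_octaves = octaves(20.0, 20_000.0)

# Assumed limits of visible light, in nm. A wavelength ratio gives the
# same octave count as the corresponding frequency ratio.
light_octaves = octaves(380.0, 750.0)

# The "10 million to one in pressure" figure from the text.
dynamic_range_db = pressure_ratio_to_db(1e7)

print(f"hearing:       {hearing_octaves:.2f} octaves")
print(f"visible light: {light_octaves:.2f} octaves")
print(f"dynamic range: {dynamic_range_db:.0f} dB")
```

Under these assumptions the hearing range comes out just under 10 octaves, visible light just under one octave, and the pressure ratio corresponds to 140 dB.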
Interestingly, in one view of this perceptual world, hearing operates with an analog-to-digital converter (ADC) in between the outer sound field and the inner representation of sound for the brain. The inner ear transduces mechanical waves on its basilar membrane, caused by the sound energy, into patterns of nerve firings that are perceived by the brain as sound. The nerve firings are essentially digital in nature, while the waves on the basilar membrane are analog.
Whole reference works such as Jens Blauert's Spatial Hearing and Brian C. J. Moore's An Introduction to the Psychology of Hearing, as well as many journal articles, have been written about the huge variety of effects that affect localization, spaciousness, and other topics of interest. Here we will examine the primary factors that affect multichannel recording and listening.
Principal Localization Mechanisms
Since the frequency range of human hearing is so very large, covering 10 octaves, the human head appears as either a small object (at low frequencies) or a large one (at high frequencies) compared to the wavelength of the sound waves. At the lowest audible frequencies, where the wavelength of sound in air is over 50 ft (15 m), the head appears as a small object and sound waves wrap around the head easily through the process called diffraction.
At the highest audible frequencies, the wavelength is less than 1 in. (25 mm), and the head appears as a large object operating more like a barrier than it does at lower frequencies. Although sound still diffracts around the barrier, there is an "acoustic shadow" generated towards one side for sound originating at the opposite side.
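The wavelengths quoted above follow from dividing the speed of sound by the frequency. The sketch below assumes a speed of sound of 343 m/s (air at roughly room temperature) and a head width of 0.18 m; both values are illustrative assumptions, not figures from the text.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumed: air at roughly room temperature
HEAD_WIDTH_M = 0.18         # assumed ear-to-ear width of the head

def wavelength_m(frequency_hz):
    """Wavelength of sound in air, in meters, for a given frequency."""
    return SPEED_OF_SOUND_M_S / frequency_hz

# Compare the wavelength to the head width across the audible range.
for frequency_hz in (20.0, 200.0, 2_000.0, 20_000.0):
    lam = wavelength_m(frequency_hz)
    print(f"{frequency_hz:8.0f} Hz: wavelength {lam:8.4f} m, "
          f"{lam / HEAD_WIDTH_M:7.2f} head widths")
```

With these assumptions the wavelength is many head widths long at 20 Hz and a small fraction of a head width at 20 kHz, which is why the head behaves so differently at the two extremes.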
The dimensions of the head are comparable to the wavelengths of mid-frequency sound, and this tells us the first fundamental story in perception: one mechanism will not do to cover the full range, as things are so different in various frequency ranges. At low frequencies, the difference in level at the two ears from sound originating anywhere is low, because the waves flow around the head so freely; our heads just aren't a very big object to a 50-ft wave. Since the level differences are small, localization ability would be weak if it were based only on level differences, but another mechanism is at work.
In the low-frequency range, perception relies on the difference in time of arrival at the two ears to "triangulate" direction. This is called the interaural time difference (ITD). You can easily hear this effect by placing the two ends of a 36-in. piece of rubber tubing in your ears and tapping the tubing. Tapped at the center, you will hear the tap centered between your ears, and as you move the tap point towards one side, the sound will quickly advance towards that side, caused by the time difference between the two ears.
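The tubing experiment can be worked through numerically. In the sketch below, a tap displaced from the center by some offset shortens the path to one ear and lengthens the path to the other by the same amount, so the path difference is twice the offset. The speed of sound (343 m/s, taken to be the same inside the tube as in open air) and the function names are assumptions made for illustration.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumed, and assumed unchanged inside the tube
INCH_M = 0.0254

def tube_time_difference_s(tap_offset_m, tube_length_m=36 * INCH_M):
    """Arrival-time difference at the two ears for a tap offset from center."""
    half_length = tube_length_m / 2
    if abs(tap_offset_m) > half_length:
        raise ValueError("tap point lies beyond the end of the tube")
    near_path = half_length - abs(tap_offset_m)
    far_path = half_length + abs(tap_offset_m)
    return (far_path - near_path) / SPEED_OF_SOUND_M_S

def offset_for_time_difference_m(time_difference_s):
    """Tap offset from center that produces a given time difference."""
    return time_difference_s * SPEED_OF_SOUND_M_S / 2

for offset_cm in (0.0, 1.0, 5.0, 10.0):
    dt_us = tube_time_difference_s(offset_cm / 100) * 1e6
    print(f"tap {offset_cm:4.1f} cm off center: {dt_us:6.1f} microseconds")

# Offset corresponding to the 10 microsecond shift mentioned earlier.
print(f"{offset_for_time_difference_m(10e-6) * 1000:.2f} mm")
```

Under these assumptions a tap only a couple of millimeters off center already produces a time difference of about 10 µs, which is consistent with how quickly the image moves towards one side.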
At high frequencies (which have short wavelengths), the head acts more like a barrier, and thus the level at the two ears differs depending on the angle of arrival of the sound at the head. The difference in level between the two ears is called the interaural level difference (ILD). Meanwhile, the time difference becomes less important, for if it were important, great confusion would result. The reason is that at short wavelengths like 1 in., just moving your head a bit would strongly affect the localization results, and this would serve little purpose.
These two mechanisms, time difference at low frequencies and level difference at high ones, account for a large portion of the ability to perceive sound direction. However, we can still hear the difference in direction for sounds that create identical interaural differences, since a sound directly in front of us, directly overhead, or directly behind produces identical ILD and ITD.
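The ambiguity is easy to demonstrate with a deliberately simplified geometric model. The sketch below treats the ears as two points in free space, ignoring the head, diffraction, and the pinnae; the ear spacing, source distance, and speed of sound are assumed values, and the function name is hypothetical.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumed
EAR_OFFSET_M = 0.09         # assumed: each ear 9 cm from the head center

def itd_seconds(azimuth_deg, elevation_deg, distance_m=2.0):
    """ITD for two point 'ears' in free space (no head, no pinnae).

    Azimuth 0 is straight ahead and 90 is directly to the right;
    elevation 90 is directly overhead. Positive means the right ear leads.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    # x points towards the right ear, y straight ahead, z upwards.
    source = (
        distance_m * math.cos(el) * math.sin(az),
        distance_m * math.cos(el) * math.cos(az),
        distance_m * math.sin(el),
    )
    d_left = math.dist(source, (-EAR_OFFSET_M, 0.0, 0.0))
    d_right = math.dist(source, (EAR_OFFSET_M, 0.0, 0.0))
    return (d_left - d_right) / SPEED_OF_SOUND_M_S

for label, az, el in (("front", 0, 0), ("overhead", 0, 90), ("behind", 180, 0),
                      ("front right", 30, 0), ("rear right", 150, 0)):
    print(f"{label:12s} ITD = {itd_seconds(az, el) * 1e6:8.2f} microseconds")
```

In this model the front, overhead, and rear positions all give an ITD of zero, and a source 30° to the front right gives the same ITD as one at 150° to the rear right, so the time difference alone cannot tell these directions apart.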
How then do we distinguish such directions? The pinna, that is, the shape and convolutions of the outer ear, interacts differently with sound coming from various directions, altering the frequency response through a combination of resonances and reflections unique to each direction, which we learn to associate with that direction. Among other things, pinna effects help in the perception of height.
The combination of ILD, ITD, and pinna effects forms a complicated set of responses that vary with the angle between the sound field and the listener's head. For instance, a broadband sound source containing many frequencies sounds brightest (i.e., has the most apparent high frequencies) when coming directly from one side, and slightly "darker" and duller in timbre when coming from the front or back. You can hear this effect by playing pink noise out of a single loudspeaker and rotating your head left and right.
A complex examination of the frequency and time responses for sound fields in the two ear canals coming from a given direction is called a head-related transfer function (HRTF). A thorough set of HRTFs, representing many angles all around a subject or dummy head in frequency and time responses, constitutes the mechanism by which sound is localized.
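In the time domain, an HRTF for one direction can be expressed as a pair of impulse responses, one per ear, and filtering a signal with that pair gives the two ear-canal signals for that direction. The sketch below shows only the mechanics of that filtering. The two impulse responses are toy values invented for illustration (a delay and an attenuation at the far ear), not measured data, and real HRTFs also carry the direction-dependent pinna coloration described above.

```python
def convolve(signal, impulse_response):
    """Direct-form convolution of two sequences of samples."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, sample in enumerate(signal):
        for j, tap in enumerate(impulse_response):
            out[i + j] += sample * tap
    return out

# Toy impulse responses for a source on the listener's left (hypothetical):
# the left ear hears it immediately at full level, while the right ear
# hears it three samples later at half the amplitude.
hrir_left = [1.0, 0.0, 0.0, 0.0]
hrir_right = [0.0, 0.0, 0.0, 0.5]

click = [1.0, 0.0]  # a single-sample click as the mono source signal

left_ear = convolve(click, hrir_left)
right_ear = convolve(click, hrir_right)

print("left ear: ", left_ear)
print("right ear:", right_ear)
```

Even this crude pair reproduces the two interaural cues discussed earlier: the right-ear signal arrives later (an ITD) and at a lower level (an ILD).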
Another important factor is that heads are rarely clamped in place (except in experiments!), so there are both static cues, representing the head fixed in space, and dynamic cues, representing the fact that the head is free to move. Dynamic cues are thought to be used to tell unambiguously whether a sound is located at the front or the back, for instance, and thus to resolve "front-back" confusion.