How we receive and process sound from a point in space is described by a head-related transfer function (HRTF). The HRTF is a frequency response function of the ear that describes how acoustic signals are scattered by the human body and filtered by the pinna (external ear) and ear canal before the sound reaches the eardrum. The circularly asymmetric external ears (pinnae) form specially shaped "antennas" that cause location-dependent and frequency-dependent "filtering" of the sound reaching the eardrums, especially at higher frequencies.
HRTFs are the Fourier transforms of static measurements of the left- and right-ear impulse responses (called head-related impulse responses, or HRIRs) for sound arriving from different distances and directions. The interaural level difference (ILD) and interaural time difference (ITD) are derived from the sounds heard at each ear (see Figure 1).
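The relationship between HRIRs, HRTFs, and the interaural cues can be sketched in a few lines of NumPy. The impulse responses below are hypothetical stand-ins (a delayed, attenuated impulse per ear), not measured data; a real system would use HRIRs from a measurement database.

```python
import numpy as np

fs = 44100  # sample rate in Hz

# Hypothetical head-related impulse responses (HRIRs) for a source off to the
# listener's right, faked here as one delayed, attenuated impulse per ear.
hrir_right = np.zeros(256)
hrir_right[10] = 1.0          # near (right) ear: sound arrives early and loud
hrir_left = np.zeros(256)
hrir_left[38] = 0.6           # far (left) ear: later and quieter

# The HRTFs are the Fourier transforms of the HRIRs.
hrtf_left = np.fft.rfft(hrir_left)
hrtf_right = np.fft.rfft(hrir_right)

# ITD: lag of the cross-correlation peak between the two ears' responses.
xcorr = np.correlate(hrir_left, hrir_right, mode="full")
lag = int(np.argmax(xcorr)) - (len(hrir_right) - 1)   # in samples
itd_ms = 1000 * lag / fs

# ILD: ratio of the two ears' signal energies, expressed in decibels.
ild_db = 20 * np.log10(np.sqrt(np.sum(hrir_left**2)) /
                       np.sqrt(np.sum(hrir_right**2)))
print(f"ITD: {itd_ms:.2f} ms, ILD: {ild_db:.1f} dB")
```

With these toy responses, the cross-correlation recovers the 28-sample arrival difference between the ears, and the level ratio comes out negative (the far ear is quieter), matching the ILD/ITD picture in Figure 1.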
The HRTF for each person is different, since there can be significant differences in individual hearing capability as well as in physical body characteristics. However, several HRTF measurement databases use broad classifications such as male or female and young or old; these are generally used in consumer audio applications where HRTFs are required.
Such measurements can take a long time, and because humans are unable to hold their heads in a fixed position for long periods, the resulting data can be imprecise. For this reason, some HRTF databases are created by taking measurements only from "dummy" heads modeled on an average human head and ears, avoiding head-movement errors.
Our ears can locate sound directions in three dimensions (front/back, above/below, and to either side) to an angular resolution of approximately three degrees, and we can also estimate distances. We are able to do this because our brain, inner ear, and external ears (pinnae) use cues derived from one ear (monaural cues) and compare cues received at both ears (binaural, or difference, cues). In the natural environment, each person has learned their sound localization ability (their own HRTF data) through lifelong trial and error, effectively compensating for their own body shape and composition.
Figure 1. The head-related transfer function localizes an audio source in space by accounting for the fact that sound waves enter the two ears at different times and with different intensities, owing to the difference in distance between the ears.
It is possible to synthesize binaural sound that appears to originate from any particular point in space around the listener by filtering existing audio signals with the appropriate HRTF information, producing left- and right-channel sound specifically designed for each ear. As with the left/right separation experienced over headphones, each ear hears only what it should hear.
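This binaural synthesis step amounts to convolving a mono source with a left-ear and a right-ear impulse response. The sketch below uses hypothetical HRIRs (simple delayed impulses) in place of a measured pair; a real renderer would look the pair up in an HRTF database by the source's azimuth and elevation.

```python
import numpy as np

fs = 44100
mono = np.sin(2 * np.pi * 440 * np.arange(fs // 10) / fs)  # 100 ms test tone

# Hypothetical HRIR pair for one source direction (placeholder values).
hrir_left = np.zeros(128)
hrir_left[30] = 0.6     # far ear: later arrival, lower level
hrir_right = np.zeros(128)
hrir_right[5] = 1.0     # near ear: earlier arrival, higher level

# Filter the source through each ear's impulse response (time-domain
# convolution, equivalent to multiplying by the HRTF in the frequency domain).
left = np.convolve(mono, hrir_left)
right = np.convolve(mono, hrir_right)

# Stack into a two-channel signal: one column per ear/headphone channel.
binaural = np.stack([left, right], axis=1)
```

Played over headphones, each channel reaches only the intended ear, so the interaural time and level differences baked in by the convolution survive intact.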
By contrast, an ordinary stereo signal played through headphones appears to emanate from a region confined to the line between the ears. This difference between ordinary stereo and spatial audio gives rise to the terms "3D sound" and "virtual surround sound."
Creating a specific spatial audio effect is more difficult with two loudspeakers at some distance from the listener, since the sound from either channel can be heard by both ears, causing audio crosstalk. Crosstalk cancellation uses destructive wave interference to cancel the unwanted signals. As seen in Figure 2, anti-waves (cancellation waves) sent to the right ear cancel the unwanted left-channel audio; the same happens to unwanted right-channel signals at the left ear. The result is distinct right- and left-channel sound enhancement areas that promote an elevated sense of audio placement in 3D space.
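One common way to derive the cancellation filters is to model the four speaker-to-ear paths as a 2x2 matrix per frequency bin and invert it. The sketch below assumes idealized paths (pure delays with attenuation, made-up values); in practice the paths would be measured or taken from an HRTF database.

```python
import numpy as np

n = 512
f = np.fft.rfftfreq(n)  # normalized frequency, cycles per sample

# Hypothetical acoustic transfer functions from each loudspeaker to each ear,
# modeled as pure delays with attenuation (placeholder delays and gains).
def path(delay_samples, gain):
    return gain * np.exp(-2j * np.pi * f * delay_samples)

H_LL = path(20, 1.0)   # left speaker  -> left ear  (direct path)
H_RL = path(28, 0.7)   # left speaker  -> right ear (crosstalk)
H_LR = path(28, 0.7)   # right speaker -> left ear  (crosstalk)
H_RR = path(20, 1.0)   # right speaker -> right ear (direct path)

# Invert the 2x2 acoustic matrix per frequency bin: feeding the binaural
# signals through filters C = H^-1 makes each ear hear only its own channel.
det = H_LL * H_RR - H_LR * H_RL
C_LL = H_RR / det
C_LR = -H_LR / det
C_RL = -H_RL / det
C_RR = H_LL / det

# Sanity check: the combined system H @ C is the identity at every bin.
eye_LL = H_LL * C_LL + H_LR * C_RL   # left input reaches left ear unchanged
eye_LR = H_LL * C_LR + H_LR * C_RR   # right input cancelled at the left ear
```

The matrix inversion is exactly the "anti-wave" idea in frequency-domain form: the off-diagonal filter terms generate the inverted, delayed copies that cancel each channel's crosstalk at the opposite ear.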
This additional processing of each channel must be done to eliminate or reduce the effects of crosstalk, and it must also take into account the possible effects on the sound of the listener's position (angle and distance from the loudspeakers), ear sensitivity and shape (which vary with age, sex, and ethnicity), head and torso size and mass, and the local physical environment (the presence or absence of reflective and/or absorptive materials).
Figure 2. Crosstalk cancellation uses destructive wave interference to cancel unwanted signals.
All these factors determine how, or whether, a listener can accurately identify where a sound originated. For these reasons, spatial audio creation techniques for loudspeaker systems must also include acoustic beamforming.