Generating spatial audio from portable products - Part 1: Spatial audio basics
Ken Boyce, Audio Technologist, Texas Instruments Silicon Valley Labs
1/3/2012 12:20 PM EST
How we receive and process sound from a point in space is described by a head-related transfer function (HRTF). The HRTF is a frequency response function of the ear and describes how acoustic signals are scattered by the human body and filtered by the pinna (external ear) and ear canal before the sound reaches the eardrum. The circularly-asymmetric external ears (pinnae) form specially shaped "antennas" that cause location-dependent and frequency-dependent "filtering" of the sound reaching the eardrums, especially at higher frequencies.
Basically, HRTFs are the Fourier transforms of static measurements of the left and right ear impulse responses (called head-related impulse responses, or HRIRs) of sound received from different distances and directions. The interaural level differences (ILD) and interaural time differences (ITD) are derived by comparing the sounds heard by each ear (see Figure 1).
The HRTF for each person is different since there can be significant differences in individual hearing capability as well as physical body characteristics. However, several HRTF measurement databases use broad classifications such as male or female and young or old, and are generally used in consumer audio applications where HRTFs are required.
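As a rough illustration of the relationship described above, the following Python sketch turns a pair of impulse responses into HRTFs with a Fourier transform and estimates the ITD and ILD from them. The sample rate, the `toy_hrir` helper, and the delay and gain values are assumptions made up for this example; a real system would load measured head-related impulse responses from a database instead.

```python
import numpy as np

FS = 48_000  # sample rate in Hz (assumed for this sketch)


def toy_hrir(delay_samples, gain, length=256):
    """Hypothetical stand-in for a measured impulse response:
    a single delayed, scaled impulse."""
    h = np.zeros(length)
    h[delay_samples] = gain
    return h


def hrtf_from_hrir(hrir):
    """The HRTF is the Fourier transform of the impulse response."""
    return np.fft.rfft(hrir)


def estimate_itd_ild(hrir_left, hrir_right, fs=FS):
    """Return (ITD in seconds, ILD in dB) for a left/right pair.

    A positive ITD means the sound reaches the right ear later.
    A positive ILD means the sound is louder at the left ear.
    """
    # The peak of the cross-correlation gives the lag of right vs. left.
    xcorr = np.correlate(hrir_right, hrir_left, mode="full")
    lag = int(np.argmax(np.abs(xcorr))) - (len(hrir_left) - 1)
    itd_s = lag / fs
    # Broadband level difference from the energy in each response.
    ild_db = 10.0 * np.log10(np.sum(hrir_left**2) / np.sum(hrir_right**2))
    return itd_s, ild_db


# Source on the listener's left: the right ear hears it later and quieter.
left = toy_hrir(delay_samples=10, gain=1.0)
right = toy_hrir(delay_samples=30, gain=0.5)

hrtf_left = hrtf_from_hrir(left)
hrtf_right = hrtf_from_hrir(right)
itd_s, ild_db = estimate_itd_ild(left, right)
print(f"ITD = {itd_s * 1e6:.1f} us, ILD = {ild_db:.2f} dB")
```

With these made-up values the right ear lags by 20 samples (roughly 417 microseconds) and is about 6 dB quieter. Measured responses are far less tidy, and the ILD in particular varies strongly with frequency.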
Such measurements can take a lot of time because humans are unable to hold their heads in a fixed position for very long periods of time, resulting in imprecise data. For this reason, some HRTF databases are created by taking measurements only from "dummy" heads modeled on an average human head and ears to avoid head movement errors.
Our ears can locate sound directions in three dimensions (front/back, above/below, and either side) to an angular resolution of approximately three degrees, and we can also estimate distances. We're able to do this because our brain, inner ear, and external ear (pinna) utilize cues derived from one ear (monaural cues), and by comparing cues received at both ears (difference cues or binaural cues). In the natural environment, each person has learned the accuracy of their sound location ability (their own HRTF data) through trial and error, and lifelong experience, and has effectively compensated for their body shape and composition.

It is possible to synthesize binaural sounds that appear to originate from any particular point in space around the listener by applying the appropriate filters to existing audio signals and combining the sound with HRTF information, resulting in left and right channel sound specifically designed for each ear. Similar to left and right sound separation experienced using headphones, each ear hears only what it should hear.
By contrast, a stereo audio signal played through headphones appears to emanate from a region restricted to a line between the ears. This difference between ordinary stereo and spatial audio gives rise to the terms "3D sound" and "virtual surround sound."
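The binaural synthesis described above can be sketched in a few lines of Python: a mono signal is filtered with a left-ear and a right-ear impulse response for the desired direction, giving one channel per ear. This is a minimal sketch, not a production renderer. The impulse responses here are hypothetical single-tap placeholders, and the function name `binaural_render` is invented for the example.

```python
import numpy as np

FS = 48_000  # sample rate in Hz (assumed)


def binaural_render(mono, hrir_left, hrir_right):
    """Filter a mono signal with a left/right impulse-response pair.

    Returns an array of shape (2, N): row 0 is the left-ear signal,
    row 1 is the right-ear signal.
    """
    left = np.convolve(mono, hrir_left)
    right = np.convolve(mono, hrir_right)
    return np.stack([left, right], axis=0)


# Hypothetical impulse responses for a source on the listener's left:
# the right ear gets a later, quieter copy. Real ones would be measured.
hrir_left = np.zeros(256)
hrir_left[10] = 1.0
hrir_right = np.zeros(256)
hrir_right[30] = 0.5

# 10 ms of a 440 Hz tone as the source material.
t = np.arange(480) / FS
mono = np.sin(2 * np.pi * 440.0 * t)

out = binaural_render(mono, hrir_left, hrir_right)
print(out.shape)
```

Time-domain convolution is used here only for clarity. With impulse responses hundreds of taps long, a practical implementation would more likely use FFT-based (block) convolution.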
Audio crosstalk
Creating a specific spatial audio effect is more difficult when using two loudspeakers at some distance from the listener, since the sound from any one channel can be heard by both ears, causing audio crosstalk. Crosstalk cancellation can be achieved by using destructive wave interference to cancel unwanted signals. As seen in Figure 2, anti-waves - or cancellation waves - sent to the right ear cancel unwanted left channel audio signals. The same happens to unwanted right channel signals at the left ear. The result is distinct right- and left-channel sound enhancement areas that promote an elevated sense of audio placement in 3D space.
This additional processing of the audio in each channel is needed to eliminate or reduce the effects of crosstalk. It must also take into account how the sound is affected by the listener's position (angle and distance from the loudspeakers), ear sensitivity and shape (which vary with age, sex, and ethnicity), head/torso size and mass, and the local physical environment (the presence or absence of reflective and/or absorptive materials).

All these factors determine how or if a listener can accurately identify where a sound has originated. For these reasons, spatial audio creation techniques for loudspeaker systems must also include acoustic beamforming.
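The crosstalk cancellation idea from this section can be illustrated with a simplified frequency-domain sketch. It assumes a symmetric two-loudspeaker setup in which each ear hears its own loudspeaker directly and the opposite loudspeaker as an attenuated, delayed copy. At each frequency the four paths form a 2x2 matrix, and the canceller is the inverse of that matrix. The gain and delay values and the function name are assumptions for illustration only; a real design would use measured responses and some form of regularization.

```python
import numpy as np


def crosstalk_canceller(freqs_hz, cross_gain=0.7, cross_delay_s=0.0002):
    """Build the acoustic path matrices H and their inverses C.

    H[k] maps loudspeaker signals to ear signals at frequency k:
        ears = H[k] @ speakers
    The direct (same-side) paths are normalized to 1. The crosstalk
    paths are modeled as a gain and a pure delay (an assumption).
    """
    freqs_hz = np.asarray(freqs_hz, dtype=float)
    cross = cross_gain * np.exp(-2j * np.pi * freqs_hz * cross_delay_s)
    H = np.empty((len(freqs_hz), 2, 2), dtype=complex)
    H[:, 0, 0] = 1.0    # left speaker  -> left ear
    H[:, 1, 1] = 1.0    # right speaker -> right ear
    H[:, 0, 1] = cross  # right speaker -> left ear (crosstalk)
    H[:, 1, 0] = cross  # left speaker  -> right ear (crosstalk)
    # Invertible at every frequency because |cross| < 1.
    C = np.linalg.inv(H)
    return H, C


freqs = np.linspace(100.0, 8000.0, 64)
H, C = crosstalk_canceller(freqs)

# Desired ear signals: sound intended for the left ear only.
desired = np.array([1.0, 0.0], dtype=complex)
speaker_feeds = C @ desired           # shape (64, 2)
at_ears = np.einsum("kij,kj->ki", H, speaker_feeds)
print(np.max(np.abs(at_ears[:, 1])))  # residual at the right ear
```

In this idealized model the residual at the right ear is zero to within rounding error. In practice the cancellation only holds near the assumed listening position, which is one reason the listener's position matters so much.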
Next: Acoustic Beamforming


EREBUS
1/3/2012 4:47 PM EST
Having tested some of these algorithms twenty years ago, I can honestly say that the technology is amazing. The advancement in inner cranial and exo cranial perception has been a long time coming from the research labs to commercial products. I think everyone will be amazed with how sound can be manipulated to put you into a 3-D sound environment to match the 3-D video available.
This is really neat stuff.
bcarso
1/4/2012 2:13 PM EST
Although I'm a big fan of such processing in certain applications (for example nearfield "personal" monitors), one needs to note the importance of the room in more traditional audio settings, which generally is neglected and which is much more difficult to account for. And it tends to screw up spatial processing of this sort.
But merely treating the room as a negative, as is suggested in this article, flies in the face of listener preferences and other psychoacoustic results, some of which are sometimes subtle but rather well-understood now, despite continued misunderstandings in the popular press and elsewhere. In particular, it is misleading to state, as the author does, that reflections per se result in reduced "clarity". In fact, however counterintuitive, the reality is quite the contrary. See Toole's book Sound Reproduction for a comprehensive treatment of these issues.
Brad Wood
zeeglen
1/4/2012 5:12 PM EST
Good observation Bcarso. Here is a fun experiment for a rainy day:
Start with the typical residential room with drywall (acoustically reflective) walls and ceilings and maybe some curtains and floor carpet acoustic damping for good luck.
Drive a single speaker with a sine wave about 300-400 Hz or so at comfortable loudness.
Plug one ear with an earplug.
Move around and see how many places exist where you can completely null the perceived audio tone with the position of the unplugged ear. The nulls are caused by reflected (standing) waves and destructive interference; the wavelength at 350 Hz is, IIRC, somewhere about a meter.
This effect exists with normal listening with both ears, just harder to notice.
Good article, but room acoustics do play a big part.
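The "about a meter" estimate in the experiment above is easy to check with a few lines of Python. The speed of sound used here (343 m/s, roughly room temperature) is an assumption, and the half-wavelength null spacing applies to an idealized standing wave between two parallel reflecting surfaces, not to a real furnished room.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumed: dry air at about 20 degrees C


def wavelength_m(freq_hz):
    """Wavelength in meters for a tone of the given frequency."""
    return SPEED_OF_SOUND_M_S / freq_hz


def ideal_null_spacing_m(freq_hz):
    """Distance between adjacent pressure nulls in an ideal standing wave."""
    return wavelength_m(freq_hz) / 2.0


for f in (300.0, 350.0, 400.0):
    print(f"{f:.0f} Hz: wavelength {wavelength_m(f):.2f} m, "
          f"null spacing {ideal_null_spacing_m(f):.2f} m")
```

At 350 Hz this gives a wavelength of 0.98 m, so nulls in the idealized case would sit roughly half a meter apart.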
bcarso
1/4/2012 8:47 PM EST
Indeed. If we were to take the "reflections diminish clarity" assertion seriously, we'd all be seeking out anechoic spaces for our listening enjoyment. For those of you who have had access to a good chamber, you know how absurd that would be.
kdboyce
1/5/2012 1:02 AM EST
There was no mention of anechoic chambers in the article, and I definitely do not advocate them as a desirable listening environment. They are, however, a good environment for making specific audio measurements. Anechoic chambers are also known as “dead rooms” while the most pleasant listening environment is a “live room”. The terms refer to the fact that a live room sounds more realistic while a dead room is, well… dead – no life.
There is no perfect technique to accurately reproduce sound. All methods have drawbacks and what one enjoys another will not. From a psychoacoustic point of view, practically all reproduced sounds we listen to don’t faithfully reproduce the original and are enhanced in some way for our listening enjoyment (or tolerance). This does not stop people from using the techniques, however. Witness a whole generation that only knows MP3 music quality and you know what I mean.
kdboyce
1/4/2012 10:58 PM EST
I suppose the best statement would have been: “While reflections make it easier to hear from any position, the reflections arrive at different times and intensities than the original signal and CAN result in sound that lacks clarity.”
The intent of the statement in the article was not to explicitly “treat the room as a negative”. Rather it was to point out that the room does affect clarity, usually negatively. Because of this fact, much effort, including material in O’Toole’s excellent book, is expended to improve room listening environments for consumers.
As a practical matter, reflections (or echoes) are a very important part of acoustics as they help us estimate the direction and distance of sound objects. But no reflecting surface is perfectly flat and that fact alone will blur the reflection, e.g. “smear” the sound. It becomes more complex as the number of reflecting surfaces increases.
Reflections can obscure the true source of a sound under certain conditions and reduce intelligibility. Longer echoes are generally less offensive than shorter echoes. Echoes can cause phase interference which results in either reinforcement or partial cancellation of a sound at a particular frequency. For music, this can occur over a wider frequency range than for speech. When similar complex sounds in different phases interact, the effect is called comb filtering and is almost always undesirable as it obscures detail and harms intelligibility.
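The reinforcement and cancellation described above can be shown with the simplest possible model: the direct sound plus one reflection that arrives later and weaker. The Python sketch below is an illustration only; the 1 ms delay and the reflection gain of 0.8 are made-up values, and a real room has many reflections with frequency-dependent gains.

```python
import numpy as np


def comb_response(freqs_hz, delay_s, reflection_gain):
    """Frequency response of direct sound plus one delayed reflection:
    H(f) = 1 + g * exp(-j * 2 * pi * f * delay)."""
    freqs_hz = np.asarray(freqs_hz, dtype=float)
    return 1.0 + reflection_gain * np.exp(-2j * np.pi * freqs_hz * delay_s)


def notch_frequencies(delay_s, count):
    """First `count` cancellation frequencies: odd multiples of 1/(2*delay)."""
    k = np.arange(count)
    return (2 * k + 1) / (2.0 * delay_s)


DELAY_S = 0.001  # reflection arrives 1 ms after the direct sound (assumed)
GAIN = 0.8       # reflection is slightly weaker than the direct sound (assumed)

freqs = np.array([500.0, 1000.0, 1500.0, 2000.0])
magnitude = np.abs(comb_response(freqs, DELAY_S, GAIN))
for f, m in zip(freqs, magnitude):
    print(f"{f:6.0f} Hz: |H| = {m:.2f}")
print(notch_frequencies(DELAY_S, 3))
```

With these values the response dips to 0.2 at 500 Hz and 1500 Hz and rises to 1.8 at 1000 Hz and 2000 Hz. The evenly spaced peaks and notches are the "teeth" that give comb filtering its name, and a shorter delay spreads them further apart in frequency.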
azskibum
1/4/2012 4:12 PM EST
Nice overview Ken. I'm looking forward to reading Part 2.
Frank Eory
1/4/2012 4:15 PM EST
Oops, wrong login -- that was me. Again, thanks for a good article Ken.
kdboyce
1/5/2012 1:31 AM EST
I want to address a couple of other points Brad made.
1. Yes... the room situation can screw up spatial audio effects, especially if you are using the technique to try to reproduce a 5.1 or 7.1 audio system. In this case you may want to do some specific beam forming in order to create wanted reflections. Some makers of such systems also advocate having a microphone at the desired listening position, and use the data it picks up as a means of measuring the room characteristics and adjusting the algorithms accordingly.
However, the article premise was on small product use cases where space constraints did not permit even a reasonable stereo sound field. In this case, the spatial audio processing could enhance the listener experience and it was not intended for a multi-person audience.
2. As a closet musician, I am well aware of the improvement of synthesized sounds that can be accomplished with the judicious use of early reflections and reverberation as well as equalization. To me, a "dead" piano (or 'dry' as they say in music lingo) is useless - not real sounding. So reverb and reflection echoes are added (make the sound 'wet') to make the piano sound more realistic as if it were in a real room. All 'dry' or all 'wet' makes Jack a dull boy, so there is always an optimum mix of 'dry' to 'wet'. Whether this is accomplished totally electronically or partially by the room makes no difference so long as the resultant sound sounds more realistic to the average person. If so, then they can enjoy it. If not.... there are always complaints.
Lastly, I want to thank everyone who has read the article and commented on it. I hope my replies have helped.
bcarso
1/5/2012 2:28 PM EST
Thanks for your clarifications Ken! And yes, I wasn't paying too much attention to the "portable" in the article's title --- my apologies :) I am on edge a bit I guess when I read things about rooms, particularly after some recent pronouncements about how poorly understood loudspeakers and rooms are (AES Heyser lecture by a certain prominent hifi magazine editor).
Floyd is just Toole btw. I joked with him when he was persuaded to leave NRC in Canada and join Harman International, that he had decided to become a capitalist Toole :)
kdboyce
1/5/2012 4:35 PM EST
My apologies to Floyd vis-à-vis O'Toole vs Toole.
No problems Brad. I am glad you read the article thru and your comments gave the opportunity for further clarifications.