Use microphone arrays for background acoustic noise suppression in portable devices--Part II

Kenneth Boyce, National Semiconductor Corporation

7/11/2008 12:35 AM EDT

Miss Part I? Here it is

Microphone Array Solutions
Microphone array solutions can be an effective technology to suppress stationary and non-stationary noise, depending upon the methods used.

Using appropriate algorithms, the individual microphone signals in an array are filtered and combined to achieve beamforming, or spatial filtering. This creates complex polar response patterns that can point toward, or away from, particular sound positions. Sounds at certain positions can therefore be isolated and enhanced, or suppressed and rejected. Likewise, correlating the signals in the microphone channels allows the direction and location of major sounds to be determined.
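The correlation step can be illustrated with a minimal sketch, assuming two sampled microphone channels and a single far-field source. The function names, the brute-force lag search, the 5cm spacing and the 48kHz sample rate are illustrative assumptions and do not come from the article; the 331.1 m/s speed of sound is the value the article uses in its spacing calculation.

```python
import math

SPEED_OF_SOUND = 331.1  # m/s, the value used in the article's spacing calculation


def best_lag(x, y, max_lag):
    """Return the lag in samples at which channel y best matches a delayed copy of x.

    A positive result means the sound reached microphone x first.
    """
    best, best_score = 0, float("-inf")
    n = len(x)
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                score += x[i] * y[j]  # cross-correlation at this lag
        if score > best_score:
            best, best_score = lag, score
    return best


def arrival_angle_deg(lag, sample_rate_hz, spacing_m):
    """Convert an inter-microphone lag to an angle from broadside (far-field assumption)."""
    s = SPEED_OF_SOUND * (lag / sample_rate_hz) / spacing_m
    s = max(-1.0, min(1.0, s))  # guard against rounding beyond the physical range
    return math.degrees(math.asin(s))


# Hypothetical test signal: the second channel is the first delayed by 3 samples.
x = [0.0] * 10 + [1.0, 3.0, -2.0, 5.0, -1.0] + [0.0] * 10
y = [0.0] * 3 + x[:-3]
lag = best_lag(x, y, max_lag=6)
print(lag, arrival_angle_deg(lag, 48000, 0.05))
```

With closely spaced microphones the delay spans only a few samples, so a practical system would likely need interpolation or a higher sample rate to resolve direction finely.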

Depending on the complexity of the array and its purpose, the array processing can be done with analog circuitry, a digital signal processor (DSP), computer software, or a combination of these methods.

Beamforming
There are two beamforming techniques: adaptive and fixed.

For adaptive beamformers, the beam can be steered in various directions using data-dependent filtering and variable time responses to the data. Many methods have been developed for building adaptive beamformers. The signal processing is more complex, but it allows considerable freedom in the array design in terms of the number and types of microphones and their spacings. Adaptive beamformers are typically built with digital signal processors or computer software.

For fixed beamformers, the beam is optimized for the direction of the desired sound while suppressing sounds from other directions as much as possible. Typically, closely spaced, differential microphone end-fire arrays, which are inherently directive, are used with or without fixed-time delays to steer the beam. Any filtering and signal processing is also optimized and fixed for the particular mechanical design. Fixed beamformers can be made with analog circuitry, digital signal processors or computer software.
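As a rough illustration of a fixed end-fire design, the sketch below models one common configuration: two omnidirectional microphones whose rear signal is delayed by a fixed time and subtracted from the front signal. The article does not specify a circuit, so the delay-and-subtract structure, the function name, the 2cm spacing and the 1kHz test frequency are assumptions made for this example.

```python
import cmath
import math


def differential_response(theta_deg, freq_hz, spacing_m, delay_s, c=331.1):
    """Magnitude response of a two-microphone delay-and-subtract end-fire pair.

    theta_deg is measured from the array axis (0 = front, 180 = rear).
    Output = front(t) - rear(t - delay_s), for a far-field plane wave.
    """
    omega = 2.0 * math.pi * freq_hz
    # Extra acoustic travel time from the front microphone to the rear one.
    acoustic_delay = (spacing_m / c) * math.cos(math.radians(theta_deg))
    return abs(1.0 - cmath.exp(-1j * omega * (delay_s + acoustic_delay)))


spacing = 0.02            # 2 cm, assumed for illustration
delay = spacing / 331.1   # fixed delay equal to the acoustic transit time (cardioid)
for angle in (0, 90, 180):
    print(angle, round(differential_response(angle, 1000.0, spacing, delay), 4))
```

Setting the fixed delay equal to the acoustic transit time places the null directly behind the array; other fixed delays would move the null to a different angle.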

Fixed beamformers are often the preferred solution for speech applications, especially those involving speech recognition. If implemented in analog circuitry, they:

  • respond instantaneously to the noise input
  • are easy to implement and do not require any algorithm software development
  • provide an acceptable level of signal-to-background noise ratio improvement (SNRI) for stationary and non-stationary noise
  • typically exhibit very low to no speech distortion, which also improves overall mean opinion scores in speech quality tests (ITU-T P.835)
  • have inherently low 'computational' complexity and signal latency
  • consume less power than other solutions

By comparison, adaptive beamformer solutions implemented in DSP or software:

  • require time to repetitively recognize and converge on the noise signal while applying and adjusting the suppression algorithm
  • provide an overall higher SNRI, but often at the expense of artifacts in the speech output signal, such as delays due to noise signal convergence time, pops and clicks, unintended muting, frequency distortions, echoes, or aperiodic changes in signal level, which are generally associated with the sub-band frequency signal processing methods used
  • are more difficult to implement, requiring algorithm software development
  • consume more power

All beamformer solutions using very small arrays are highly sensitive to errors caused by microphone gain and phase mismatch over frequency, as well as by differences in the acoustic paths, which can arise when the microphones are embedded in a product instead of being in open air. Beamformer solutions must therefore have some means of compensating for these errors, either within the beamformer implementation itself or by requiring matched microphones and acoustic paths external to it.
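The sensitivity to mismatch can be made concrete with a small sketch. It assumes a two-microphone delay-and-subtract pair tuned for a rear null, which is an illustrative model and not a circuit described in the article; the function name, the 2cm spacing and the 1dB gain error are likewise assumptions.

```python
import cmath
import math


def front_to_back_db(freq_hz, gain_error_db=0.0, phase_error_deg=0.0,
                     spacing_m=0.02, c=331.1):
    """Front-to-back rejection (dB) of a delay-and-subtract pair with a mismatched rear mic."""
    omega = 2.0 * math.pi * freq_hz
    delay_s = spacing_m / c  # fixed delay chosen to null sound arriving from the rear
    # Complex error applied to the rear microphone channel.
    mismatch = (10.0 ** (gain_error_db / 20.0)
                * cmath.exp(-1j * math.radians(phase_error_deg)))

    def response(theta_deg):
        acoustic = (spacing_m / c) * math.cos(math.radians(theta_deg))
        return abs(1.0 - mismatch * cmath.exp(-1j * omega * (delay_s + acoustic)))

    front, back = response(0.0), response(180.0)
    if back == 0.0:
        return math.inf  # perfectly matched microphones give an ideal null
    return 20.0 * math.log10(front / back)


for f in (300.0, 1000.0, 3500.0):
    print(f, round(front_to_back_db(f, gain_error_db=1.0), 1))
```

In this model a 1dB gain error reduces the ideal null to a finite rejection, and the loss is largest at low frequencies, where the difference signal between closely spaced microphones is smallest.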

Microphone Spacing
The Nyquist criterion for spatial sampling is a sensor spacing of 1/2 the wavelength (d = λ/2) of the highest frequency of interest. To spatially sample one wavelength of the frequency of interest, two sensors are required 1/2 wavelength apart.

The analogy to oversampling is a spacing of less than 1/2 wavelength (d < λ/2), which allows the wavelength to be sampled more than two times.

Spatial undersampling occurs when the spacing is greater than 1/2 wavelength (d > λ/2), which allows one wavelength of the frequency of interest to complete and restart before the second sensor can sample the signal. Spatial undersampling can alias higher frequency signals down into the frequency band of interest, giving confusing results. To avoid aliasing, the signal is band-limited to the maximum frequency of interest.
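The three cases can be checked numerically with a short sketch. The function names and the 2cm example spacing are assumptions made for illustration; the speed of sound is the 331.1 m/s figure the article uses in its spacing calculation.

```python
import math

SPEED_OF_SOUND = 331.1  # m/s, the value used in the article's spacing calculation


def spatial_sampling(spacing_m, freq_hz, c=SPEED_OF_SOUND):
    """Classify a sensor spacing against the half-wavelength criterion."""
    ratio = spacing_m * freq_hz / c  # spacing expressed in wavelengths
    if math.isclose(ratio, 0.5):
        return "critically sampled"
    return "oversampled" if ratio < 0.5 else "undersampled"


def max_unaliased_frequency(spacing_m, c=SPEED_OF_SOUND):
    """Highest frequency for which the spacing is still no more than half a wavelength."""
    return c / (2.0 * spacing_m)


print(spatial_sampling(0.02, 3500.0))
print(spatial_sampling(0.10, 3500.0))
print(max_unaliased_frequency(0.02))
```

The second function gives the frequency to which the signal would need to be band-limited for a given spacing.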

Many studies have shown that very effective microphone arrays can be built with sensor spacings much smaller than the maximum allowed by the Nyquist criterion. Consider the case where sensors are spaced at 1/8 the wavelength of interest.

In a speech-only system, the frequency range is 300Hz to 3500Hz, with the greatest amount of vocal energy between 500Hz and 2500Hz. Taking the speed of sound as 331.1 m/s, the λ/8 spacing would be 1.18cm for 3500Hz and 1.66cm for 2500Hz.

Frequencies below 3500Hz and 2500Hz would still be oversampled because the wavelengths get longer, so the 1.18cm or 1.66cm spacings become an even smaller fraction of a wavelength.

An alternative calculation is to set the spacing d at 2cm and express it as a fraction of the wavelength, d = λ/(c/(d × f)), where c is the speed of sound (331.1 m/s) and f is the frequency. At 2500Hz this gives λ/(331.1/(0.02 × 2500)) = λ/6.62.
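Both directions of this calculation can be sketched in a few lines, assuming the same 331.1 m/s speed of sound used in the text. The function names are hypothetical and chosen only for this example.

```python
SPEED_OF_SOUND = 331.1  # m/s, as used in the text


def spacing_for_fraction(freq_hz, divisor, c=SPEED_OF_SOUND):
    """Spacing in metres equal to wavelength / divisor at the given frequency."""
    return c / (freq_hz * divisor)


def wavelength_divisor(spacing_m, freq_hz, c=SPEED_OF_SOUND):
    """Return N such that the spacing equals wavelength / N at the given frequency."""
    return c / (spacing_m * freq_hz)


# Lambda/8 spacing, in centimetres, at the two frequencies discussed.
for f in (3500.0, 2500.0):
    print(f, round(spacing_for_fraction(f, 8) * 100.0, 2))

# A fixed 2 cm spacing expressed as a fraction of the wavelength at 2500 Hz,
# and at the low end of the speech band for comparison.
print(round(wavelength_divisor(0.02, 2500.0), 2))
print(round(wavelength_divisor(0.02, 300.0), 2))
```

The last line shows how the same physical spacing becomes a much smaller fraction of the wavelength at low frequencies.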

As long as the spacing remains less than λ/2 for the highest frequency of interest, the microphone spacing can be adjusted to fit the product application. However, as the spacing d gets smaller (the spatial sampling rate gets higher), the far-field signals in the microphones become more highly correlated and the array has better overall background noise suppression over a wider range of frequencies. As the spacing gets larger, the array has less overall suppression and becomes restricted to lower frequency responses.

Once the sensor spacing is fixed, the array is now optimized for the frequency of interest. In the case of a fixed beamformer, the array response pattern has also been fixed.

For any particular product, a design tradeoff is made between frequency range of operation vs. desired noise suppression levels at those frequencies, theoretical vs. practical microphone spacings, and overall array system cost and complexity.
