



Voice input processing for automotive speech recognition systems

Sverrir Olafsson, Conexant

8/16/2012 3:04 PM EDT

Echo cancelling

Multi-channel acoustic echo cancelling (MAEC)

One of the most controlled sources of noise in a car is the audio being played back from a radio or CD. Most current speech recognition systems require that audio playback be either attenuated or fully squelched for the recognition system to work.

However, using echo cancellation techniques, the audio playback signal can be estimated as it appears at the microphone and subtracted out, leaving only the desired voice signal for the speech recognition engine. This is common practice for speakerphone conversations over Bluetooth, but with audio playback there are typically multiple speakers playing the audio, potentially from multiple independent tracks.
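To make the estimate-and-subtract idea concrete, here is a minimal single-channel sketch using a normalized LMS (NLMS) adaptive filter, a common textbook approach. It is not Conexant's algorithm, which the article does not describe. The function names, the filter length, the step size, and the four-tap echo path are all assumptions chosen for illustration.

```python
import math
import random


def nlms_echo_cancel(reference, microphone, taps=16, step=0.5, eps=1e-8):
    """Subtract an adaptive estimate of the reference echo from the microphone signal.

    Returns the residual (microphone minus estimated echo) and the final
    filter weights, which approximate the echo path.
    """
    weights = [0.0] * taps
    history = [0.0] * taps          # most recent reference samples, newest first
    residual = []
    for x, d in zip(reference, microphone):
        history = [x] + history[:-1]
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        e = d - echo_estimate       # what remains after removing the estimated echo
        energy = sum(h * h for h in history) + eps
        gain = step * e / energy    # normalizing by input energy keeps adaptation stable
        weights = [w + gain * h for w, h in zip(weights, history)]
        residual.append(e)
    return residual, weights


def power_db(signal):
    """Mean power of a signal in dB relative to unit power."""
    mean_square = sum(s * s for s in signal) / len(signal)
    return 10.0 * math.log10(mean_square + 1e-20)


# Hypothetical test setup: white-noise "playback" and a short, made-up echo path.
rng = random.Random(0)
playback = [rng.gauss(0.0, 1.0) for _ in range(4000)]
echo_path = [0.6, -0.3, 0.15, 0.05]
mic = [
    sum(echo_path[k] * playback[n - k] for k in range(len(echo_path)) if n - k >= 0)
    for n in range(len(playback))
]

residual, weights = nlms_echo_cancel(playback, mic)
print("echo power before: %.1f dB" % power_db(mic[-1000:]))
print("echo power after:  %.1f dB" % power_db(residual[-1000:]))
```

In this toy setup the microphone hears only echo, so the residual should fall far below the original echo level once the filter has converged. In a real car the residual would also contain the driver's voice and cabin noise, and the adaptation would need to be slowed or frozen while the driver is speaking.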

This makes the echo cancelling problem significantly more difficult, because the algorithm must estimate a separate echo path from each speaker, for multiple tracks, using a limited number of microphones. Some MAEC algorithms modify the playback signal to de-correlate the channels enough that these echo paths can be resolved. For high-fidelity audio, such modifications are not acceptable.
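The difficulty with correlated channels can be illustrated with a deliberately simplified model: two loudspeakers, one microphone, and a single gain per echo path (real echo paths are long impulse responses). The sketch below solves the least-squares normal equations for the two gains and reports when they cannot be resolved. The function name, the tolerance, and the path gains of 0.5 and 0.2 are assumptions for illustration only.

```python
import random


def estimate_paths(left, right, mic, tol=1e-9):
    """Least-squares estimate of (a, b) in mic = a*left + b*right.

    Returns None when the two playback channels are so correlated that
    infinitely many (a, b) pairs explain the microphone signal equally well.
    """
    n = len(mic)
    r_ll = sum(l * l for l in left) / n
    r_rr = sum(r * r for r in right) / n
    r_lr = sum(l * r for l, r in zip(left, right)) / n
    p_l = sum(l * m for l, m in zip(left, mic)) / n
    p_r = sum(r * m for r, m in zip(right, mic)) / n
    det = r_ll * r_rr - r_lr * r_lr     # zero when the channels are fully correlated
    if det <= tol * r_ll * r_rr:
        return None
    a = (p_l * r_rr - p_r * r_lr) / det
    b = (p_r * r_ll - p_l * r_lr) / det
    return a, b


rng = random.Random(1)
left = [rng.gauss(0.0, 1.0) for _ in range(2000)]

# Case 1: independent tracks on the two speakers.
right_independent = [rng.gauss(0.0, 1.0) for _ in range(2000)]
mic_1 = [0.5 * l + 0.2 * r for l, r in zip(left, right_independent)]
print(estimate_paths(left, right_independent, mic_1))

# Case 2: the right channel is just a scaled copy of the left (e.g. mono content).
right_copy = [0.8 * l for l in left]
mic_2 = [0.5 * l + 0.2 * r for l, r in zip(left, right_copy)]
print(estimate_paths(left, right_copy, mic_2))
```

With independent tracks the two gains are recovered; with a scaled copy the function returns None because only the combined path is observable. Partially correlated stereo material falls between these extremes, which is why some algorithms resort to altering the playback signal.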

Conexant has developed an MAEC algorithm that does not require any such modifications yet is able to deliver true full-duplex performance. The result is that the speech recognition engine can reliably detect speech even with audio playback at a high level, without the need for a button-push to lower or squelch the playback level first.

High dynamic range analog-to-digital converter (ADC)
Human hearing has a large dynamic range, allowing people to hear signals over a 100 dB span. Signals encountered in everyday life therefore span this range, from low-level whispers to rock music that tests the upper limits of hearing tolerance. For an audio input system to process its relevant inputs and remove extraneous noise, the analog-to-digital converter must convert signals cleanly at all levels: it must not saturate on high-level signals, and it must not add noticeable quantization noise or input-amplifier noise to low-level ones.

Many of the DSP algorithms depend on the input signal being linear, and they quickly break down if it is saturated. If the speech input itself is saturated, the speech recognition algorithm will perform poorly. To achieve optimal performance, Conexant has developed a microphone pre-amplifier and ADC that can achieve a 106 dB dynamic range, enough to cover the range of signal levels required for an automobile environment. For example, when loud music is playing from the car radio, this dynamic range allows the MAEC to estimate and linearly subtract the radio signal echo received at the microphones, leaving only a clean representation of the driver’s commands.
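For a sense of scale, the decibel figures above can be converted to amplitude ratios and to equivalent converter resolution. The sketch below uses the standard rule of thumb for an ideal quantizer driven by a full-scale sine wave, SNR ≈ 6.02·N + 1.76 dB. That formula is an assumption brought in for illustration and is not taken from the article; a real converter's dynamic range also depends on analog noise and on how the figure is measured.

```python
def amplitude_ratio(db):
    """Ratio of largest to smallest amplitude for a given range in dB."""
    return 10 ** (db / 20)


def ideal_adc_snr_db(bits):
    """SNR of an ideal N-bit quantizer with a full-scale sine input."""
    return 6.02 * bits + 1.76


def equivalent_bits(dynamic_range_db):
    """Ideal-quantizer resolution that would give the stated dynamic range."""
    return (dynamic_range_db - 1.76) / 6.02


print("100 dB amplitude ratio: %.0f" % amplitude_ratio(100))
print("ideal 16-bit SNR: %.2f dB" % ideal_adc_snr_db(16))
print("bits equivalent to 106 dB: %.1f" % equivalent_bits(106))
```

Under this idealized model, an ideal 16-bit converter falls short of 106 dB, and the stated range corresponds to a little over 17 bits of resolution.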


Figure 2. Conexant’s CX20805 voice input processor


Conexant’s dedicated voice input processor
Conexant has developed an integrated device, the CX20805, that performs all necessary voice input functions for speech recognition in multiple environments, including the automobile. It includes low-noise microphone preamplifiers and high dynamic range ADCs supporting up to four microphones, combined with a low-power yet high-performance 800 MIPS DSP to run the algorithms described above. Putting this all together, Conexant is able to offer a solution that significantly improves the quality of voice reception in multiple environments. In particular, it has the potential to make voice command systems in automobiles work reliably enough that voice command becomes the driver's preferred method of control.

About the author
Sverrir Olafsson is VP of engineering at Conexant, managing audio product development. He has managed the development of various technologies for Conexant, including voice-band data modems, VoIP, wireless LAN and MFP. He was one of the key developers of Rockwell’s and Conexant’s data and fax modems, and an active participant in the development of the V.34, V.90 and V.92 standards. Olafsson holds bachelor’s and master’s degrees in electrical engineering from the California Institute of Technology. He holds over 50 patents in communication technologies.

Digital version (registration required): http://www.nxtbook.com/nxtbooks/cmp/eetimes081312/#/32





Mapou

8/18/2012 1:09 PM EDT

Sverrir, as impressive as Conexant's SSP research appears to be, I'm afraid that this is not going to cut it in the end. The market doesn't want noise-tolerant speech recognition systems that make preprogrammed assumptions about the environment in which they are used. The market wants speech technologies that are as noise-tolerant and versatile as the human brain or better. Personally, I want to be able to talk to my car even when I'm standing next to it. I want to tell it to open, close or lock the doors, any door; or open the trunk. Any company that can deliver this technology will make a killing.

It has been obvious for some time that the human brain does not use anything like Bayesian statistics (HMM) to process sounds or anything else. This is the fundamental problem with machine recognition. What is needed is a revolution in our understanding of how humans and animals recognize sounds. Everybody in AI has jumped on the Bayesian bandwagon just as they once jumped on the symbolic bandwagon in the 1950s only to be proven wrong half a century later.

In spite of its current success, the hidden Markov model is not it. SR researchers need to start thinking outside the box in my opinion. I say, get off the bandwagon because it's going nowhere. There's a better way, the correct way, and, as we all know, there's a fabulous prize waiting at the end of the road.




mememe33T

2/14/2013 5:38 PM EST

I have to say, it's very interesting. Right now, I think MeMeMe Mobile (www.memememobile.com) is a new entrant into this vast new field, and it's pretty promising. Come check it out for a few minutes, or sign up under the development track.

Peter



