Why buttons for so long at Stage 1?
There are several reasons speech recognizers have required button activation rather than voice activation. The main one is that buttons, although distracting and expensive, are reliable and responsive even in noisy environments. The car is a challenging environment for speech recognizers: a voice activation word must be caught with the windows down, the radio on, and without the microphone a few inches from the speaker’s mouth. Traditional speech technologies respond reliably in quiet environments, but not in the high noise of a car.
The requirement of a speedy response time further complicates the challenge. Speech recognizers often need hundreds of milliseconds just to determine that the user has finished talking before they start processing the speech. That delay might be acceptable when a recognition system is producing an answer or reply for the consumer. At Stage 1, however, the activation’s job is to call up the more sophisticated Stage 2 recognizer, and consumers will not accept a delay much longer than the time it takes to press a button. The longer the delay, the more likely a recognition failure at Stage 2, because users may start talking before the Stage 2 recognizer is ready to listen.
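The source of that delay can be illustrated with a simple energy-based endpointer: the recognizer declares the utterance over only after a fixed run of quiet frames, so its decision necessarily lags the true end of speech by the length of that "hangover" window. The sketch below is a minimal illustration with made-up thresholds and frame sizes, not any real product's implementation:

```python
# Minimal energy-based endpoint detector. Speech is declared finished only
# after `hangover_frames` consecutive quiet frames, so the decision always
# lags the actual end of speech by hangover_frames * frame_ms milliseconds.
# All thresholds and sizes here are illustrative assumptions.

def endpoint_delay_ms(frame_energies, threshold=0.1,
                      hangover_frames=30, frame_ms=10):
    """Return (decision_frame, delay_ms), or None if speech never ends."""
    quiet_run = 0
    for i, energy in enumerate(frame_energies):
        if energy < threshold:
            quiet_run += 1
            if quiet_run == hangover_frames:
                # Speech actually ended hangover_frames ago.
                return i, hangover_frames * frame_ms
        else:
            quiet_run = 0
    return None

# 50 frames of speech (10 ms each) followed by silence: the endpoint
# decision arrives 300 ms after the talker actually stopped.
frames = [0.5] * 50 + [0.01] * 40
print(endpoint_delay_ms(frames))
```

Even this toy detector shows why a Stage 1 trigger cannot simply wait for end-of-speech: the hangover alone already exceeds the latency of a button press.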
What’s new today?
Recent advances by Sensory with its TrulyHandsfree Voice Control technology make buttons unnecessary. The TrulyHandsfree solution is robust to noise, small in footprint and power consumption, accurate in responding when spoken to, and fast in response time. Since its recent introduction, TrulyHandsfree has already begun appearing in devices from toys to telephones.
Stage 1 implementation
Several major hurdles were overcome in creating the Stage 1 recognition system. A new probabilistic model allows recognition to work effectively even with the radio on, the windows down, and motor noise. Because the model does not require 100% confidence to accept the trigger phrase, the recognizer can determine in real time that the phrase is being spoken and can respond with only 90% of the phrase uttered. This approach also improves performance in noise and with stronger accents.
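The early-response idea can be sketched as a running confidence score: the detector accumulates per-frame evidence for the trigger phrase and fires as soon as the total clears a threshold, rather than waiting for the final frame. The scores, threshold, and function below are hypothetical illustrations, not Sensory's actual scoring model:

```python
# Hypothetical illustration of early triggering: accumulate per-frame match
# scores for the trigger phrase and fire as soon as cumulative confidence
# clears a threshold, instead of waiting for the whole phrase to finish.
# Scores and thresholds are invented for illustration.

def frames_needed_to_trigger(frame_scores, threshold):
    """Return how many frames were consumed before the trigger fired,
    or None if confidence never reaches the threshold."""
    confidence = 0.0
    for i, score in enumerate(frame_scores, start=1):
        confidence += score
        if confidence >= threshold:
            return i  # fire before the phrase necessarily finishes
    return None

# A 20-frame phrase with a threshold reachable early: the detector fires
# after 18 frames, i.e. with 90% of the phrase spoken.
scores = [1.0] * 20
print(frames_needed_to_trigger(scores, threshold=18.0))  # -> 18
```

The trade-off in such a scheme is between latency and false accepts: a lower threshold fires earlier but is easier for background noise to satisfy.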
Typically, the most difficult part of implementing a voice user interface is the system specification and dialog design. For existing speech-based systems with button-based activation, advancing to a voice-activated mode is a simple matter of adding a Stage 1 recognizer. This change is “incremental” because the user experience is enhanced without being redefined: existing dialog designs carry over, with the trigger phrase replacing the button press. The transition usually requires only a small amount of added software, taking a few hundred kilobytes of memory and around 40 MIPS of processing power.
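One way to picture why the change is incremental: if the existing dialog is driven by a generic activation event, the wake-word detector can raise the same event the button did, and the dialog logic never changes. The class and event names below are hypothetical, a sketch rather than any vendor's API:

```python
# Sketch of the "incremental" upgrade described above. The dialog reacts to
# a generic activation event, so a Stage 1 wake-word detection can replace
# the button press without touching the dialog design itself.
# All names here are hypothetical.

class Dialog:
    def __init__(self):
        self.state = "idle"

    def on_activate(self, source):
        # Same handler whether `source` is "button" or "wake_word":
        # either way, the Stage 2 recognizer starts listening.
        self.state = "listening"
        return f"stage2 listening (activated by {source})"

dialog = Dialog()
print(dialog.on_activate("button"))     # original button-based design
dialog.state = "idle"
print(dialog.on_activate("wake_word"))  # Stage 1 trigger replaces the button
```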
Inexpensive dedicated speech chips (such as Sensory’s NLP-5X) are also available, making the addition of voice activation or simple command-and-control functions simple, fast, flexible, and relatively inexpensive.
What the future holds
In-car speech systems will see substantial improvements over the coming years. The automotive voice user interface, long desirable in the car but only recently usable, will be optimized for ease of use and become essential for convenience, productivity, and safety. Front-end voice activation is a great step in this direction, and advances in noise-robust technology and in deciphering “meaning” will continue to fuel the utility of speech technologies in the car.
Todd Mozer is CEO of Sensory. He can be reached at email@example.com.