Since first appearing in automobiles in the late 1990s, voice recognition user interfaces have grown in both the breadth of their applications and the quality of their performance. That progress comes at an opportune time: drivers and passengers are bringing more and more personal electronic and connectivity devices into the car, and with them the potential for greater driver distraction.
In addition, burgeoning growth in vehicle systems and functions presents increased potential for diverting driver concentration if feature interaction with the operator is not well planned.
But developers are unanimous that easy-to-use, intuitive speech recognition greatly reduces possible distractions by allowing drivers to keep their eyes (and attention) on the road and their hands on the steering wheel.
The automobile environment, however, is not very conducive to voice recognition, notes Brian Radloff, director of worldwide embedded solution architecture in the Mobile Speech Division of Nuance Communications, a leading speech-recognition software provider. The car is inherently noisy, and having to use far-field microphones (at least in most applications) compounds the task.
He notes that when voice systems first appeared, the "recognition rate" was not always acceptable (see page 3). But by the early 2000s, improvements in microphones, audio technology, and echo and noise cancellation/subtraction algorithms moved recognition "up where it needed to be." Other improvements included automotive noise models built into the voice recognition module, as well as modeling the basic sound units of speech (phonemes) in an automotive noise environment, based on actual voice data from users. "These [techniques] still form the core of our automotive offering today," Radloff adds.
As more processing capability became available (see page 2), more complex domains (features) were enabled, he says. For example, one improvement was the ability to "dial" a phone by saying the name of the target (e.g., "Call home" or "Call Jack") as well as by speaking the phone number's digits. Likewise, voice-driven navigation went from just zooming in and out on a map, to entering addresses step by step (number, then street, then town), to letting users say the whole address naturally in one phrase.
Figure: Head unit display of a Ford SYNC voice texting screen.
Today's speech recognition systems "can build vocabulary dynamically from the phone book in a user's phone brought into the car," Radloff notes. In other words, the system "reads" the phone book listings automatically and adds those to its "grammar" (vocabulary) in order to recognize them when the user speaks to dial a listed phone by name. Similarly, music players can be read to harvest song titles for later request by voice.
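The dynamic vocabulary idea described above can be pictured with a small sketch. It is illustrative only and is not Nuance's or Ford's implementation: it works on text that has already been transcribed rather than on audio, and the names `build_grammar`, `recognize`, and the sample phone book are hypothetical.

```python
import re

# Spoken digit words accepted for digit-by-digit dialing (assumed, minimal set).
DIGIT_WORDS = {
    "zero": "0", "oh": "0", "one": "1", "two": "2", "three": "3",
    "four": "4", "five": "5", "six": "6", "seven": "7",
    "eight": "8", "nine": "9",
}


def normalize(text):
    """Lowercase and drop punctuation so spoken and stored forms compare equal."""
    return re.sub(r"[^a-z0-9 ]", "", text.lower()).strip()


def build_grammar(phone_book):
    """Read the phone book and add a 'call <name>' phrase for every listing."""
    grammar = {}
    for name, number in phone_book.items():
        grammar["call " + normalize(name)] = number
    return grammar


def recognize(utterance, grammar):
    """Return the number to dial, or None if the phrase is not in the grammar."""
    phrase = normalize(utterance)
    if phrase in grammar:  # dial by name
        return grammar[phrase]
    words = phrase.split()
    spoken_digits = words[1:]
    if (len(words) > 1 and words[0] in ("call", "dial")
            and all(w in DIGIT_WORDS for w in spoken_digits)):
        return "".join(DIGIT_WORDS[w] for w in spoken_digits)  # dial by digits
    return None


# Hypothetical phone book harvested from a handset brought into the car.
phone_book = {"Home": "555-0100", "Jack": "555-0199"}
grammar = build_grammar(phone_book)
print(recognize("Call Jack", grammar))
print(recognize("Dial five five five", grammar))
```

Rebuilding the grammar whenever a different phone connects is what makes the vocabulary dynamic in this sketch. The same pattern would apply to song titles harvested from a music player.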
The latest MyFord Touch connectivity system with Nuance software uses both voice and touchscreen interfaces, incorporating many of the functionalities noted above. "And we are expanding the vocabulary with different words, including synonyms [to be even more intuitive]," says Radloff. An example would be responding to the words "I'm hungry" by composing a list of nearby restaurants. "The system combines processing power available, for [running] more robust algorithms, and memory," he adds. And there's the rub.
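The "I'm hungry" example can be sketched as a lookup from several synonymous phrases to one action. This is a minimal illustration under stated assumptions, not the MyFord Touch design: the phrase lists, the `places` data, and the function names are all hypothetical, and a production system would match far more loosely than exact phrases.

```python
# Several ways of saying the same thing map to one intent (assumed phrase lists).
INTENT_PHRASES = {
    "find_restaurants": ["i'm hungry", "find food", "where can i eat"],
    "find_fuel": ["i need gas", "find a gas station"],
}

# Which point-of-interest category each intent searches (hypothetical).
INTENT_CATEGORY = {"find_restaurants": "restaurant", "find_fuel": "fuel"}


def build_index(intent_phrases):
    """Flatten the synonym lists into a phrase -> intent lookup table."""
    index = {}
    for intent, phrases in intent_phrases.items():
        for phrase in phrases:
            index[phrase] = intent
    return index


def nearby(places, category, limit=3):
    """Return the names of the closest places in a category, nearest first."""
    matches = [p for p in places if p["category"] == category]
    matches.sort(key=lambda p: p["distance_km"])
    return [p["name"] for p in matches[:limit]]


def respond(utterance, index, places):
    """Map an utterance to an intent, then compose the list for that intent."""
    intent = index.get(utterance.lower().strip())
    if intent is None:
        return []
    return nearby(places, INTENT_CATEGORY[intent])


# Hypothetical points of interest around the vehicle.
places = [
    {"name": "Diner", "category": "restaurant", "distance_km": 2.4},
    {"name": "Fuel Stop", "category": "fuel", "distance_km": 0.8},
    {"name": "Noodle Bar", "category": "restaurant", "distance_km": 1.1},
]
index = build_index(INTENT_PHRASES)
print(respond("I'm hungry", index, places))
```

Each added synonym enlarges the lookup table, which hints at why a growing vocabulary draws on the memory and processing power Radloff mentions.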