MENLO PARK, Calif. - With more than a million voice-activated transactions per day already under its belt, Nuance Communications is expanding its natural-language speech-recognition server capabilities to an even broader audience.
At the low end, the company has unveiled a $250 development package that can be incrementally expanded as application complexity grows. At the high end, Nuance has upped the clout of its core speech recognizer to permit dynamically changeable grammars customized to each individual speaker, plus the ability to correct single utterances within long phrases rather than having the user repeat the whole phrase.
Nuance Communications pioneered the use of speech-recognition technology in commercial settings where multiple simultaneous voice-activated applications share access to a common voice server running the Nuance core recognition routines. A typical Nuance-based voice-activated application responds appropriately to complex utterances, such as "transfer $350 from my savings to my checking account and pay my Visa bill."
Originally, the Nuance core recognition routines were available only in the full-blown version, which permitted large companies to create their own applications capable of handling hundreds of simultaneous natural-language transactions. For example, Sears has used the Nuance routines to replace more than 3,000 human operators who previously routed calls to the appropriate departments. You can now call Sears and say "I'd like to speak with someone about a toaster" and be automatically routed to the small-appliance department.
The low-cost Nuance Express development environment will put voice recognition within reach of smaller customers, the company said. Nuance Express makes it possible to incrementally expand simple applications that recognize just a few words, so smaller companies need buy only the capabilities they require and can upgrade as needed.
"Nuance Express offers a seamless yet cost-effective migration path from the entry-level Express package all the way up to our scalable enterprise-level recognition engine," said product marketing manager John Shea.
Nuance Express is actually the same package as the full-blown Nuance version 6.2, except that its more sophisticated capabilities are locked behind a set of codes that developers can buy separately. The base version of Express recognizes only "yes and no" phrases and single digits. Any popular manner of verbalizing yes and no, such as "yep" and "nope," is included, as are the various manners of verbalizing the digits 0 to 9.
Higher levels of recognition can be unlocked to embrace just the capabilities needed for a given application. For instance, the alpha-character package enables recognition of all the alphanumeric characters, and the 20-custom-items package permits developers to add any 20 discrete words, phrases or complete natural-language sentences.
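The tiered unlocking described above can be pictured with a short Python sketch. It is only an illustration under stated assumptions: the class, function names and word lists are hypothetical and are not taken from the Nuance API, and a real recognizer works on audio rather than text strings.

```python
import string

# Hypothetical base vocabulary: yes/no variants plus spoken digits.
BASE_VOCABULARY = {
    "yes", "yep", "yeah", "no", "nope",
    "zero", "oh", "one", "two", "three", "four",
    "five", "six", "seven", "eight", "nine",
}


def alpha_package():
    """Hypothetical add-on: the letters a-z (digits are already in the base)."""
    return set(string.ascii_lowercase)


def custom_items_package(items, limit=20):
    """Hypothetical add-on: up to `limit` developer-chosen words or phrases."""
    if len(items) > limit:
        raise ValueError(f"package allows at most {limit} custom items")
    return {item.strip().lower() for item in items}


class Recognizer:
    """Toy stand-in for a recognizer whose vocabulary grows as packages unlock."""

    def __init__(self):
        self.vocabulary = set(BASE_VOCABULARY)

    def unlock(self, package_words):
        # Unlocking a package only widens what is accepted; nothing is removed.
        self.vocabulary |= package_words

    def accepts(self, utterance):
        return utterance.strip().lower() in self.vocabulary
```

In this sketch a developer would start with `Recognizer()`, then call `unlock(alpha_package())` or `unlock(custom_items_package([...]))` only when the application needs the wider vocabulary.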
The full-blown Nuance version 6.2 core not only recognizes ad hoc natural-language sentences, but can also verify the identity of the speaker without the need for passwords. The user merely begins speaking and the Nuance core matches the voice pattern with that previously stored for the speaker, even if the speaker is saying words he or she has never uttered before.
"Nuance doesn't need to hear the same words spoken as before in order to recognize a voice," said Shea. "For very secure applications that permits it to ask for the user to speak an entirely new word so that no one could have made a covert recording and be playing it back."
Nuance 6.2 also fully supports Java and ActiveX, so that programmers with no previous speech-recognition experience can nevertheless develop reliable applications. Application programming interfaces for Java and ActiveX simplify development by providing a set of prebuilt speech objects, which perform the most common speech-recognition tasks with no programming.
Two entirely new capabilities built into version 6.2 permit single phrases to be corrected and grammars to be personalized. Previously, if part of a phrase was not recognized, the entire phrase had to be repeated. But Nuance 6.2 can stop in the middle of a parsing task and correct only the part that was misunderstood. For instance, if the user says, "Please make me a reservation for the opera Mon . . . no, Tuesday night," the Nuance core will respond, "What night would you like to go to the opera?" rather than ask the user to repeat the whole sentence.
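One way to picture the partial-correction behavior is as slot filling, in which only the doubtful slot is asked for again. The Python sketch below is a simplified assumption, not Nuance's algorithm: it works on text rather than audio, the slot names, patterns and prompts are invented for illustration, and it treats anything spoken after a self-correction marker such as "no" as needing confirmation.

```python
import re

# Hypothetical slots and prompts for a reservation dialog.
SLOT_PATTERNS = {
    "event": r"\b(opera|ballet|concert)\b",
    "night": r"\b(monday|tuesday|wednesday|thursday|friday|saturday|sunday)\b",
}
PROMPTS = {
    "event": "What would you like to attend?",
    "night": "What night would you like to go to the {event}?",
}
# Assumed self-correction markers; a real system would use richer cues.
CORRECTION = re.compile(r"\bno\b|\bi mean\b")


def parse_reservation(utterance):
    """Return (slots, uncertain): filled slots and those said after a correction."""
    text = utterance.lower()
    marker = CORRECTION.search(text)
    cut = marker.start() if marker else len(text)
    slots, uncertain = {}, set()
    for name, pattern in SLOT_PATTERNS.items():
        matches = list(re.finditer(pattern, text))
        if not matches:
            continue
        last = matches[-1]  # the most recent mention wins
        slots[name] = last.group(1)
        if last.start() >= cut:
            uncertain.add(name)
    return slots, uncertain


def next_prompt(slots, uncertain):
    """Ask only for the first missing or doubtful slot; None means nothing to ask."""
    for name in SLOT_PATTERNS:
        if name not in slots or name in uncertain:
            return PROMPTS[name].format(event=slots.get("event", "event"))
    return None
```

Fed the sentence from the example above, `parse_reservation` keeps "opera" as settled and flags the night, so `next_prompt` asks only about the night instead of restarting the dialog.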
The second new capability, personalized grammars, permits users to create their own shorthand way of referring to things.
For instance, instead of referring to two Visa accounts by their numbers, the user can call them "my corporate Visa" and "my personal Visa." Likewise, the user can invent shorthand nicknames for voice-activated dialing, customized bill paying, stock and bond lists, and so forth.
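A personalized grammar of this kind can be thought of as a per-speaker table of nicknames layered over the vocabulary every caller shares. The Python sketch below is a minimal illustration under that assumption; the class, the account identifiers and the user IDs are hypothetical and do not come from the Nuance product.

```python
class PersonalGrammar:
    """Toy per-user alias table that falls back to a shared grammar."""

    def __init__(self, shared):
        self.shared = {phrase.lower(): target for phrase, target in shared.items()}
        self.by_user = {}

    def add_alias(self, user_id, phrase, target):
        # Aliases are stored per user, so two callers can reuse the same phrase.
        self.by_user.setdefault(user_id, {})[phrase.lower()] = target

    def resolve(self, user_id, phrase):
        # A personal alias takes priority; otherwise use the shared entry.
        key = phrase.lower()
        personal = self.by_user.get(user_id, {})
        return personal.get(key, self.shared.get(key))


# Hypothetical shared entries and account identifiers.
grammar = PersonalGrammar({"savings": "ACCT-SAVINGS", "checking": "ACCT-CHECKING"})
grammar.add_alias("alice", "my corporate Visa", "VISA-0001")
grammar.add_alias("alice", "my personal Visa", "VISA-0002")
```

Because the aliases are keyed by speaker, "my corporate Visa" resolves for the caller who defined it and stays unknown to everyone else, while shared words such as "savings" keep working for all callers.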
"We think our personalized grammars will go a long way to making voice-based applications as friendly to use as speaking with another human," said Shea. "We have also improved the speed of the Nuance algorithms by 20 percent as well as gained a 40 percent throughput improvement." The upshot, he said, is to "reduce the overall cost of a system's hardware."
Nuance 6.2 and Express run under Windows NT, Sun Solaris, IBM AIX, SCO Unix and Digital Unix.