United Business Media EE Times




Neural nets create natural speech interaction








EE Times


PORTLAND, Ore. -- SpeechWorks International is applying neural-network technology in a practical speech-recognition system that achieves a new level of natural-speech interaction by letting telephone callers ask for information in ordinary, unstructured English. Federal Express Inc. is deploying the approach in a shipping-rate information system in cooperation with NextLink Interactive.

"We have designed SpeechWorks from the ground up to make it easy for developers like NextLink to create conversational interfaces to automated systems like FedEx's rate information. Our DialogModules encapsulate the intricacies of speech recognition into easy-to-use building blocks with which engineers can quickly build applications," said Mark Holthouse, vice president of technology at SpeechWorks.

Instead of forcing engineers to develop speech applications using a dedicated tool, the company provides a tool kit of premade DialogModules, with application programming interfaces (APIs) to either a C-language library or a set of ActiveX controls. The intricacies of neural learning and other speech-recognition techniques remain hidden from programmers using the DialogModules, letting application developers integrate the building blocks quickly into their own environments.

Digestible elements

At the lowest level, DialogModules gather the raw speech signal into easily digestible processing segments. Most other speech systems arbitrarily divide raw speech signals into 20-ms segments and then use smart software to glue the fixed-sized segments into variable-sized pieces that correspond to spoken phonemes. In contrast, SpeechWorks DialogModules use neural learning to guess how phonemes naturally divide the speech stream into segments right from the start.

That results in "as much as five times less data to work with," said Holthouse.
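The difference between fixed framing and phoneme-sized segmentation can be illustrated with a minimal Python sketch. It is not SpeechWorks code: the function names, the 600-ms utterance and the boundary times are all invented for illustration, and the boundaries are simply given rather than learned by a neural network.

```python
def fixed_frames(duration_ms, frame_ms=20):
    """Split [0, duration_ms) into fixed-size frames; the last may be shorter."""
    return [(start, min(start + frame_ms, duration_ms))
            for start in range(0, duration_ms, frame_ms)]


def segments_from_boundaries(duration_ms, boundaries_ms):
    """Turn hypothesized phoneme boundaries into variable-length segments."""
    inner = sorted(b for b in boundaries_ms if 0 < b < duration_ms)
    edges = [0] + inner + [duration_ms]
    return list(zip(edges[:-1], edges[1:]))


# Hypothetical 600-ms utterance with made-up boundary guesses (in ms).
duration = 600
frames = fixed_frames(duration)
segments = segments_from_boundaries(duration, [90, 210, 280, 400, 510])

print(len(frames), "fixed frames versus", len(segments), "segments")
```

With these invented numbers the later stages would handle six units instead of 30, which is the kind of reduction Holthouse describes. Real ratios would depend on the speech itself.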

Segmentation is learned by a neural network that assigns probabilities to its first-pass guesses as to how the continuous-speech signal is divided into separate phonemes. After smart segmentation, a hidden Markov model tests the segments against all the known phonemes, using standard Gaussian statistical distributions to identify the phonemes and assemble them into words. A probability lattice is generated, and a traditional grammar search outputs the segmented spoken words to the application.
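The Gaussian scoring step can be sketched in a few lines of Python. This illustrates only how a segment's features might be compared against per-phoneme Gaussian distributions. It is not a hidden Markov model, and the phoneme labels, means, variances and feature values are hypothetical numbers chosen for the example.

```python
import math


def gaussian_log_likelihood(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))


# Hypothetical phoneme models: label -> (mean vector, variance vector).
PHONEME_MODELS = {
    "s": ([4.0, 1.0], [1.0, 0.5]),
    "iy": ([1.0, 3.0], [0.5, 1.0]),
    "n": ([0.5, 0.5], [0.5, 0.5]),
}


def score_segment(features):
    """Return (phoneme, log-likelihood) pairs, best match first."""
    scores = {label: gaussian_log_likelihood(features, mean, var)
              for label, (mean, var) in PHONEME_MODELS.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Made-up two-dimensional feature vector for one segment.
ranked = score_segment([3.8, 1.2])
print(ranked[0][0])
```

In a full recognizer, scores like these would feed the lattice of competing phoneme hypotheses rather than being used to pick a single winner per segment.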

The recognizer "compares the various possible paths through the probability lattice with the phonetic representation of the words in its vocabulary, providing a probability score for each," said Holthouse.

A range of general rules governing language models can be used to constrain the number of possible recognized words by determining which words are more likely to follow one another. Application developers can also introduce their own constraints and obtain a ranked list of the most likely word strings just spoken.
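One common way to apply such constraints is to combine each hypothesis's acoustic score with a word-pair (bigram) language-model score and sort the results. The Python sketch below assumes that approach. The article does not say which language model SpeechWorks uses, and every probability, score and word string here is invented.

```python
import math

# Hypothetical bigram probabilities: how likely a word is to follow another.
BIGRAM_LOGPROB = {
    ("<s>", "ship"): math.log(0.6),
    ("<s>", "sheep"): math.log(0.1),
    ("ship", "to"): math.log(0.5),
    ("sheep", "to"): math.log(0.05),
    ("to", "boston"): math.log(0.4),
    ("to", "austin"): math.log(0.2),
}
UNSEEN_LOGPROB = math.log(1e-4)  # assumed floor for word pairs not in the table


def lm_score(words):
    """Sum bigram log probabilities over a word string, starting from <s>."""
    total, prev = 0.0, "<s>"
    for word in words:
        total += BIGRAM_LOGPROB.get((prev, word), UNSEEN_LOGPROB)
        prev = word
    return total


def rank_hypotheses(hypotheses, lm_weight=1.0):
    """Return (combined score, words) pairs, most likely string first."""
    scored = [(acoustic + lm_weight * lm_score(words), words)
              for words, acoustic in hypotheses]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Made-up lattice paths with acoustic log scores.
hypotheses = [
    (("sheep", "to", "boston"), -10.0),
    (("ship", "to", "boston"), -10.5),
    (("ship", "to", "austin"), -10.4),
]
ranked = rank_hypotheses(hypotheses)
for score, words in ranked:
    print(round(score, 2), " ".join(words))
```

Here "sheep to boston" has the best acoustic score but falls to last place once the word-sequence probabilities are added. An application-specific constraint, such as favoring a particular city name, would amount to raising the corresponding entries in the table.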

Prepackaged DialogModules can then identify specific utterances corresponding to queries made by the natural-language processing in the application.

Assume, for instance, that a "yes/no" answer is expected to an application query to the user. There are about 30 different ways that people can say "yes" ("yep," "yeah," "correct" and the like) and about 20 different ways they can say "no." Application programmers can check all of the variations simultaneously by merely making a call to the "Yes/No" DialogModule.
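The idea of folding many spoken variants into one answer can be sketched in Python as follows. This is not the DialogModule API, which the article does not show. The function name `classify_yes_no` and the short variant lists are hypothetical, and the sketch works on already-recognized text rather than on audio.

```python
import re

# Hypothetical, deliberately short variant lists; a real module would hold more.
YES_VARIANTS = {"yes", "yep", "yeah", "correct", "right", "sure", "that's right"}
NO_VARIANTS = {"no", "nope", "nah", "incorrect", "wrong", "negative"}


def classify_yes_no(utterance):
    """Map recognized text to "yes", "no", or None if it matches neither."""
    text = re.sub(r"[^a-z' ]", "", utterance.lower())  # drop punctuation
    text = re.sub(r"\s+", " ", text).strip()           # normalize spacing
    if text in YES_VARIANTS:
        return "yes"
    if text in NO_VARIANTS:
        return "no"
    return None


print(classify_yes_no("Yep!"))
print(classify_yes_no("  Nope "))
print(classify_yes_no("maybe"))
```

The application makes a single call and receives a normalized answer, so it never has to enumerate the variants itself.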

Other premade DialogModules are available for quickly identifying telephone numbers, zip codes, dates, currency and item lists specific to an application. Engineers can also add their own DialogModules to capitalize on the natural-language sequences that make sense for specific applications. For instance, a United Airlines reservation system will heavily weight "Boston," the name of one of its hubs. At this level, even whole phrases can be recognized.

Tuning response time

Once the vocabulary and grammar have been perfected, SpeechWorks monitors behavior in real time to fine-tune the response time of a deployed system. The tuning tools log every activity as it occurs, including the caller's actual speech. System administrators can use the "on-the-job" application logging to pinpoint problem areas so that the app improves its response over time. They can go directly to the calls that are most successful and compare them with those that are least successful.
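An administrator's comparison of successful and unsuccessful calls might look something like the Python sketch below. The log format, field names and values are assumptions made for illustration. The article does not describe what the SpeechWorks tuning tools actually record beyond activity and caller speech.

```python
from collections import defaultdict


def summarize(log):
    """Return (module, success rate, mean response ms), worst success rate first."""
    stats = defaultdict(lambda: {"calls": 0, "successes": 0, "total_ms": 0})
    for entry in log:
        s = stats[entry["module"]]
        s["calls"] += 1
        s["successes"] += 1 if entry["success"] else 0
        s["total_ms"] += entry["response_ms"]
    rows = [(module, s["successes"] / s["calls"], s["total_ms"] / s["calls"])
            for module, s in stats.items()]
    return sorted(rows, key=lambda row: row[1])


# Made-up log entries, one per dialog step.
call_log = [
    {"call_id": 1, "module": "yes_no", "success": True, "response_ms": 400},
    {"call_id": 1, "module": "zip_code", "success": False, "response_ms": 900},
    {"call_id": 2, "module": "yes_no", "success": True, "response_ms": 600},
    {"call_id": 2, "module": "zip_code", "success": True, "response_ms": 1200},
    {"call_id": 3, "module": "zip_code", "success": False, "response_ms": 1500},
]

summary = summarize(call_log)
for module, rate, mean_ms in summary:
    print(module, round(rate, 2), mean_ms)
```

Sorting by success rate puts the weakest dialog step at the top, which is where an administrator would start listening to the logged calls.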











All materials on this site Copyright © 2009 TechInsights, a Division of United Business Media LLC All rights reserved.