BOSTON - Over-the-phone electronic-commerce applications should be easier to develop with a new release of the flagship software suite from SpeechWorks International Inc. The company's SpeechWorks 5.0 folds five new turnkey e-commerce modules into a more accurate and leaner speech-recognition engine. True to its neural-network-based core, SpeechWorks 5.0 can now also adapt online to new pronunciations, automatically "guess" at obscure dialects and learn to verify the identity of its users.
"We think our 5.0 release of SpeechWorks meets our customers' needs to quickly design, test and deploy e-commerce applications. It's a lot faster and less CPU-intensive, too," said Bill Ledingham, vice president of product development at SpeechWorks (Boston).
SpeechWorks has always been based on a tool kit of premade DialogModules written in the C language (also available as a set of ActiveX controls). The DialogModules hide the details of neural learning and other specific internal technologies from developers with standardized application programming interfaces (APIs). SpeechWorks 5.0 adds five DialogModules for e-commerce applications.
Two of them automatically capture the number and expiration date of credit cards with simple software calls to their respective APIs. Another new DialogModule lets an application collect dates and times in the ways people normally refer to them, such as "last Monday afternoon at 3" instead of "May 17, 1999, 3 p.m." Social Security numbers can also now be captured in conversational mode, as can natural numbers and quantities. For instance, users can now say "send me 52 pounds of chopped liver" instead of saying the digits "five, two" separately and depending on the application to ask whether it's pounds or ounces.
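As a rough illustration of what capturing a quantity in conversational mode involves, the following Python sketch pulls a number and a unit out of an already-transcribed utterance. It is not the SpeechWorks DialogModule API, which the article does not describe; the function names, the unit list and the tiny number vocabulary are all assumptions made for the example, and it handles only values below 100.

```python
import re

# Hypothetical vocabularies for illustration only; not from SpeechWorks.
UNITS = {"pound": "pounds", "pounds": "pounds", "ounce": "ounces", "ounces": "ounces"}
ONES = {w: i for i, w in enumerate(
    "zero one two three four five six seven eight nine ten eleven twelve "
    "thirteen fourteen fifteen sixteen seventeen eighteen nineteen".split())}
TENS = {w: 10 * (i + 2) for i, w in enumerate(
    "twenty thirty forty fifty sixty seventy eighty ninety".split())}


def is_number_token(token):
    """True for digit strings and for the number words this sketch knows."""
    return token.isdigit() or token in ONES or token in TENS


def to_number(tokens):
    """Convert ['52'] or ['fifty', 'two'] to 52 (simple sum, values under 100)."""
    if len(tokens) == 1 and tokens[0].isdigit():
        return int(tokens[0])
    if any(t.isdigit() for t in tokens):
        raise ValueError("mixed digits and number words are not handled")
    return sum(ONES.get(t, 0) + TENS.get(t, 0) for t in tokens)


def parse_quantity(utterance):
    """Return (amount, unit) for the first 'number unit' phrase, else None."""
    tokens = re.findall(r"[a-z]+|\d+", utterance.lower())
    for i, token in enumerate(tokens):
        if token not in UNITS:
            continue
        # Walk backwards over the number tokens that precede the unit word.
        start = i
        while start > 0 and is_number_token(tokens[start - 1]):
            start -= 1
        if start < i:
            return to_number(tokens[start:i]), UNITS[token]
    return None


print(parse_quantity("send me 52 pounds of chopped liver"))
print(parse_quantity("send me fifty two pounds of chopped liver"))
```

Both calls return the same amount and unit, which is the point of the feature: the application receives one structured value and does not need a follow-up prompt about pounds versus ounces.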
For unusual words, SpeechWorks 5.0 now employs licensed technology from E-Speech Corp. that uses an internal knowledge base to automatically generate a whole range of pronunciations of difficult words. For instance, when a new employee is added to an automatic telephone-answering system, it is not always clear how to pronounce his or her name. To make matters worse, people desiring to speak with the new user may not know the correct pronunciation, and thus may mispronounce the name when asking for the person.
Luckily, the new technology from E-Speech enables SpeechWorks 5.0 to automatically generate all the possible ways that people could pronounce and mispronounce a new word, so that when a caller asks for the new employee, SpeechWorks matches the spoken name against a whole database of possible pronunciations.
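The idea of matching a caller's utterance against many generated pronunciations can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the E-Speech technology: the substitution rules, the phoneme labels and the pre-segmented spelling of the name are all hypothetical, and a real recognizer would score acoustic likelihoods rather than look up exact strings.

```python
from itertools import product

# Hypothetical letter-to-sound alternatives, invented for this example.
RULES = {
    "j": ["JH", "HH"],        # "j" as in "jar" or as in Spanish "Jorge"
    "ge": ["JH", "HH EY"],    # "ge" as in "George" or as in Spanish "Jorge"
}


def variants(segments):
    """All pronunciations obtained by picking one alternative per spelling segment."""
    options = [RULES.get(seg, [seg.upper()]) for seg in segments]
    return {" ".join(choice) for choice in product(*options)}


def build_lexicon(names):
    """Map every generated pronunciation to the set of names it could mean."""
    lexicon = {}
    for name, segments in names.items():
        for pronunciation in variants(segments):
            lexicon.setdefault(pronunciation, set()).add(name)
    return lexicon


def lookup(lexicon, heard):
    """Return the names whose generated pronunciations include what was heard."""
    return lexicon.get(heard, set())


# A new employee is added; the spelling segmentation is assumed to be given.
lexicon = build_lexicon({"Jorge": ["j", "or", "ge"]})
print(sorted(variants(["j", "or", "ge"])))
print(lookup(lexicon, "HH OR HH EY"))   # a Spanish-style pronunciation
print(lookup(lexicon, "JH OR JH"))      # an anglicized mispronunciation
```

Both the correct and the mispronounced forms resolve to the same entry, because every variant was registered when the name was added.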
"The E-Speech technology works with any new words added to the system after it is deployed, not just people's names. It works particularly well for addresses, which often include unusual words that few people know how to pronounce correctly," said Ledingham.
Speaker verification is another addition that comes from outside vendors. Developers can now choose speaker-verification DialogModules from either Lucent Speech Solutions or ITT Industries. With either, SpeechWorks developers can register the actual voiceprint of a user so that spoken passwords must be supplied by the original user and will not work if uttered by anyone else.
"Our speaker-verification technology adds a level of security that e-commerce developers like banks and brokerage firms have been asking us for. It can run the verification routines in parallel with its normal recognition routines, so that verification is done at the same time a user speaks, say, his account number," said Ledingham.
In addition to these new features, SpeechWorks now automatically tunes itself to new pronunciations after a system has been deployed. This online learning capability reaches all the way down into the bowels of the SpeechWorks acoustic model to improve the accuracy of regional pronunciations of phonemes as it runs.
SpeechWorks 5.0 reduces the overall error rate by 37 percent, the company says, increasing accuracy from 96 percent to about 97.1 percent. The new online tuning ability can enhance accuracy by as much as 50 percent more, according to the company, resulting in a possible 98 percent accuracy. The company also claims the new release uses about 25 percent less processing power, making it possible for a single PC to handle as many as 96 phone lines, as opposed to 72 in earlier versions.
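The jump from 72 to 96 lines is consistent with the 25 percent processing claim if one assumes that per-line CPU cost is the only limit on capacity. The short Python check below makes that arithmetic explicit; the function name and the single-bottleneck assumption are illustrative and do not come from the company.

```python
import math


def max_lines(old_lines, cpu_reduction):
    """Lines one PC can serve if each line costs (1 - cpu_reduction) of what it did.

    Assumes CPU is the sole bottleneck, which the article does not state.
    """
    if not 0 <= cpu_reduction < 1:
        raise ValueError("cpu_reduction must be in [0, 1)")
    return math.floor(old_lines / (1 - cpu_reduction))


print(max_lines(72, 0.25))  # 72 / 0.75 = 96
```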