BOSTON - The Internet's Extensible Markup Language (XML) has been adapted to voice-command technology by SpeechWorks International Inc. The company's SpeechGenie system is designed to integrate speech recognition into Web-based telephony applications.
Based on technology from VoiceGenie Technologies Inc., SpeechGenie demonstrates how voice extensions to XML enable speech-recognition systems to access Web-based information. The product was formed by grafting VoiceGenie's VoiceXML Gateway onto SpeechWorks' speech-recognition engine and its Speechify text-to-speech engine.
The advent of VoiceXML, now endorsed by the necessary World Wide Web standards committees, offers the opportunity for creating open systems like SpeechWorks' proprietary SpeechSite. Part of a fully customizable set of standard speech-enabled e-business solutions, SpeechSite is a prepackaged self-service telephone application that greets callers, routes their calls and responds to spoken requests 24 hours a day for Federal Express, America Online and other brand-name clients.
By integrating VoiceGenie's VoiceXML Gateway with SpeechSite's resources, SpeechGenie extends the speech-enabled business solution into open-systems territory, the company said.
"Our customers were asking for SpeechWorks running on a VoiceXML platform," said Stuart Patterson, chief executive officer of SpeechWorks. "VoiceGenie was the logical partner because it is the first vendor with a VoiceXML Interpreter that is 100 percent compliant to VoiceXML 1.0." Patterson said that industry analysts predict a $12 billion market for voice-enabling services by 2005.
The limited document-representation formats supported by the Hypertext Markup Language (HTML) eventually stimulated the creation of XML last year. XML folds the best aspects of the Standard Generalized Markup Language (SGML) into a universal package designed to be easily implemented on the Web.
XML documents can be viewed and printed without losing information, enabling database-type structures to be scrolled on screen and enumerated for printing automatically. XML documents explicitly identify data structures with markup-language primitives, so distributed databases can be referenced in a single document, whether the files are local, remote or referenced by implication.
In addition, VoiceXML documents use a standard set of extensions for Web-based telephony applications that separate the application code from the speech-recognition engine used. Thus, applications written in VoiceXML can reside on any Web server, while speech recognition and text-to-speech services run on a different platform, on a separate server called a VoiceXML Gateway.
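To make the separation concrete, here is a minimal sketch of a Web-server-side script that produces a small VoiceXML 1.0 document. The function name, prompt text and URL are hypothetical and not taken from SpeechGenie; the point is only that the application is ordinary XML that any Web server could deliver, while the gateway that fetches it does the speech processing.

```python
import xml.etree.ElementTree as ET


def build_greeting_document(prompt_text: str, next_url: str) -> str:
    """Return a minimal VoiceXML 1.0 document as a string.

    Both arguments are illustrative placeholders. The document contains
    no speech-engine code: it only says what to play and where to go next.
    """
    vxml = ET.Element("vxml", {"version": "1.0"})
    form = ET.SubElement(vxml, "form", {"id": "greeting"})
    block = ET.SubElement(form, "block")
    # <prompt> text is rendered to the caller by the gateway's
    # text-to-speech engine, not by this script.
    prompt = ET.SubElement(block, "prompt")
    prompt.text = prompt_text
    # <goto> sends the caller on to another document on the Web server.
    ET.SubElement(block, "goto", {"next": next_url})
    return ET.tostring(vxml, encoding="unicode")


if __name__ == "__main__":
    print(build_greeting_document("Welcome. How can I help you?",
                                  "http://example.com/menu.vxml"))
```

Because the output is plain markup, swapping the recognition or text-to-speech engine behind the gateway would not require touching this script.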
With VoiceXML, telephony applications can be managed with the same resources that manage Web sites, even if the telephone application is just acting as an answering machine. Of course, VoiceXML can also be used to integrate speech recognition into Web sites, enabling phone callers to access the same Web-based data as computer browsers.
In addition, all the non-Web-based telephony applications, from routing calls to faxing "help" sheets, can be managed by a VoiceXML application residing on a Web server. VoiceXML applications can also serve up data from legacy data systems, using existing back-end databases that are not even integrated into a Web site.
VoiceGenie built its VoiceXML Gateway, the server that answers requests from applications and speech-recognition engines, by creating an XML schema. XML schemas express shared vocabularies, thereby allowing computers to carry out instructions crafted by programmers within a structured syntax and a semantics customized for their application.
Portability is ensured by explicit platform abstractions within the syntax of VoiceXML, and by separate support of popular audio formats, speech grammar formats and user-interaction schemes, all of which can be platform-independent.
VoiceXML incorporates the control-flow mechanisms of a computer language, such as if-then branches, and allows for the separation of service-side logic from user-interaction behaviors. Its computational and database operators are not intended for heavy use; applications are expected to rely on external resources when heavy computation is needed.
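As a rough illustration of that division of labor, the sketch below generates a VoiceXML form in which an `<if>` branch handles the dialog decision and a `<submit>` hands the heavier work to a server-side resource. The field name, spoken options and URL are assumptions made up for this example, not part of any SpeechWorks or VoiceGenie product.

```python
import xml.etree.ElementTree as ET


def build_routing_form(submit_url: str) -> ET.Element:
    """Build a VoiceXML 1.0 form that branches on a spoken choice.

    `submit_url` is a hypothetical server-side script that would do any
    real computation or database work.
    """
    vxml = ET.Element("vxml", {"version": "1.0"})
    form = ET.SubElement(vxml, "form", {"id": "route"})
    field = ET.SubElement(form, "field", {"name": "department"})
    ET.SubElement(field, "prompt").text = "Say billing or support."
    # Each <option> is one phrase the caller may speak.
    for word in ("billing", "support"):
        ET.SubElement(field, "option", {"value": word}).text = word
    # <filled> runs once the field has been recognized.
    filled = ET.SubElement(field, "filled")
    branch = ET.SubElement(filled, "if", {"cond": "department == 'billing'"})
    # Then-branch: pass the recognized value to the external service.
    ET.SubElement(branch, "submit",
                  {"next": submit_url, "namelist": "department"})
    # <else/> is an empty marker element; what follows is the else-branch.
    ET.SubElement(branch, "else")
    ET.SubElement(branch, "prompt").text = "Connecting you to support."
    return vxml


if __name__ == "__main__":
    root = build_routing_form("http://example.com/billing")
    print(ET.tostring(root, encoding="unicode"))
```

The markup decides only which way the conversation goes; anything resembling account lookup or billing arithmetic would live behind the submitted URL.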
SpeechGenie provides an open-systems solution yet integrates the proprietary SpeechWorks automated speech-recognition and Speechify text-to-speech engines. As an integrated platform, SpeechGenie enables standard speech-driven applications to access Web-based information, conduct online transactions and manage personal communications such as e-mail and voice-activated dialing. SpeechGenie also handles operations, administration and management functions such as user access, platform status and caching.
Scheduled for availability in the second quarter, SpeechGenie comes with Genie Tools, a set of VoiceXML-based DialogModules in the form of prepackaged application building blocks. SpeechWorks provides self-service e-business speech-recognition software, including its flagship SpeechWorks, SpeechSite, Speechify and SpeechSecure authentication products. Applications built with these tools can direct customer phone calls and let callers obtain information and complete transactions by speaking over any phone.