NEW YORK - Long considered a niche technology years from widespread adoption, speech recognition is becoming part of the pervasive computing strategies of IBM, Cisco, Intel and Microsoft, which intend to embed it into every future cell phone, PDA, car and consumer gadget.
The technology still has a way to go, but that hasn't stopped plans to use the Web as a medium for natural speech between man and machine. "The technology is not there yet for dictation to computers," said Xuedong Huang, general manager of the .Net Technologies Group at Microsoft Corp. "Based on historical extrapolation we will have it in about 10 years."
Microsoft launched a speech initiative based on its .Net platform at the SpeechTek 2002 conference here this week and announced the availability of a beta release of the .Net Speech Software Development Kit, which includes tools for creating applications using Speech Application Language Tags (Salt). The kit also supports the development of speech applications that can summon both text and graphics content on desktop PCs and Microsoft's Tablet PCs running the company's Internet Explorer browser software.
"Microsoft's Salt-based technologies are designed to easily extend the Web server infrastructure to enable Web access via speech on a wide range of devices," said Kai-Fu Lee, corporate vice president of the Natural Interactive Services division at Microsoft. "We believe that Salt will enable a multitude of speech Web applications and have submitted the standard to the World Wide Web Consortium [W3C] for adoption."
More Salt, please
Microsoft and Intel said earlier this month at the Intel Communications Summit in Orlando, Fla., that they would develop enabling technologies and a reference design for Web-enabled speech applications using the Salt specification.
The W3C has defined VoiceXML 2.0 and will look to incorporate Salt specifications into the next version of the standard, said Jim Larson of Intel, chairman of the W3C Voice Browser Working Group.
Under an alliance formed earlier this year with SpeechWorks International Inc., a text-to-speech technology leader, Microsoft plans to bring Salt technologies to developers of enterprise applications. "Speech has fared well in the mother of all recessions," said Stuart Patterson, president and chief executive officer of SpeechWorks, during a panel discussion at SpeechTek. "Speech technologies are moving from the network into the embedded arena very fast, and standards like Salt, VoiceXML and Aurora for mobile applications will all contribute to this trend."
SpeechWorks is a founding member of the Salt Forum, which intends to develop Salt as a royalty-free, platform-independent standard. Other founding members are Cisco Systems, Comverse, Intel, Microsoft and Philips Speech Processing. (Philips Electronics recently signed an agreement to sell its Speech Processing unit to ScanSoft Inc. of Peabody, Mass.) The forum has submitted the Salt 1.0 specification to the W3C.
Ultimately these companies hope to replace dial tones and keyboards as the means for gaining access to information on the Web. Today's handhelds like those from Palm Inc. utilize pens, keyboards and a limited voice-recording capability. The Tungsten T handheld Palm introduced this week, based on Texas Instruments Inc.'s Omap1510 processor and featuring a 320 x 320-pixel display, is even smaller than the company's earlier products. The increased number of features on these systems cries out for voice-activated commands to alleviate the limitations of ever-smaller form factors.
Calling the Web
Cell phones without keypads are expected to hit the market as early as next year. Philips Electronics recently demonstrated a palm-sized flat display. Such devices are expected to enhance the use and appeal of the mobile Internet when coupled with voice-recognition features that allow users to call up any Web page from a mobile device just by speaking its address.
In a demo at the 3GSM World Congress in February, Mitsubishi Electric showed its Trium Mondo PDA with a SchlumbergerSema SIM card featuring voice validation software developed by Domain Dynamics Ltd. The software provided a biometric template that can recognize a person's voice to properly identify a user, a necessity when providing access to corporate or private databases over the Internet.
Tim Phipps, chief technology officer at Domain Dynamics, said voice authentication can ensure security for a new generation of wireless devices with powerful processors, plenty of multimedia functions, and easy links to other systems. "The road to secure m-commerce [mobile commerce via wireless devices] is to first implement voice authentication on terminal/PDA designs as a simple feature to unlock access to the handset," said Phipps. That would eliminate the inconvenient PINs now in common use.
William Osborne, vice president of the Pervasive Computing Segments of IBM Corp., agreed that voice will become an authenticating medium as PDAs get smaller and smarter, but added that dictation will also become pervasive. A speech technology veteran with more than 20 years of experience in the field, Osborne is an advocate of the technology. "We have to start thinking of speech in terms of an end-to-end imperative in the IT world," he said. "Speech must become part of the IT industry, not remain an industry of its own kind."
Ron Croen, chief executive officer at Nuance, assessed how far speech technologies must go before they become an everyday interface for computers at work and at home. "A lot of investment money is coming into the industry, but too few people are making a profit and too many are chasing the same goals," he said, referring to a proliferation of proprietary solutions. "We need to standardize on the standards, consolidate, and have each industry player find the natural level of expertise to contribute to the whole," he said.