SAN DIEGO -- A new group announced at the DemoMobile 99 conference that it is attempting to establish standards for voice-activated handheld devices.
The charter members of the Voice Technology Initiative for Mobile Enterprise Solutions (VoiceTimes) group -- IBM Corp., Dictaphone Corp., Intel Corp., Olympus America Inc., Philips Electronics NV, e.Digital Corp. and Norcom Consulting Inc. -- all currently use IBM's ViaVoice speech-recognition software in their products.
VoiceTimes' professed goal is to standardize the technical specifications for voice-activated handheld devices, thus permitting interoperability among the devices of different manufacturers targeting the same markets. The standardization effort may hasten the adoption of voice-activated handheld devices by establishing the needs of users and shaping the technical requirements for meeting those needs.
"What is really happening is that voice-activated devices are expected to handle voice, data and text and no one company can provide a complete solution for all three, so we need standards to integrate them," said Tom Houy, enterprise marketing program manager at IBM.
Currently, there are no guidelines for vendors to follow when building voice-activated devices, handheld or not. Each OEM must define the method by which voice signals are recorded, compressed and communicated to a speech-recognition engine.
Even if the same speech-recognition engine is used in separate devices, each vendor must still experiment with bit rates, buffering methods, compression algorithms and communication protocols until it finds a workable combination. Once achieved, the uniqueness of the solution virtually guarantees that the device will not work in the same environment with a functionally similar device from another OEM.
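The interoperability problem described above can be sketched as a simple comparison of device profiles. The field names, values and the all-fields-must-match rule below are illustrative assumptions, not any actual VoiceTimes specification: the point is only that each OEM independently fixes several parameters, and a mismatch in any one of them is enough to break compatibility.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class VoiceCaptureProfile:
    """Hypothetical bundle of the choices each OEM must make on its own."""
    sample_rate_hz: int   # sampling rate of the captured speech signal
    bits_per_sample: int  # bit depth of each sample
    buffer_frames: int    # frames buffered before flushing to storage
    codec: str            # compression algorithm applied before storage
    transport: str        # protocol used to reach the PC-side recognizer

def interoperable(a: VoiceCaptureProfile, b: VoiceCaptureProfile) -> bool:
    """Recordings can move between devices only if every choice matches."""
    return a == b

# Two vendors that differ in a single parameter (sample rate):
vendor_a = VoiceCaptureProfile(11025, 16, 32, "adpcm", "serial")
vendor_b = VoiceCaptureProfile(8000, 16, 32, "adpcm", "serial")
print(interoperable(vendor_a, vendor_b))  # False
```

With no published guideline, every field in the profile is an independent engineering decision, so the odds of two OEMs landing on the same combination by accident are effectively zero.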
No one knows the frustration of those unique solutions as well as e.Digital, an intellectual-property and design house that specializes in creating voice-activated device designs for OEMs. Among the premier customers of e.Digital are Lanier (the dictation division of Harris Corp.), Intel and Lucent Technologies.
"For years now, we have been successfully recording and editing voice signals on flash memory for different customers, but each time we define a new solution we have to cover a lot of the same ground all over again," said Fred Falk, president of e.Digital. "This situation is even worse when compared with our competitors, who don't have the benefit of having previous designs to build on. It's our firm conviction that in order for the market to grow unimpeded, there must be standards set that vendors can use as guidelines."
Each device from the respective vendors in VoiceTimes currently works by recording a speech signal on tape or in semiconductor memory such as flash and then communicating the signal to a PC, which does the actual speech recognition. The steps in each case are the same: capturing the signal, compressing it, storing it, then communicating it to the PC.
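The four shared steps can be modeled as a small pipeline. All of the function names and the toy "codec" below are placeholders of my own, assumed for illustration; real devices used proprietary compression and serial or docking-cradle transfers.

```python
def capture(samples: list) -> bytes:
    # Stand-in for the microphone/ADC stage on the handheld.
    return bytes(samples)

def compress(raw: bytes) -> bytes:
    # Stand-in codec: real devices applied ADPCM or a proprietary scheme;
    # here we simply tag the payload so the stage is visible.
    return b"CMP" + raw

def store(compressed: bytes, memory: list) -> None:
    # Persist to tape or flash, modeled as appending to a list.
    memory.append(compressed)

def send_to_pc(memory: list) -> bytes:
    # Deliver the stored blocks to the PC-side recognition engine.
    return b"".join(memory)

flash: list = []
store(compress(capture([1, 2, 3])), flash)
payload = send_to_pc(flash)
```

Every vendor implements some version of each stage; what VoiceTimes proposes to standardize is the format of the data flowing between them, not the stages themselves.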
Despite taking all these same steps and using the same ViaVoice speech-recognition engine, none of those devices can substitute for one another. Interoperability among vendors is virtually nonexistent today, pushing users to lock themselves into a single vendor and forcing vendors to repeat the same steps each time they approach a new design.
"We have been working on a lot of these things for some time, but different companies have been working against each other by solving the same problems in different ways. We hope that VoiceTimes will put an end to all this wasted effort at reinventing the same wheel over and over again," said Steve Rothschild, director of marketing at Dictaphone.
The need for standards is being felt even more strongly today as new speech-enabling technologies, besides simple dictation, are being integrated into handheld devices. For instance, speech-to-text capabilities are being adapted to fill in the blanks on electronic forms in the field, rather than just collecting the raw speech signal and recognizing it en masse later on the PC. Devices are also being adapted to accept a mixture of voice commands to control a device as well as to enter voice data into it.
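Mixing commands and data means the device must decide, per utterance, whether the recognized text steers the form or fills it. A minimal sketch of such a dispatcher follows; the command phrase, field names and state layout are all hypothetical, chosen only to show the two-way split the article describes.

```python
# Hypothetical field-worker form; the field names are illustrative.
FORM_FIELDS = ["patient_name", "dosage", "notes"]

def handle_utterance(text: str, state: dict) -> dict:
    """Route one recognized utterance: command words move between
    blanks, anything else is dictated content for the current blank."""
    if text.lower().split()[:2] == ["next", "field"]:
        # Voice command: advance to the next blank on the form.
        state["field_index"] = min(state["field_index"] + 1,
                                   len(FORM_FIELDS) - 1)
    else:
        # Voice data: store the utterance in the current blank.
        field = FORM_FIELDS[state["field_index"]]
        state.setdefault("values", {})[field] = text
    return state

state = {"field_index": 0}
handle_utterance("Jane Doe", state)       # fills patient_name
handle_utterance("next field", state)     # command, no data stored
handle_utterance("20 milligrams", state)  # fills dosage
```

A standard would have to pin down exactly this boundary: which utterances every conforming device treats as commands, and how dictated field data is tagged when it is handed off.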
Other devices are being adapted to access database information about an application (such as a medical patient's records), updating it, and then wirelessly re-entering the updated information back into the database. Some applications even accept text from a remote-running application, then speak the text to a user whose hands are busy, such as "now turn the screw three quarters of a turn to the left."
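The database scenario above is a fetch-update-write-back round trip. The sketch below assumes a key-value store and a single string field; the record shape and key are invented for illustration, and the wireless link is reduced to an in-memory dictionary.

```python
# Hypothetical back-end store; a dict stands in for the wireless link.
database = {"patient/42": {"name": "Jane Doe", "dosage": "10 mg"}}

def fetch(key: str) -> dict:
    # Pull a copy of the record down to the handheld.
    return dict(database[key])

def apply_update(record: dict, field: str, value: str) -> dict:
    # Edit made on-device, e.g. dictated by the field worker.
    record[field] = value
    return record

def write_back(key: str, record: dict) -> None:
    # Re-enter the updated record into the database over the link.
    database[key] = record

rec = fetch("patient/42")
write_back("patient/42", apply_update(rec, "dosage", "20 mg"))
```

Each leg of the round trip crosses a vendor boundary (device, link, database), which is why the article's sources argue the interchange formats need standardizing.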
"To meet the long-term goals of VoiceTimes," said Rothschild, "we intend to hire psychologists and anthropologists to go out in the field and observe how people do their jobs, and hopefully to define just the kind of voice-activated devices that will help people do their job better.
"Initially, with Intel's help, we will do this for workers in the fields of medicine, law enforcement, insurance, service, field sales management and enterprise resource planning," said Rothschild.
In the short term, VoiceTimes has goals that are less ambitious but more pressing, including the definition of standards in such areas as voice-data formatting, DSP selection, compression algorithms, feature sets and other functional design considerations.
Once the basic set of standards has been reviewed and accepted by the various members of VoiceTimes, hopefully by 2000, the group has pledged to submit the specification to one of the public standardization groups, such as ISO, so that other speech-recognition-engine vendors besides IBM can participate in the effort.