PARK RIDGE, Ill. -- The in-car PC boom that was supposed to be in full swing by now hasn't happened, and it may be delayed another 12 to 18 months as automakers and vendors run up against hurdles in implementing speech recognition systems.
The holdup is a disappointment for manufacturers that have invested millions in the development and promotion of in-car PCs. But because speech recognition systems are critical to addressing potential driver distraction issues, carmakers want them to be as close to perfect as possible. And they're not there yet, say industry observers.
Voice-recognition systems, such as Clarity's, are key to enabling next-gen auto electronics.
"When we sign nondisclosure agreements and talk to automotive vendors, they all acknowledge that there are problems with speech recognition in vehicles," said Fred Nussbaum, vice president of business development for Clarity LLC (Troy, Mich.), a maker of software-based speech-capture systems. "They're not going to talk about it publicly, because there are a lot of legal ramifications, but the problems are there."
Software makers and industry analysts say those problems are a key reason why Cadillac has delayed the introduction of its Infotainment system from fall of this year until late 2001, and why Visteon Corp. (Dearborn, Mich.) has yet to put its ICES (information, communication, safety and security) technology into a production vehicle.
Industry analysts also blame the lack of a good speech system for the dismal performance of Clarion's AutoPC. Clarion had expected to be selling the system at a rate of thousands per month by 2000 but has sold just 3,500 units in two-and-a-half years.
"Speech recognition is definitely a hurdle," said Thilo Koslowski, a senior analyst for the Gartner Group's e-business automotive service. "Manufacturers have to be very careful about deploying systems that are not 100 percent reliable because they could face lawsuits from consumers."
Eyes on the road
The race to create more effective speech systems is seen as critical for automakers. Several of them, most notably Ford and General Motors, have espoused an "eyes-on-the-road, hands-on-the-wheel" philosophy as they work to incorporate new electronic capabilities into automobiles. That philosophy is seen as especially important now, in light of the recent passage of state laws restricting drivers' use of cell phones while under way.
But automakers say they can't bring eyes-on-the-road, hands-on-the-wheel techniques to vehicles unless they have good speech recognition systems. That's why General Motors has forged partnerships with General Magic Inc. (Sunnyvale, Calif.) and Nuance Communications (Menlo Park, Calif.) to work on voice recognition systems. It's also why Ford Motor Co. has allied itself with speech recognition developer Lernout & Hauspie (Ieper, Belgium), which filed for Chapter 11 protection this past week following management missteps.
'Technical issues,' GM says, have delayed the rollout of Cadillac's Infotainment system for at least a year.
Automakers say they plan to continue to work on speech recognition systems, but they deny that there are problems. "The technology is where we expected it to be," said Ed Chrumka, advanced technology manager for OnStar.
Indeed, OnStar representatives point out that the company's Virtual Advisor, an off-board speech-based service that provides e-mail, news and stock quotes, is coming out as scheduled at the end of this year. Delivery of the system has already begun in the Northeast, and industry analysts said they are impressed by it. "In the testing that we've done, it performed at a very high level," said Dawn McGreevey, an automotive analyst for Gomez (Lincoln, Mass.), an Internet system quality-measurement firm.
But in-car PCs, which use on-board electronics, have not fared as well. Cadillac's much-ballyhooed Infotainment system, which was supposed to be available on the Cadillac DeVille by now, is at least a year behind schedule. A General Motors spokesman declined to comment on reasons for the delay, except to say that there are "technical issues."
Similarly, a Visteon spokeswoman said that its ICES system is in development programs with several OEMs but would not say when it will reach production. At the 1998 Society of Automotive Engineers (SAE) conference in Detroit, however, Visteon and Ford predicted that the first units would be in vehicles by 2000.
Automotive engineers and software makers do acknowledge that equipping car systems for speech recognition has proved a more formidable task than had been expected.
"There's an overriding perception that speech recognition has no moving parts and is easier to implement than it really is," said Ron Risdon, vice president of business development for Conversational Computing Co., which makes the Conversay speech technology product. "Over-optimism is very prevalent."
Wall of sound
The crux of the problem is that vehicles, unlike desktop PCs, are subjected to a wide variety of noises that can confuse software-based speech recognizers. Compounding the problem is that in-car speech recognition is often done by remote servers over cellular links. "Working with speech recognition over a cellular link is like doing magic," said a senior engineer who works for a major automaker. "You have to worry about more than just the noise generated by the vehicle. There are about 20 different sources of noise." The road, wind, defroster, fan, radio, windshield wipers and backseat occupants are just a few.
If speech recognition is done over a cellular link, the system also must deal with such issues as line echo, electrical interference and poor signal strength.
Automotive engineers say that the problems aren't insurmountable. "It's not a matter of whether the technology is mature," said Chrumka of OnStar. "It's more an issue of the application of the technology in variable environments."
Software makers say that the problems are magnified at higher vehicle speeds. Most voice recognition systems currently claim accuracies of 90 to 95 percent, but some engineers say those figures are averages that hold true at 30 mph but not at higher speeds. At 70 mph, for example, some engineers say the accuracy dips to about 70 percent. If occupants crack open a window, turn on the radio or blast the air conditioner, accuracy drops even more.
"Even if you have 90 percent accuracy, one out of every 10 phone digits that you dictate is going to be wrong," said Jim Wargnier, vice president of engineering at Clarity and a former engineer for OnStar and for Delphi Automotive. "At 70 percent it's going to be extremely frustrating for customers, even if they have a great user interface."
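Wargnier's per-digit arithmetic compounds quickly over a full phone number: if each digit is recognized independently (an illustrative simplification, not a claim from any of the vendors quoted here), the chance of capturing all 10 digits correctly falls off exponentially.

```python
# Probability that an entire 10-digit phone number is captured
# correctly, assuming each digit is recognized independently
# (an illustrative simplification).
def full_number_success(per_digit_accuracy, digits=10):
    return per_digit_accuracy ** digits

for acc in (0.95, 0.90, 0.70):
    print(f"{acc:.0%} per digit -> {full_number_success(acc):.1%} for the full number")
# 95% per digit -> 59.9% for the full number
# 90% per digit -> 34.9% for the full number
# 70% per digit -> 2.8% for the full number
```

Even the best-case claimed accuracy leaves a coin flip's chance of dialing a wrong number, which is why the 70 percent figure alarms engineers.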
Some engineers disagree with the 70 percent accuracy figure, even for high-speed applications. It's greatly exaggerated, they say, and automotive engineers have found ways to deal with high speeds. "As cars go faster, wind noise rises, but any good speech recognizer changes itself to accommodate that," said Scott Pyles, director of product management for Lernout & Hauspie's automotive products.
Some engineers also say unexpected noises are of greater concern than high speed. "The issue isn't steady-state noise," said Chrumka. "The big things that affect voice recognition are the variables: kids in the back seat, windows opening and closing, pops and cracks in the cabin."
Automakers are concerned about even the subtlest lack of accuracy because it could place greater "cognitive load" on the driver, who theoretically should be free to concentrate on traffic and driving conditions. "It should only take you so long to dial a cell phone, tune the radio or turn on the air conditioner," Wargnier said. "There are a lot of legal ramifications for those companies if there are problems or if they place too much cognitive load on the driver."
Stories of drivers struggling with voice recognition systems are already commonplace, even though the technology has been available for only a short time. Such stories are a concern among industry analysts as well as automakers.
"If you want to change the radio station but you have to repeat the command 10 times in order to make it happen, that's a big problem," said Gartner's Koslowski. "Even though the system is voice-controlled, you still end up concentrating too much on changing the radio station, and that affects your driving."
Some believe the dilemmas facing automotive speech recognition may be a result of hardware rather than software. "It may be a particular problem having to do with the processing power inside the car, as opposed to the speech technology," said Bill Meisel, president of TMA Associates, a speech industry marketing and consulting firm. If that's the case, Meisel said, the problem would be more focused on in-car PCs, such as the Infotainment system or ICES.
"Server-based systems processing voice over wireless connections would be less prone to problems, because they can have as much memory and as much speed as they need," Meisel said. Such systems as OnStar's Virtual Advisor use off-board, server-based processing.
Loud and clear
Kurt Sievers, automotive marketing manager for Philips Semiconductors, said his company has had success running voice recognition on its Hello IC in moving vehicles. "The usage scenario is difficult, but it has been done," he said.
The company has demonstrated recognition over a 300-word vocabulary for command and control apps that can, for example, use voice to turn the volume on a radio up or down. "We've had it on a test track at 120 kilometers an hour with the window open, and it still recognizes the driver's voice," Sievers said.
There are strategies to deal with voice recognition in a car with multiple passengers, said Corado Giorgetti, director of business development at ALST, an Israeli joint venture between Altec-Lansing and STMicroelectronics that was created to develop speech DSP technology. For example, audio systems can be set up to "listen" preferentially to the person behind the steering wheel and treat other voices as "noise" to be canceled, he said. But it is not yet clear how successful such strategies are.
Separately, ST has developed a specialized 24-bit DSP-based chip called Euterpe that can perform the functions of voice recognition, text-to-speech rendition, noise and echo cancellation, and biometric verification on audio data streams. At present, the ST system is good for command and control, according to Paolo Gonella-Pacchiotti, car multimedia business unit director at the company. "We are moving toward continuous speech recognition," he said.
Some software makers, such as Clarity and Conversay, believe the solution lies in the use of specialized software and better microphones. Clarity, for example, offers a technology known as Clear Voice Capture, which extracts the voice signal of interest. The company says the technology improves on conventional noise suppression systems, which have difficulty when noise components overlap the voice signal itself.
Similarly, Conversay offers filtration techniques that separate speech signals from noise signals and narrowly focus on the speaker. The system employs two microphones, one on the passenger side and another on the driver side, and is focused more on distributed speech, for which processing power is split between the client and server.
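One classic way to make two microphones "listen" preferentially to the driver is delay-and-sum beamforming: delay one channel so that speech from the driver's seat lines up across both microphones, then average. The driver's voice adds coherently while sound from other positions adds out of phase. The sketch below is a generic, idealized illustration of that principle, not a description of Conversay's or Clarity's actual algorithms; the tones and the two-sample delay are contrived so the "passenger" signal cancels exactly.

```python
import math

def steer_toward_driver(mic_a, mic_b, delay):
    """Delay-and-sum beamformer steered at the driver's seat.

    mic_a is the microphone the driver's voice reaches first; delaying
    it by `delay` samples aligns the driver's speech with mic_b, so the
    two copies add coherently while off-axis sound adds out of phase.
    """
    out = []
    for n in range(len(mic_b)):
        a = mic_a[n - delay] if n - delay >= 0 else 0.0
        out.append(0.5 * (a + mic_b[n]))
    return out

# Contrived scene: the driver's tone reaches mic A two samples before
# mic B, while the passenger sits equidistant from both microphones.
N, d = 400, 2
driver = [math.sin(2 * math.pi * 0.05 * n) for n in range(N)]
passenger = [math.sin(2 * math.pi * 0.25 * n) for n in range(N)]
mic_a = [driver[n] + passenger[n] for n in range(N)]
mic_b = [(driver[n - d] if n >= d else 0.0) + passenger[n] for n in range(N)]

out = steer_toward_driver(mic_a, mic_b, d)

# Past the first couple of samples, the output is the driver's tone
# alone: the passenger's tone, two samples out of phase with itself,
# cancels in the average.
residual = max(abs(out[n] - driver[n - d]) for n in range(d, N))
```

In a real cabin the steering delay would be estimated from microphone geometry or adapted on the fly, and broadband noise cancels only partially rather than exactly, which is why vendors layer additional filtering on top of the microphone array.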
Engineers are also reportedly looking at microphone technology as a way to boost accuracy. But the best of these, so-called "array" microphones, cost between $100 and $180, beyond the acceptable limit for automotive applications.
Many in the industry are unconvinced by automakers' claims. "The reality is that today's systems are still failing in a lot of different modes," said Nussbaum of Clarity. "But the technology will get better before it reaches the market. Right now, we just don't know when that will be."
Peter Clarke contributed to this report.