PRINCETON, N.J. In a bid to bring practical voice control to consumer products such as home appliances and cell phones, Sarnoff Corp. is lending its speech-enhancement algorithm expertise to Sensory Inc., a provider of speech-recognition technology.
A joint project between the two companies will tune Sarnoff's algorithms to Sensory's speech ICs and software products to improve recognition in noisy environments. Sensory plans to introduce the first speech-recognition ICs using Sarnoff's Voice Thru algorithms in the third quarter.
Sarnoff's work with Sensory brings the company's vision of speech interfaces one step closer to fruition, said William Porter, general manager of Voice Thru products at Sarnoff (Princeton, N.J.).
"We believe speech interfaces are the preferred choice for consumer devices of the future and what's holding it up is getting a clear signal into the [speech] recognizer," said Porter. The company's strategy is to enhance speech signals enough to eliminate the headset microphones that speech-technology products currently require.
Sarnoff's Voice Thru is a family of speech-enhancement algorithms designed for single-microphone, dual-microphone and multiple-microphone applications. The algorithms remove noise from the audio input to yield a clean voice signal, which in turn raises the accuracy of the speech-recognition engine.
Sarnoff's approach uses signal-processing techniques, rather than the hardware-oriented noise cancellation employed by many microphone manufacturers. The Voice Thru algorithms have their roots in work begun in a 1993 program funded by the Defense Advanced Research Projects Agency, initiated to develop signal-processing technology to enhance microphone arrays.
According to Paul Sajda, Sarnoff's chief technology officer, different applications call for different Voice Thru algorithms. For single-microphone applications, a spectral-based algorithm analyzes the speech spectrum and tracks the noise within it to produce a clean signal. In dual-microphone environments, an adaptive noise-cancellation algorithm takes samples from both microphones and estimates the noise common to the two, which can then be removed to sharpen the speech signal.
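The article does not describe Sarnoff's implementation, so the following is only a minimal Python sketch of the textbook techniques that match Sajda's one-line descriptions. `spectral_subtract` performs magnitude spectral subtraction for a single microphone; it assumes the first few frames contain no speech and uses their average spectrum as a fixed noise estimate, a simplification of the continuous noise tracking described above. `lms_cancel` is a least-mean-squares (LMS) adaptive noise canceller for two microphones, assuming the second microphone hears mostly noise. All function and parameter names are hypothetical.

```python
import cmath
import math


def _dft(x):
    """Naive O(N^2) discrete Fourier transform; adequate for short frames."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def _idft(spec):
    """Inverse of _dft, returning the real part of each reconstructed sample."""
    n = len(spec)
    return [(sum(spec[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
            for t in range(n)]


def spectral_subtract(frames, noise_frames=5, floor=0.05):
    """Single-microphone magnitude spectral subtraction (hypothetical sketch).

    frames: equal-length lists of samples (non-overlapping, unwindowed for brevity).
    The first `noise_frames` frames are ASSUMED to contain noise only; their
    average magnitude spectrum is used as the noise estimate. Each frame keeps
    its noisy phase, and `floor` limits how far any bin can be attenuated.
    """
    spectra = [_dft(f) for f in frames]
    n = len(frames[0])
    noise = [sum(abs(s[k]) for s in spectra[:noise_frames]) / noise_frames
             for k in range(n)]
    cleaned = []
    for spec in spectra:
        out = []
        for k, X in enumerate(spec):
            mag = abs(X)
            new_mag = max(mag - noise[k], floor * mag)  # subtract, but never below the floor
            out.append(cmath.rect(new_mag, cmath.phase(X)))  # reuse the noisy phase
        cleaned.append(_idft(out))
    return cleaned


def lms_cancel(primary, reference, taps=8, mu=0.01):
    """Two-microphone adaptive noise canceller using the LMS update.

    primary:   talker-facing microphone (speech plus noise).
    reference: second microphone, ASSUMED to capture mostly the same noise.
    An FIR filter learns to predict the noise in `primary` from recent
    `reference` samples; the prediction error is the enhanced speech.
    """
    w = [0.0] * taps
    hist = [0.0] * taps  # most recent reference samples, newest first
    out = []
    for p, r in zip(primary, reference):
        hist = [r] + hist[:-1]
        noise_est = sum(wi * xi for wi, xi in zip(w, hist))
        e = p - noise_est
        w = [wi + 2 * mu * e * xi for wi, xi in zip(w, hist)]  # LMS weight update
        out.append(e)
    return out
```

Frames are non-overlapping and unwindowed here to keep the sketch short; practical spectral subtraction normally uses overlapping windowed frames and an FFT. In the LMS canceller, a larger `mu` converges faster but leaves more residual noise.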
Environments with several speakers would use a statistical beam-forming algorithm, also called blind source separation. The technique assumes that the original speakers are statistically independent and uses statistical constraints to separate their voices.
The company's algorithms scale from less than 40 bytes of instructions to several megabytes of code, depending on the available memory, the recognition engine and the processor, according to Sajda.
While offering commercial products, Sarnoff is continuing lab work on how to combine the algorithms for system-level applications. Sajda explained that controlling a television by voice, for example, might require blending standard microphone-array and spectral-based algorithms.
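The article does not say which microphone-array algorithm Sarnoff would blend in. As an assumption, the following Python sketch uses the textbook delay-and-sum beamformer: each channel is delayed so the talker's sound lines up across microphones, and the channels are then averaged. `delay_and_sum` and its `lags` argument are hypothetical, and the per-microphone lags are assumed to be known.

```python
def delay_and_sum(channels, lags):
    """Delay-and-sum beamformer for a microphone array (hypothetical sketch).

    channels: one list of samples per microphone, all the same length.
    lags:     integer number of samples by which the talker's sound reaches
              each microphone late, ASSUMED known here (a real system would
              derive it from the array geometry or estimate it by cross-correlation).
    Each channel is delayed so the talker's wavefront lines up across channels,
    then the channels are averaged: the aligned speech adds coherently while
    independent noise partly cancels.
    """
    m, n = len(channels), len(channels[0])
    latest = max(lags)
    out = [0.0] * n
    for ch, lag in zip(channels, lags):
        shift = latest - lag  # extra delay that aligns this mic with the latest one
        for t in range(shift, n):
            out[t] += ch[t - shift] / m
    return out
```

With M microphones whose noise is uncorrelated, averaging reduces the noise power by roughly a factor of M while leaving the aligned speech unchanged. In a blended system of the kind Sajda describes, the beamformer's single output could then be passed to a single-microphone spectral stage for further cleanup.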
Sensory (Sunnyvale, Calif.) plans to embed the smaller spectral-based algorithms into its 8-bit microcontrollers, custom ICs and commercial DSPs for low-cost, low-power applications.
The first chips to use Voice Thru will be the RSC-264T 4-bit and RSC-364 8-bit microcontrollers. The RSC-264T is designed to work with Sensory Speech 4.0 Technology and offers numerous on-chip features; Sensory said it will enable interactive speech products that sell for as little as $15. In high volumes, the chip is priced from under $2.25 to $5. The RSC-364 integrates a 4-Mips microcontroller, an audio preamplifier, A/D and D/A converters, a watchdog timer, 64 kbytes of ROM and 2.5 kbytes of RAM. The company expects to release DSP and 16-bit software-based solutions before the third quarter.