News & Analysis
The elusive recognition of voices
Nicolas Mokhoff
1/3/2013 11:06 AM EST
Voice recognition is one of those technologies that are fraught with peaks of anticipation followed by valleys of disappointment. The quest for a practical human voice recognizer has been an ongoing challenge ever since the first attempts at a human-machine interface.
The latest consumer-oriented human-machine interface is Siri, an intelligent personal assistant and knowledge navigator application for Apple's iOS. Dag Kittlaus, the co-founder and former CEO of Siri, sold the technology to Apple. Apple claims that the software adapts to the user's individual preferences over time, personalizes results, and performs rudimentary tasks such as recommending nearby restaurants or getting directions.

Siri co-founder Dag Kittlaus, developer of a “do” voice recognition engine,
which allows one to do things with information.
Photo credit: Northwestern University
Siri has its genesis at DARPA, the defense research agency that funded the project at SRI International, “a $250 million project,” according to Kittlaus. The scope of practical voice recognition can be seen from who else was involved in getting Siri off the ground. The research was conducted by groups from Carnegie Mellon University, the University of Massachusetts Amherst, the University of Rochester, the Institute for Human and Machine Cognition, Oregon State University, the University of Southern California, and Stanford University.
Apple launched Siri as one of its apps and as an integral part of the iPhone 4S. Siri lets a user dictate and send messages, schedule meetings, and place phone calls; it also makes recommendations, answers questions, and can understand natural speech. Siri asks questions of its own if it needs more information to complete a task.
Given that the tasks are well defined and limited, the voice recognition works as designed.
And that is the crux of the history of voice recognition. The application needs to be limited in scope and focused on particular tasks, and kept in a closed ecosystem for individuals to receive a small set of satisfactory results on their iPhone 4S.
Speech synthesis and voice recognition applications are pervasive today, ranging from telephone applications for checking airline schedules to voice-assisted navigation systems in automobiles, computers for the blind, and security applications.
In one of IBM's yearly "5 in 5" predictions, IBM Fellow and Speech CTO David Nahamoo predicted that biometrics would come into its own, with voice recognition part and parcel of the biometric technologies.
“Over the next five years, your unique biological identity and biometric data – facial definitions, iris scans, voice files, even your DNA – will become the key to safeguarding your personal identity and information and replace the current user ID and password system,” said Nahamoo from IBM Research. “We can take advantage of the advanced technology being used in the smart devices, such as microphones, touch screens and high definition cameras to fully employ biometric security options.”
In the most recent year's "5 in 5" predictions, IBM researchers listed emerging technologies that will continue to push the boundaries of human limitations to enhance and augment our senses. Among these are machine learning, artificial intelligence, and advanced speech recognition.
So while the goal of continuous voice recognition remains as elusive today as it was 50 years ago, there are pocket applications defined by limited domains that make practical voice recognition possible.

