@Max: However, there is a bit of a "gotcha" to all of this, which is the fact that the majority of existing voice processing solutions, such as those employed by smartphone applications, are of a type known as "near-field." Basically, the "near-field" moniker refers to the fact that the user's mouth is "up close and personal" with regard to the microphone on the smartphone. When it comes to controlling things remotely, we need solutions that are capable of far-field voice input processing (FFVIP), which involves a whole new set of challenges.
Depending upon where we are, maybe we don't.
In the home automation scenario described, for example, it would make more sense to have a number of microphones scattered throughout the residence, so that the occupant is always "near-field" to one of them. The user would say something like "Alarm clock, please wake me at 6 a.m. tomorrow." The microphone nearest the user would pick it up and transmit it over the home network to the device handling voice processing, which would route the command to the specified device. There would be no need for the user to be in close proximity to the device being controlled, only in proximity to a microphone that could pick up the command.
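To make that flow a bit more concrete, here is a rough Python sketch of the routing step. Everything in it is an assumption for illustration: the `Capture` record, the `level_db` field used as a stand-in for "nearest microphone," and the convention that the device name is whatever comes before the first comma. It also assumes speech recognition has already turned the audio into text. It is not how any particular product does it.

```python
from dataclasses import dataclass


@dataclass
class Capture:
    """One microphone's view of an utterance (hypothetical record)."""
    mic_id: str
    level_db: float   # assumed signal level; louder is treated as nearer
    transcript: str   # assumed output of a speech recognizer


def pick_nearest(captures):
    """Return the capture with the strongest signal level."""
    return max(captures, key=lambda c: c.level_db)


def route(transcript, devices):
    """Split 'Device name, command' and return (device, command).

    Returns None when there is no comma or the device is unknown.
    """
    name, sep, command = transcript.partition(",")
    key = name.strip().lower()
    if not sep or key not in devices:
        return None
    return key, command.strip()


# Hypothetical set of controllable devices on the home network.
DEVICES = {"alarm clock", "thermostat"}

captures = [
    Capture("hallway", -42.0, "Alarm clock, please wake me at 6am tomorrow"),
    Capture("bedroom", -18.5, "Alarm clock, please wake me at 6am tomorrow"),
]
best = pick_nearest(captures)
target = route(best.transcript, DEVICES)
```

Here `best` is the bedroom capture, since it has the higher level, and `target` pairs the alarm clock with the rest of the sentence. A real system would need something better than raw level to pick the microphone, but the point is that the controlled device never needs to hear the user directly.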
I'm not sure what's up with tablets. My three-year-old Droid X has three microphones, specifically for noise-canceling purposes. I'm sure it's not as elaborate as Conexant's approach, but it does a pretty good job of sparing my callers from hearing my car stereo or wind noise when they call me. And it's quite a bit smaller than a tablet.