Processor vendors are beginning to focus on embedded vision applications and to tune their processors for them, often by incorporating specialized coprocessors designed for vision processing.
Do embedded processors shape applications, or is it the other way around?
In reality, it works both ways. This is particularly evident in digital-signal-processing-intensive applications, such as wireless communications and video compression. These applications became feasible on a large scale only after the emergence of processors with adequate performance and sufficiently low prices and power consumption. And once those processors emerged, these applications started to take off. Then, the growing market attracted competition and investment. Processor vendors tuned their processors for these applications.
As a result, we got a generation of DSPs with features such as add-compare-select instructions for Viterbi decoding, and a generation of DSPs and CPUs with features like sum-of-absolute-differences instructions and single-instruction-multiple-data operations for video compression.
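For readers unfamiliar with these two operations, here is a minimal sketch in Python of what they compute. It is illustrative only: the function names, the list-of-rows block layout, and the "smaller metric wins" convention are assumptions made for this example, not a description of any particular processor's instruction set. A real DSP performs each of these in one or a few instructions rather than in a loop.

```python
def sum_of_absolute_differences(block_a, block_b):
    """SAD between two equally sized pixel blocks (given as lists of rows).

    Video encoders use this as a cheap measure of how well a candidate
    reference block matches the current block during motion estimation.
    """
    return sum(
        abs(a - b)
        for row_a, row_b in zip(block_a, block_b)
        for a, b in zip(row_a, row_b)
    )


def add_compare_select(metric_0, branch_0, metric_1, branch_1):
    """One Viterbi add-compare-select step for a single trellis state.

    Add:     extend each predecessor's path metric by its branch metric.
    Compare: see which extended path is better (assumed: smaller is better).
    Select:  keep the winner and remember which predecessor it came from.
    """
    candidate_0 = metric_0 + branch_0
    candidate_1 = metric_1 + branch_1
    if candidate_0 <= candidate_1:
        return candidate_0, 0
    return candidate_1, 1


# Hypothetical 2x2 pixel blocks and path metrics, chosen only for illustration.
current = [[10, 12], [14, 16]]
reference = [[11, 12], [10, 20]]
sad = sum_of_absolute_differences(current, reference)

survivor = add_compare_select(3, 2, 4, 0)
```

Both operations are simple, but an encoder or decoder repeats them enormous numbers of times per second, which is why dedicating instructions to them paid off.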
A few years ago, after nearly two decades of evaluating and using embedded processors for digital-signal-processing-intensive applications, my colleagues and I at BDTI realized that embedded computer vision applications were poised to benefit from the same type of “virtuous circle” that had previously enabled the proliferation of wireless communications and video compression algorithms.
Computer vision has been around for decades in applications like factory automation. But only very recently has vision begun to be incorporated into high-volume applications like video games and automobile safety systems. And, now that vision is starting to appear in volume applications, processor vendors are beginning to focus on embedded vision applications, and to tune their processors for these applications – often by incorporating specialized coprocessors specifically designed for vision processing.
It’s easy to see why processor suppliers are excited about embedded vision applications. “Machines that see” offer compelling value in many applications and markets. Take automotive safety, for example. Over one million people are killed each year in automobile accidents. By reducing the number and severity of collisions, vision-based safety systems may be able to save many thousands of lives.
Embedded vision also promises to improve human-machine interaction, long the Achilles' heel of consumer electronics. Instead of hunting for the right hand-held remote control, imagine a world where you simply stare at your TV for a few seconds, and in response it turns itself on and offers you a personalized menu of options, which you can choose from via simple gestures. Market research firm IMS Research estimates that by 2015, vision-enabled devices will be shipping at a rate of over 3 billion units per year. (Read about many more embedded vision applications here.)
In some applications, vision functions will be relatively simple and will be able to fit into existing processors (perhaps with a modest boost in clock rate or an additional core). But many of the most compelling embedded vision applications use very performance-hungry algorithms. Implementing these algorithms at low cost and low power consumption will require specialized processors. As a result, we expect to see processor suppliers introducing more processors that are optimized for vision applications, and providing more application development support (such as optimized software libraries) for these applications.
Jeff Bier is founder, Embedded Vision Alliance and president, BDTI.
I believe that vision in intelligent systems must be properly (intelligently?) integrated with the system's other sensory inputs, or it will be next to useless (as referenced by sharps), because the command cascade to action will be a ridiculous trip.
Combining vision with voice recognition, for example, would significantly reduce input errors by linking intuitive gestures with voice commands. You could then point at your TV and say "HBO" and the system would combine the two commands into the proper action. Combined input would nearly eliminate false commands, because unless the gesture and the statement corresponded there would be no action. It would also enable more precise action, because gestures significantly reduce ambiguity of intent.
Safety, convenience, time saving. Oh, and play, I suppose (for example, all the iPads being sold are really for playing). I must admit that the toilet example by DrQuine struck a nerve. What's worse than the ill-timed auto "royal flush"? Our robotic vacuum more clearly uses automated vision than our automated trash lid (dumb sensor), but the latter is no less convenient. Keep it simple s _ _ _ _ _.
One such augmentative machine vision application was demonstrated at this year's CES -- an automotive heads-up display that highlighted (in red) critical objects in the field of vision, like a pedestrian starting to cross the street up ahead.
I think an application like this can definitely improve safety. Drivers will not abdicate their responsibility to look through the windshield, but machine vision just might alert them more quickly to things they otherwise might not notice right away.
I doubt computer vision can really improve vehicle safety unless it is always augmentative.
It is unfortunately human nature to abdicate responsibility to technology. If a vehicle has a computer vision backup alarm, drivers will soon learn to just reverse blindly until the alarm sounds. When something does go wrong and something is damaged or someone is hurt or killed, the driver's lawyers will be looking for someone to peg the blame on.
First do no harm: prevent false alarms. Every time I walk through a New York City airport bathroom and hear the "Flushing Cheer" as each toilet in turn flushes, I realize that automation is a double-edged sword. There is also no emergency stop switch, so stuck sensors waste staggering volumes of water.
Automobile safety can benefit from the new sensors, but vision sensors are hard to justify. I do think that consumer applications in which we can use sensory input rather than touch can be hugely advantageous.