NEW YORK – The son of Shooter in the film "Hoosiers" says to Gene Hackman, "Coach, I'm not seeing it."
I know how the kid felt. I've always had the same problem distinguishing, in technology terms, between "vision" and "video."
I know, of course, how an engineer would explain it: "vision" technology often requires a host of algorithms that enable a machine or computer to detect, classify, and track objects. Meanwhile, "video" involves software and hardware primarily designed to process video by filtering, pre- and post-processing, encoding and decoding (for the purpose of transmission, broadcasting or packaging), and ultimately displaying good moving pictures on a screen.
Nonetheless, the connection remained shaky in my mind until I sat down last month at a San Jose cafe with Bruce Tannenbaum, a MathWorks engineer who specializes in image processing and test & measurement applications. We started talking about embedded vision, one of the hottest topics in the embedded system industry in recent years.
It was Tannenbaum who connected the dots for me on video and vision, first on a personal level.
During our introduction, Tannenbaum, who had previously worked at places like Oak Technology and Sarnoff Research Center, said, "I used to read a lot of your MPEG articles in EE Times." I already felt as though I'd reconnected with an old friend, and this was before we realized that we'd graduated from the same high school in West Hartford, Connecticut.
But I digress.
Video was one of my first beats at EE Times. I was fascinated by an emerging digital compression technology called MPEG. I learned about a group of engineers from diverse fields literally dissecting a video image, one frame at a time, looking at macroblocks (typically 16x16-pixel regions built from smaller 8x8 blocks) and figuring out a way to compress the video stream.
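For readers who never lived through those MPEG debates, here is a minimal sketch of the idea at the heart of that block-based compression: transform an 8x8 block with the 2D DCT, throw away the small coefficients, and reconstruct. This is my own illustration in Python with SciPy (the random block and the crude thresholding are stand-ins, not anything a real MPEG encoder does verbatim):

```python
import numpy as np
from scipy.fft import dctn, idctn

# A toy 8x8 block of luma samples (random here; in MPEG it would
# come from one of the 8x8 blocks inside a macroblock).
rng = np.random.default_rng(0)
block = rng.integers(0, 256, size=(8, 8)).astype(float)

# Forward 2D DCT, the transform applied to each 8x8 block.
coeffs = dctn(block, norm="ortho")

# Crude stand-in for quantization: keep only the largest coefficients.
mask = np.abs(coeffs) >= np.percentile(np.abs(coeffs), 75)
compressed = coeffs * mask

# Inverse DCT reconstructs a close approximation from far fewer numbers.
restored = idctn(compressed, norm="ortho")
print("mean absolute error:", np.abs(block - restored).mean())
```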
MPEG, then, was a big idea. I made it my beat, my business and my passion.
Engineers in computers, communications, consumer electronics and entertainment (film studios) were all hot for MPEG, striving to develop something that could bridge their industries. This was the beginning of a digital movement about to sweep up everyone, triggering the unintended consequence of bringing down the traditional walls between different industries, their business models and rules of engagement.
Well, that was video.
Vision promises even more sweeping changes. Recent technology advances let a system "see" much more effectively. Further, embedded vision will be ubiquitous. "Vision" can go into cars, game consoles, smartphones, homes, street cameras, the Mars Rover, you name it.
As Tannenbaum put it, computer vision is all about “recognizing” an object (and its action), and “interpreting” it.
Computer vision, however, is a relatively new field. Tannenbaum said that textbooks and conferences built around computer vision have only begun popping up in the last 10 years.
But wait. Hasn't machine vision been around for a while? You know, the kind you see on a factory floor, inspecting an action taken by a robot on a production line? "Ah, but that's different from computer vision," said Tannenbaum. "Machine vision is about seeing things under a controlled environment." Computer vision is about applying vision under real-world conditions.
Tannenbaum explained, "Real-world scenes are complex." They involve variable lighting conditions, background clutter, partially hidden objects, unknown scene depth and differences in object scale, location and orientation. The list goes on. As a result, computer vision requires far more diversified detection algorithms and more complex computation.
This is where the concept of a “feature” comes in.
In computer-vision lingo, a feature is defined as an “interesting” part of an image. Features are the starting point for many computer-vision algorithms. For computer vision to detect a feature, it requires abstractions of image information. Computer vision needs to make a local decision at every image point about whether it "sees" an image feature.
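To make that per-point decision concrete, here is a minimal sketch, my own illustration in Python with OpenCV rather than anything Tannenbaum showed me. A Harris corner detector computes a "cornerness" score at every image point, and the local decision is simply whether that score clears a threshold (the file name is hypothetical):

```python
import cv2
import numpy as np

# Load a grayscale image (the path "scene.png" is a placeholder).
img = cv2.imread("scene.png", cv2.IMREAD_GRAYSCALE)

# Harris response: a cornerness score computed at every image point.
response = cv2.cornerHarris(np.float32(img), blockSize=2, ksize=3, k=0.04)

# The "local decision": a pixel counts as a feature only if its
# response is strong relative to the best response in the image.
corners = np.argwhere(response > 0.01 * response.max())
print(f"{len(corners)} corner features detected")
```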
Because what counts as "interesting" in an image really depends on the specific problem or the type of application to which computer vision is applied, there's no one-size-fits-all "feature detector." So, loads of feature detectors have been developed for computer vision. They vary widely in the kinds of features they detect, the computational complexity they require and the repeatability they offer.
A lot of ways to skin a cat.
Tannenbaum, however, is not discouraged. That's exactly why "features" are critical to computer vision, he noted. While it's not perfect yet, computer vision strives to find a given object in an image or video sequence by using techniques like edge detection, corner detection and template matching. Computer vision also deploys methods like MSER (maximally stable extremal regions) for blob detection, and SURF (speeded-up robust features), a robust image detector and descriptor, for object recognition or 3D reconstruction.
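For a feel of what those detectors look like in code, here is a short sketch of mine in Python with OpenCV. One caveat: SURF itself lives in the patent-encumbered opencv-contrib package, so this sketch substitutes ORB, a freely available detector-and-descriptor, to stand in for SURF's role; the input file name is again hypothetical:

```python
import cv2

img = cv2.imread("scene.png", cv2.IMREAD_GRAYSCALE)  # placeholder input

# MSER: finds stable blob-like regions across intensity thresholds.
mser = cv2.MSER_create()
regions, _ = mser.detectRegions(img)
print(f"MSER found {len(regions)} blob regions")

# ORB stands in for SURF here: it both detects keypoints and computes
# descriptors that can feed object recognition or image matching.
orb = cv2.ORB_create(nfeatures=500)
keypoints, descriptors = orb.detectAndCompute(img, None)
print(f"ORB found {len(keypoints)} keypoints")
```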
Tannenbaum explained that MathWorks offers engineers a computer-vision system toolbox that lets them design and simulate computer-vision systems. The toolbox includes things like feature detection and extraction, registration, stereo vision, and object detection and tracking.
When going over the list of what's inside the toolbox, I belatedly realized that a few "video" fundamentals, like motion estimation and video processing, are also useful in "vision." When I asked Tannenbaum if anything we learned in MPEG applies to computer vision, he noted that the integral image in computer vision, for example, could use the DCT.
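The integral image he mentioned is simple enough to show in a few lines. This sketch of mine, in plain NumPy, builds one and uses it to sum any rectangle of pixels in constant time, which is the trick that makes detectors like SURF fast:

```python
import numpy as np

img = np.arange(16, dtype=float).reshape(4, 4)  # toy 4x4 "image"

# Integral image: each entry holds the sum of all pixels above and to
# the left. A zero row/column of padding simplifies the formula below.
ii = np.zeros((5, 5))
ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)

def box_sum(top, left, bottom, right):
    """Sum of img[top:bottom, left:right] from four table lookups."""
    return ii[bottom, right] - ii[top, right] - ii[bottom, left] + ii[top, left]

assert box_sum(1, 1, 3, 3) == img[1:3, 1:3].sum()  # constant-time box sum
```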
But what further helped me connect the dots between vision and video are the challenges computer vision faces. Computer vision needs to be embedded in diversified applications, from defense to consumer electronics, and to meet each industry's requirements. Chip companies betting on the market growth of computer vision must figure out the most efficient way to cater to those requirements, rather than just throwing a lot of MIPS at the problems.
While computer vision has no compelling reason to interoperate across applications a la MPEG (there's no reason why automotive vision should talk to the embedded vision in a game console, for example), the field still needs to sort out its features, looking for common ground, if any, among detection methods.
While a great number of feature-detection algorithms have been developed over time, the industry has only begun to understand “what actually works” in the last two or three years, said Tannenbaum.
The computer-vision industry is still at its dawn: the engineers working for tool vendors (e.g., MathWorks), for DSP, FPGA and video processor companies (e.g., Texas Instruments, Analog Devices, Xilinx and others), and for embedded system makers have only begun communicating among themselves and with the academic community.
As a reporter and an industry observer, I find no time more exciting than the beginning of one of these "revolutionary" shifts. Just as MPEG brought big changes to many industries, I know computer vision is already altering the way our everyday "embedded systems" (cars, smartphones, Mars Rovers, etc.) see the world.