DARPA seeks breakthroughs in computer vision
Rick Merritt
3/29/2012 6:11 PM EDT
SAN JOSE, Calif. – Computer vision still has a lot to learn, but if researchers embrace real-world challenges they could make important breakthroughs, said a government program manager.
“I have quite a bit of hope to do much more intelligent things that compare machine capabilities to those of humans,” said James Donlon, who manages the Mind’s Eye computer vision program at the Defense Advanced Research Projects Agency (DARPA).
The Mind’s Eye program aims to develop breakthrough algorithms for automatically recognizing and describing human activities. Donlon showed small steps forward—and a few bloopers—from his first 18 months of work on the three-year effort.
For example, a dozen systems failed to recognize a running dog; one described a collision between two shopping carts as “the car left.” In particular, current algorithms have difficulty detecting the forearm motions that are key to activities of high interest, such as giving and taking.
“Classification approaches break down horribly when you get to anything transactional,” said Donlon, in a keynote at a meeting of the Embedded Vision Alliance, a 19-member industry group promoting mainstream computer vision applications.
The Mind’s Eye program has developed a data set of 7,676 real-life videos, which it has contributed to the computer vision community in an effort to challenge algorithm developers. Researchers tend to focus on major advances with existing, sometimes artificial and academic data sets.
“The Mind’s Eye data set hopefully will provide the next level of challenge the vision community needs,” Donlon said.
“The CV community is more wired to making incremental bits of progress on fixed data sets than to reward less-good results on messy [but more realistic] data sets,” he said. “Mind’s Eye can coax people out of defined data sets,” he added.
“My role is to create those conditions so you can make those breakthroughs happen,” he said, noting he deliberately put “confounding factors such as variable lighting” into his videos.
“You can’t underestimate the importance of the algorithms which have already come a long way,” said Bruce Kleinman, vice president of platform marketing at Xilinx, in a panel session following the keynote.
Although the program’s focus is on enabling breakthrough algorithms, it includes system integrators providing implementations of new code in FPGAs, GPUs and SoCs. “The hardware acceleration has greatly heightened the research,” he said.
“The U.S. Army has the very sensible goal of taking a robot with a camera and have it navigate to designated points, erect a camera and stream video back,” replacing a human scout, Donlon said. “Clearly what we need to do is put the intelligence on board, on the edge of the sensor,” he said.
“We need to recognize activities currently out of reach of the technology, the verbs, the action and the narrative of the scene,” said Donlon, who has identified 48 target actions to recognize. “Then we need to do operationally relevant things such as describing scenes and filling in gaps including attempts of people to fool the camera,” he said.
Donlon’s keynote was part of a day-long summit sponsored by the Embedded Vision Alliance. The summit was co-located with the Design West conference here.


JeffCB
3/29/2012 6:36 PM EDT
A video of Jim Donlon's presentation will be available next week at www.Embedded-Vision.com. To get an alert when it becomes available, register on the site and request the newsletter.
Luis Sanchez
3/29/2012 9:37 PM EDT
All this sounds quite interesting but also quite difficult. The human brain learns over the years as a child develops. Teaching that to a computer sounds to me like a gigantic task. Add to this the fact that we don't fully understand how the human brain works. The brain is a difficult subject to study, as we only have brains to study it with. It's a lot easier when the system under study is less complex than the tool or system being used to study it.
DrQuine
3/29/2012 10:00 PM EDT
A factor to consider is the nature of the object being carried by one person towards another. Holding a wrapped gift out in front as one person approaches another suggests the possibility of a transfer (gift giving). The facial expression and degree of engagement with that person might suggest it as well. In contrast, a person with a computer case over their shoulder or a wedding ring on their hand would not be expected to give those items to the individual they were approaching.
Neo1
3/30/2012 1:57 AM EDT
I thought the makers of Kinect at MS had already made some significant breakthroughs in this area, or else how can it sense users' movements in a game?
rick.merritt
3/30/2012 2:33 AM EDT
Yes, Kinect is often cited as the first major proof point that embedded vision has gone mainstream. But the computer's "understanding" of and ability to intelligently act on what it "sees" is still very rudimentary.
kinnar
3/30/2012 5:13 AM EDT
Computer vision is at a very good stage compared with a decade ago, and it has reached this stage very quickly. Still, anyone who thinks outside the box will surely hit setbacks, as computer vision has its own limitations and can only be used with those limitations in mind. There is no doubt that there is enormous scope for improvement, and that will come in time, as it is an area of interest for many researchers.
agk
3/30/2012 6:51 AM EDT
To bring machine capabilities up to those of humans, we need to integrate all five human senses into machines. Then they will be attractive and more useful. Along with a video camera, we need a microphone with sound processing, smell analysis, touch analysis, taste analysis and so on to make a decision about the current scenario. A robot full of sensors, with a computing system running AI, could come close to humans.
t.alex
3/30/2012 10:44 AM EDT
I believe the system really needs to 'learn' somehow to be able to build up a big database for recognition.
pernixxx
9/22/2012 7:51 AM EDT
The ALIEN Visual Tracker application IS OUT!
Download it here: http://www.micc.unifi.it/pernici/
(available for Windows 7 64-bit).
The ALIEN visual tracker is a generic visual object tracker achieving state-of-the-art performance. The object is selected at run-time by drawing a bounding box around it, and then its appearance is learned and tracked as time progresses. The ALIEN tracker has been shown to outperform other competitive trackers, especially in the case of long-term tracking, large amounts of camera blur, low-frame-rate videos and severe occlusions including full object disappearance.
The scientific paper introducing the technology behind the tracker will appear at the 12th European Conference on Computer Vision 2012 under the following title:
• FaceHugger: The ALIEN Tracker Applied to Faces. In Proceedings of European Conference on Computer Vision (ECCV) - DEMO Session -- 2012 Florence Italy.
A real time demo of the released downloadable application (http://www.micc.unifi.it/pernici/) will also be given during the conference [1].
Video demos showing the capability of this novel technology may be seen here http://www.youtube.com/user/pernixVision.