SAN JOSE, Calif. – Computer vision still has a lot to learn, but if researchers embrace real-world challenges they could make important breakthroughs, said a government program manager.
“I have quite a bit of hope to do much more intelligent things that compare machine capabilities to those of humans,” said James Donlon, who manages the Mind’s Eye computer vision program at the Defense Advanced Research Projects Agency (DARPA).
The Mind’s Eye program aims to develop breakthrough algorithms for automatically recognizing and describing human activities. Donlon showed small steps forward—and a few bloopers—from his first 18 months of work on the three-year effort.
For example, a dozen systems failed to recognize a running dog, and one described a collision between two shopping carts as “the car left.” In particular, current algorithms have difficulty detecting the forearm motions that are key to activities of high interest such as giving and taking.
“Classification approaches break down horribly when you get to anything transactional,” said Donlon in a keynote at a meeting of the Embedded Vision Alliance, a 19-member industry group promoting mainstream computer vision applications.
The Mind’s Eye program has developed a data set of 7,676 real-life videos that it has contributed to the computer vision community in an effort to challenge algorithm developers. Researchers have tended to focus on existing, sometimes artificial, academic data sets.
“The Mind’s Eye data set hopefully will provide the next level of challenge the vision community needs,” Donlon said.
“The CV community is more wired to making incremental bits of progress on fixed data sets than to reward less-good results on messy [but more realistic] data sets,” he said. “Mind’s Eye can coax people out of defined data sets,” he added.
“My role is to create those conditions so you can make those breakthroughs happen,” he said, noting he deliberately put “confounding factors such as variable lighting” into his videos.
“You can’t underestimate the importance of the algorithms, which have already come a long way,” said Bruce Kleinman, vice president of platform marketing at Xilinx, in a panel session following the keynote.
Although the program’s focus is on enabling breakthrough algorithms, it includes system integrators providing implementations of new code in FPGAs, GPUs and SoCs. “The hardware acceleration has greatly heightened the research,” he said.
“The U.S. Army has the very sensible goal of taking a robot with a camera and have it navigate to designated points, erect a camera and stream video back,” replacing a human scout, Donlon said. “Clearly what we need to do is put the intelligence on board, on the edge of the sensor,” he said.
“We need to recognize activities currently out of reach of the technology—the verbs, the action and the narrative of the scene,” said Donlon, who has identified 48 target actions to recognize. “Then we need to do operationally relevant things such as describing scenes and filling in gaps, including attempts of people to fool the camera,” he said.
Donlon’s keynote was part of a day-long summit sponsored by the Embedded Vision Alliance. The summit was co-located with the Design West conference.