HILTON HEAD ISLAND, S.C. - A PC-based real-time tracking system for 3-D objects got a test run at the IEEE's Computer Vision and Pattern Recognition 2000 conference earlier this month. Others have shown special-purpose neural chips that do real-time tracking of 3-D objects, but Stefano Soatto, a professor at Washington University (St. Louis), demonstrated the first such system to run on a personal computer.
Intel Corp. (Santa Clara, Calif.) also chose the conference to announce and distribute its proposed open-source-code library of computer vision algorithms. So far the library includes more than 400 entries for everything from calibrating cameras to recognizing hand gestures.
"We think that an open-source computer vision library will provide the infrastructure for integrating computer vision into everyday applications," said Mark Holler, manager at Intel's Microprocessor Research Lab. Among the nuggets already in the library are camera calibration functions, licensed from California Institute of Technology (Pasadena, Calif.), that allow the use of a wide-angle lens to capture a large field of view and then correct for the lens distortion.
"We've also licensed a face recognition routine from Georgia Tech that should be immediately useful to researchers," said Holler. At the conference, Intel gave away 260 free CDs with the complete library, including source code and optimized compiled code for Intel's CPUs. It can be downloaded from the Intel Web site.
Stealing the show at this year's conference were live demonstrations showing off the fruits of research on tracking and rendering 3-D models. Some of the most interesting extracted three-dimensional models from informal sequences of 2-D images shot with a video camera; one, for example, created a wire-frame model of a chess board simply by having the board waved in front of the camera. Both Sarnoff Corp. (Princeton, N.J.) and Geometrix Inc. (San Jose, Calif.) showed such real-time estimation algorithms, which extract multiview 3-D models from ad hoc 2-D image sequences.
Point Grey Research (Vancouver, British Columbia) showed an algorithm that tracked and counted people using a stereo-video camera setup. Another demo tracked only human heads, again with stereo vision. Yet another showed how to create geometrically correct 3-D models of buildings by using preexisting knowledge about urban environments.
Soatto's coup was a new algorithm that makes sophisticated tracking possible on a PC.
"For a long time, researchers have been trying to track 3-D objects in real-time, but their mistake was not to require that their models be observable," he said. "Observability means that the initial condition is uniquely determined in my model by the current state."
Where previous models regress statistically from archived data, Soatto's algorithm uses a Kalman filter to predict the future location of feature points from their current location, which the observability requirement ensures is uniquely determined.
Soatto's algorithm addresses a classical problem known in the jargon as shape-from-motion estimation. According to Soatto, almost all other purely software approaches to this problem collect their data ahead of time, so that "future" events can be regressed back to the observed initial conditions, which yields a non-causal model. Real-time tracking, he said, must begin with a causal model that predicts future events, then corrects itself when the future arrives and the model's predictions prove to be in error.
Soatto created his causal model of real-time motion by configuring the problem's parameters as a nonlinear Kalman filter. It predicts future responses from current control actions.
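The predict-then-correct cycle of a causal filter can be illustrated with something far simpler than Soatto's system. The sketch below is a linear, constant-velocity Kalman filter that tracks one image coordinate of a single feature point. It is not Soatto's nonlinear shape-from-motion filter; the function names (predict, correct, track), the noise values q and r, and the sample measurements are all assumptions chosen for illustration.

```python
# Illustrative only: a linear constant-velocity Kalman filter for one image
# coordinate of one feature point. The state is (position x, velocity v).
# The symmetric 2x2 covariance is stored as a tuple (P00, P01, P11).

def predict(x, v, P, dt=1.0, q=1e-3):
    """Causal step: project the state and covariance one frame ahead."""
    p00, p01, p11 = P
    x_pred = x + dt * v
    # P' = F P F^T + Q, with F = [[1, dt], [0, 1]] and Q = q * I (assumed)
    n00 = p00 + 2 * dt * p01 + dt * dt * p11 + q
    n01 = p01 + dt * p11
    n11 = p11 + q
    return x_pred, v, (n00, n01, n11)

def correct(x, v, P, z, r=0.25):
    """Correction step: blend in measurement z once the frame arrives."""
    p00, p01, p11 = P
    innovation = z - x             # how wrong the prediction was
    s = p00 + r                    # innovation variance, H = [1, 0]
    k0, k1 = p00 / s, p01 / s      # Kalman gain
    x_new = x + k0 * innovation
    v_new = v + k1 * innovation
    # P = (I - K H) P
    return x_new, v_new, ((1 - k0) * p00, (1 - k0) * p01, p11 - k1 * p01)

def track(measurements, x=0.0, v=0.0, P=(10.0, 0.0, 10.0)):
    """Run the filter frame by frame; no future data is ever consulted."""
    for z in measurements:
        x, v, P = predict(x, v, P)
        x, v, P = correct(x, v, P, z)
    return x, v

if __name__ == "__main__":
    # Hypothetical feature drifting about 2 pixels per frame, with noise.
    print(track([2.1, 3.9, 6.2, 8.0, 9.9, 12.1]))
```

Each frame is handled as it arrives: the filter commits to a prediction first and only then sees the measurement, which is the sense in which such a model is causal.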
Soatto's algorithm assumes only that images from a video camera are collected in a relatively smooth sequence. With that scant prerequisite, he was able to prove his model reliable and robust even when objects are changing orientation or become occluded.
"We handled missing feature points by setting their variance to infinity or by merely deleting them from the Kalman filter matrix," said Soatto. This can be done because each row of Soatto's implementation of the filter matrix is decoupled from the others, thereby allowing a feature and all its past states to be simultaneously deleted.
More than 300 papers were presented at this year's conference on subjects such as tracking objects, rendering 3-D models from 2-D video cameras, retrieving unindexed images and recognizing shapes, faces and postures. The event also featured tutorials on illumination, color and motion rendering. More than 500 researchers attended.