Portland, Ore. - A group at the University of Calgary (Alberta) has developed automatic recognition and tracking software that could "watch" a sports event for you, keeping track of who did what when, and diagrammatically represent the highlights, or even the live action, using icons instead of raw video. The team, led by computer science professor Jeffrey E. Boyd, will soon test its software and client-server architecture on an actual hockey rink.
"Once we extract the moving objects in a scene, we can transform that data and present it in all sorts of formats," Boyd said. "We can even make a little schematic of a sports game, which is good for viewing on a small handheld device like a cell phone."
On handsets or other systems that "don't have enough bandwidth for live video," he said, "we can show a moving schematic or diagram of the action." Boyd was assisted in the work by master's candidate Michael Zhang and undergraduates Luke Olsen and Maxwell Sayles.
Boyd says his work builds on research conducted with colleagues at the University of British Columbia, Dalhousie University and the University of Waterloo that began many years before the MPEG-7 spec was formalized in September 2001. But Boyd's Camera Markup Language shares a common base with MPEG-7: Both append an XML document to a video that describes its content.
Boyd's software "watches" a video stream while generating a running commentary in the form of a continuous XML document. For instance, if a "white ball" enters the scene, Boyd's software will find it when it segments the scene into objects. The software names the object, then tracks it in a stream of XML statements about its color, size, location and trajectory.
But while this kind of operation could be performed with the standard MPEG-7 format, the Camera Markup Language goes further by allowing bidirectional communication between the video server running the software and the camera. An MPEG-7 camera will produce a stream of XML documents describing the video being taped, but Boyd's Camera Markup Language can also send XML documents back to the camera to control it.
"MPEG-7 is primarily a one-way description; you have some video, you describe it and that's it," said Boyd. "We were looking more at interacting with the camera. As the video is being produced we can tell the camera to do different things. For instance, if the camera is on a pan-tilt head, we can send documents to the camera and the pan-tilt will respond to our documents by moving."
In this client-server architecture, servers process the video and the XML documents describing its objects, generating both a video representation (for TV clients) and a graphical representation (for cell phone clients). Servers can also send XML commands back to the camera for operations such as "follow the ball." If the ball stops moving, the server can even command the camera to switch tracking algorithms. "We can send documents to the camera to track objects with optical flow until they stop moving, then we can switch to adaptive background subtraction to identify objects that stop," said Boyd.
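The algorithm switch Boyd describes amounts to a small state machine: optical flow while things move, adaptive background subtraction once they stop. The sketch below illustrates that logic with stand-in trackers and an invented speed threshold; it is not the team's implementation.

    from dataclasses import dataclass

    STOP_THRESHOLD = 0.5  # pixels/frame; below this an object counts as stopped

    @dataclass
    class Track:
        obj_id: int
        speed: float  # estimated speed in pixels per frame

    def optical_flow_step(frame):
        # Stand-in for an optical-flow tracker (e.g. Lucas-Kanade).
        return [Track(1, frame["ball_speed"])]

    def background_subtraction_step(frame):
        # Stand-in for an adaptive background-subtraction detector.
        return [Track(1, frame["ball_speed"])]

    def track(frames):
        mode = "optical_flow"
        for frame in frames:
            step = optical_flow_step if mode == "optical_flow" else background_subtraction_step
            tracks = step(frame)
            yield mode, tracks
            # Switch methods when everything stops; switch back when motion resumes.
            mode = ("background_subtraction"
                    if all(t.speed < STOP_THRESHOLD for t in tracks)
                    else "optical_flow")

    # Toy feed: the ball slows to a stop, then starts moving again.
    for mode, _ in track([{"ball_speed": s} for s in (3.0, 1.2, 0.1, 0.0, 2.5)]):
        print(mode)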
Boyd's software is restricted to tracking discrete objects, but in the future he hopes to enable more-complex descriptions that can discern the activities in which the objects are participating. For instance, security applications today depend on a human operator to identify "suspicious" people and to switch between cameras to track their activities. Boyd's software could do this on the server, send directives back to cameras to do the tracking and transmit a graphical icon to the operator to identify the suspicious activity in progress.
"For instance, a security camera client takes video feeds from all the surveillance cameras and automatically stitches the disparate pieces into a complete picture," said Boyd. "We found we only really needed to know when an object crosses over into a different region. Based on that notion, we built a client that carves up a scene under surveillance into a bunch of regions and then, using information from the camera, determines whenever something crosses the boundaries between regions."
Currently, Boyd's demonstrations for sports use a 1:32-scale model of a hockey rink with little plastic players. Next, he wants to wire up a real hockey rink with his client-server architecture. Luckily, the University of Calgary is home to a stadium with two Olympic-regulation hockey rinks built for the 1988 Winter Games.
"If we had the funding, we would put up dozens of cameras in the Olympic Stadium, but so far we can only afford to install three cameras with pan-tilt zoom mounts," said Boyd. "This will become our real testbed. By August or September we will be viewing live action from there."
Currently, Boyd's software is tracking cars in traffic in Calgary (see toaster.cpsc.ucalgary.ca/CaML), but as soon as the Olympic Stadium is wired up in the fall, he will move the live feed there. After that, Boyd plans to develop techniques for archiving the video with its associated XML annotations.
"We have collaborators on our project working on video databases," he said. "When you archive you have the chance to do queries and searches, and if it's surveillance or security video, you could mine the data."
The work was funded by the Institute for Robotics and Intelligent Systems, part of Canada's Networks of Centres of Excellence. Boyd is working with Calgary's University Technologies International Inc. to market his system.