SANTA CLARA, Calif. - With the backing of a number of powerhouse companies, the MPEG-7 standard is expected to win ratification within weeks, paving the way for a broad range of advances in multimedia interfaces and search capabilities.
New products that could be powered by MPEG-7 include multimedia search engines for the Internet; program search and retrieval systems for personal video recorders (PVRs), set-tops or MP3 players; or mobile devices that could intelligently retrieve, manage and display only the most important and relevant information to users under bandwidth-constrained circumstances.
Major industry players involved in commercializing technologies based on MPEG-7 include IBM, Sony, Philips, Sharp, Canon, LG and Ericsson, said John R. Smith, manager of pervasive media management at IBM's T.J. Watson Research Center and MPEG chairman for the Multimedia Description Schemes Subgroup. Ratification is expected in September.
R&D arms of telecom giants such as France Telecom and NTT Docomo are also actively involved. "Content-related services are more and more in the core business of telecommunications operators," said Olivier Avaro, chief technology officer of the Hyperlanguage and Multimedia Dialogue Department at France Telecom R&D. "Multimedia search engines, content enhancement with metadata and wireless multimedia are now of strategic interest to France Telecom and will benefit from the MPEG-7 standard."
The new standard is already attracting startups, including Singingfish Inc. (Seattle), a Thomson Multimedia subsidiary that develops multimedia search engines; Minds@Work (Irvine, Calif.), a developer of a smart storage device called the Digital Wallet; and NewsTakes Inc. (Burlingame, Calif.), a developer of intelligent optimization technologies for wired and wireless devices.
MPEG-7 offers a standardized method of describing multimedia content at various levels. Higher-level descriptions include those based on catalogs (such as the content title, creator and rights) and semantics ("who, what, when and where" information about objects and events). MPEG-7 can also describe the multimedia content at a deeper level, such as the structural features (color, texture, contour, tone, melody and tempo) of audiovisual content.
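As a rough illustration of the three levels just described, the following Python sketch builds a toy XML description. The element names (MediaDescription, Catalog, Semantics, Structure, Feature) and all sample values are hypothetical placeholders chosen for readability; they are not taken from the MPEG-7 schema itself.

```python
import xml.etree.ElementTree as ET


def build_description(title, creator, rights, semantics, structure):
    """Build a toy three-level description (hypothetical element names)."""
    root = ET.Element("MediaDescription")

    # Catalog level: title, creator and rights.
    catalog = ET.SubElement(root, "Catalog")
    ET.SubElement(catalog, "Title").text = title
    ET.SubElement(catalog, "Creator").text = creator
    ET.SubElement(catalog, "Rights").text = rights

    # Semantic level: who, what, when and where.
    sem = ET.SubElement(root, "Semantics")
    for key in ("Who", "What", "When", "Where"):
        ET.SubElement(sem, key).text = semantics[key]

    # Structural level: low-level features such as color or tempo.
    struct = ET.SubElement(root, "Structure")
    for name, value in structure.items():
        feature = ET.SubElement(struct, "Feature", name=name)
        feature.text = str(value)
    return root


# Sample values are invented for the example.
doc = build_description(
    title="Cup final highlights",
    creator="Example Sports Network",
    rights="All rights reserved",
    semantics={
        "Who": "Two teams",
        "What": "Soccer match",
        "When": "2001-05-12",
        "Where": "Stadium",
    },
    structure={"dominant_color": "green", "tempo_bpm": 120},
)
xml_text = ET.tostring(doc, encoding="unicode")
print(xml_text)
```

A search engine could match a text query against the catalog and semantic elements, while a content-based retrieval system would compare the structural features.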
MPEG-7 is "where computer science meets library science," said Corinne Jorgensen, associate professor in the School of Informatics at the Library and Information Studies Department of the State University of New York at Buffalo. She called MPEG-7 "a step in the right direction" to connect library-developed indexing thesauri based primarily on keywords and text with content-based retrieval systems from the computer science realm that use low-level descriptors automatically created by software algorithms.
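To make the idea of a software-generated low-level descriptor concrete, here is a minimal Python sketch of query by example using a coarse color histogram. It is an assumption-laden illustration: the histogram layout, the histogram-intersection score and the function names are generic textbook choices, not the descriptors or matching methods defined by MPEG-7.

```python
def color_histogram(pixels, bins=4):
    """Normalized coarse RGB histogram; pixels are (r, g, b) tuples in 0..255."""
    hist = [0.0] * (bins ** 3)
    for r, g, b in pixels:
        # Map each channel from 0..255 to a bin index in 0..bins-1.
        rb, gb, bb = r * bins // 256, g * bins // 256, b * bins // 256
        hist[rb * bins * bins + gb * bins + bb] += 1.0
    total = len(pixels)
    return [count / total for count in hist] if total else hist


def similarity(hist_a, hist_b):
    """Histogram intersection: 1.0 for identical distributions, 0.0 for disjoint."""
    return sum(min(a, b) for a, b in zip(hist_a, hist_b))


def rank(query_pixels, library):
    """Return library names sorted from most to least similar to the query."""
    query = color_histogram(query_pixels)
    scores = {
        name: similarity(query, color_histogram(pixels))
        for name, pixels in library.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Tiny invented "images": each is just a short list of pixels.
library = {
    "grass": [(20, 200, 30)] * 4,
    "sky": [(30, 60, 220)] * 4,
    "sunset": [(230, 120, 40)] * 4,
}
query = [(25, 210, 35)] * 3 + [(30, 60, 220)]
print(rank(query, library))
```

The descriptor is computed entirely by software, with no keywords involved, which is the contrast with library-style indexing drawn above.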
"Until now, our multimedia search was limited to a text-based search," said Neil Day, who chairs the MPEG-7 Industry Focus Group. "MPEG-7 allows us to query by talking, humming or even showing a hand-drawn image to a computer."
That capability has implications for engineers designing user interfaces for next-generation computers or consumer devices, Day said. He envisions devices that could intelligently navigate and retrieve information or run complex operations instigated by spoken input.
Hurdles remain
To get there, however, some serious engineering work remains to be done. MPEG-7 tools are still in development. "Production tools, tools that create MPEG-7 data in an operational environment at a low production cost, are probably the biggest hurdle," said France Telecom's Avaro.
And there needs to be enough rich and compelling content indexed and tagged with MPEG-7 descriptions available on the Web or elsewhere to "justify the significant engineering costs of implementing the MPEG-7 standard in a target system," said chief technology officer Dennis Palatov of Minds@Work.
Further, engineers need to determine the processing power necessary for video analysis based on MPEG-7 description extraction. The requirement will vary based on the targeted level of analysis.
Video analysis based on MPEG-7 structural descriptions (color, texture, motion, shot detection and camera breaks) could likely be done on a desktop with a Pentium III or even lower-performance processor, said IBM's Smith. But deeper analyses (for example, object recognition or detection of embedded text) may involve 10x to 100x the playback time of a video clip and require the processing power of a multiprocessor machine or multiple PCs running in parallel overnight or longer. In applications calling for the ingestion of simultaneous feeds of video content and MPEG-7 metadata, the time required to extract MPEG-7 descriptions shouldn't be taken lightly, Smith said.
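The 10x-to-100x figure translates into wall-clock time as follows. This Python sketch is back-of-the-envelope arithmetic only; the function name, the 60-minute clip and the eight-machine pool are assumptions for illustration, and the even split across machines is an idealization that real systems rarely achieve.

```python
def extraction_hours(clip_minutes, slowdown, machines=1):
    """Estimated wall-clock hours to analyze a clip.

    slowdown: analysis time as a multiple of playback time (e.g. 10 or 100).
    machines: PCs working in parallel, assuming the clip splits evenly
              (an idealization; real speedups are usually smaller).
    """
    if clip_minutes < 0 or slowdown <= 0 or machines < 1:
        raise ValueError("clip_minutes >= 0, slowdown > 0, machines >= 1 required")
    return clip_minutes * slowdown / machines / 60.0


# A 60-minute clip at both ends of the quoted range, on 1 and 8 machines.
for slowdown in (10, 100):
    for machines in (1, 8):
        hours = extraction_hours(60, slowdown, machines)
        print(f"{slowdown}x on {machines} machine(s): {hours:.2f} h")
```

Under these assumptions, an hour of video at 100x takes 100 hours on one machine and 12.5 hours on eight, which is consistent with the "overnight or longer" estimate.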
Schema language
MPEG-7 has adopted the XML Schema Language as the description definition language for the standard. By building on existing technologies, the new standard tries to maintain interoperability with other metadata standards, including those of the W3C, Dublin Core, the Society of Motion Picture and Television Engineers (SMPTE) Metadata Dictionary and TV-Anytime.
"MPEG-7 will be much more generic than domain-specific or application-driven standards as SMPTE and TV-Anytime," said Erik Oltmans, a member of the scientific staff at the Netherlands-based Telematica Institute. "SMPTE, for example, will not be able to classify a set of pictures; it is geared toward the movie industry. TV-Anytime designed for personal video recorders is not capable of describing musical content like MP3 databases. MPEG-7 can do both, and more."
"MPEG-7 fundamentally gives you a very good understanding of what is there in the content," said NewsTakes chief technology officer Vinod Vasudevan. If a cell-phone or PDA user requests pictures of a certain sports event, for example, MPEG-7 could intelligently select images amenable for display on a smaller screen, weeding out crowded shots in favor of images that focus on one or two players, he said.
Killer applications for MPEG-7 will "come with the exploitation of MPEG-7 data jointly with MPEG-4 to visualize the metadata for mobile applications," said France Telecom's Avaro.
Eric Rehm, chief technology officer at Singingfish, agreed. "MPEG-7 can provide just the key frames and summary of an MPEG-4 stream, because it may just be what PDA or mobile phone users are looking for," he said. "MPEG-7 is great at giving hints and suggestions for media transcoding for universal media access."
Other MPEG-7 applications include next-generation electronic programming guides for digital television, removable flash cards that store user preferences and intelligent storage media that not only store data but actively manage and interpret it.
An early example is Minds@Work's Digital Wallet, a palm-sized, 2.5-inch portable hard drive featuring a proprietary operating system and Motorola's ColdFire microprocessor. The device can store digital photos, MP3 music files, large graphics files or business presentations and can operate independently of a computer.
"A standard such as MPEG-7, when widely accepted, would enable portable smart storage devices to intelligently manage and present diverse information in accordance with users' needs," said Minds@Work's Palatov.
Hardware design
MPEG-7 will surely affect client hardware system designs. "MPEG-7 will specify conformance for the extraction and use of MPEG-7 descriptions," said IBM's Smith. "MPEG-7 descriptions will be processed in software or hardware, depending on the applications. Therefore, it is conceivable that MPEG-7-conformant hardware will be developed at the extraction side, such as an MPEG-7 camera, or at the client side, such as an MPEG-7 terminal."
The good news is that MPEG-7 builds on top of existing standards, including MPEG-1, MPEG-2, MP3 and JPEG, said Palatov. Existing hardware for processing those data types typically consists of one or more optimized digital signal processors running proprietary microcode, under the supervision of one or more general-purpose RISC processors.
Since video analysis is computationally expensive, in most cases the client will not extract the descriptions from the video, said IBM's Smith. "However, given the need in some applications to process live video feeds in real-time, there will be an opportunity for MPEG-7 chips that operate in conjunction with hardware video encoders to create and insert MPEG-7 annotations into the encoded video streams," he said. "This real-time indexing will greatly speed up turnaround time from acquisition to searching."
The standardization of MPEG-7 only heralds the beginning of cross-industry efforts to build standardized multimedia databases. Telematica Institute's Oltmans said that metadata extraction of multimedia content "can be done automatically, semi-automatically and manually. Even in cases of manual tagging, tools must be developed to accommodate this.
"But more important, tools must be there that can automatically do the hard work, such as segmentation of video into shots and scenes, face detection, colors, text processing subtitles, speech-to-text, etc. Many of these kinds of tools are in development; some are already there. But the bulk is yet to come."
Minds@Work's Palatov sees the biggest hurdle for MPEG-7 in "the availability of rich and compelling content that would justify the significant engineering costs of implementing the standard in a target system. We expect that this will be addressed first in focused, niche markets. . . . General availability is likely to follow a couple of years later."
The MPEG-7 Industry Focus Group is holding an MPEG-7 Awareness Event in Washington, D.C., in October.