Portland, Ore. Software that enables semantic metadata and natural language queries has been released by IBM Corp. as an open-source product. The Unstructured Information Management Architecture (UIMA) automatically annotates any type of unstructured data, such as text, images, audio and video, with semantic metadata about that data's meaning, allowing hidden relationships among facts to be uncovered. It also enables users to query in natural language rather than with a strict syntax.
"Our open-source initiative enables software development that leverages unstructured information," said Arthur Ciccolo, department group manager for information and knowledge management at IBM Research. "The Unstructured Information Management Architecture provides interoperability among analytic software for searching, knowledge discovery, business intelligence and text."
The UIMA is part of a trend in semantic software research that automatically annotates databases with metainformation about the meaning of the data. While ordinary data searches must specify syntax exactly, such as precisely spelling the text string "George W. Bush," the UIMA allows users to conduct semantic searches such as "current president."
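The difference between an exact string search and a semantic search can be illustrated with a minimal sketch. The document records, the annotation layout and the two search functions below are hypothetical, invented for illustration only; they are not UIMA's data model or API.

```python
# Hypothetical documents, each carrying semantic annotations as metadata.
# The "roles" field is an invented stand-in for metainformation about meaning.
documents = [
    {
        "text": "George W. Bush signed the bill on Tuesday.",
        "annotations": [
            {"span": "George W. Bush", "type": "Person",
             "roles": ["current president"]},
        ],
    },
    {
        "text": "The president met with congressional leaders.",
        "annotations": [
            {"span": "The president", "type": "Person",
             "roles": ["current president"]},
        ],
    },
    {
        "text": "Bush league pitching cost the team the game.",
        "annotations": [],
    },
]


def exact_search(docs, query):
    """Syntactic search: the query string must appear verbatim in the text."""
    return [d["text"] for d in docs if query in d["text"]]


def semantic_search(docs, role):
    """Semantic search: match on annotated meaning, not on the raw text."""
    return [
        d["text"]
        for d in docs
        if any(role in a["roles"] for a in d["annotations"])
    ]


exact_hits = exact_search(documents, "George W. Bush")
semantic_hits = semantic_search(documents, "current president")
```

In this toy data, the exact search finds only the document that spells out the name, while the semantic search also returns the document that refers to the same person as "The president."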
"We saw the potential of UIMA to help bring together and multiply the effects of the work of a large community of researchers, and are excited to see the strong momentum and support," said Ronald Brachman, director of the Defense Advanced Research Projects Agency's Information Processing Technology Office. The UIMA was developed with funding from Darpa. "Having an open-source framework for deploying text analysis components will help us deliver more advanced solutions for the national security community."
After IBM announced UIMA's release as open source, 16 software vendors promised compatibility, including Attensity, ClearForest, Cognos, Endeca, Factiva, Kana, InQuira, iPhrase, Inxight Software, Nstein Technologies, QL2 Software, SAS, SchemaLogic, Semagix, SPSS and Temis.
The UIMA is based on software robots that comb through documents and annotate them with semantic information about the meaning of, and the relationships among, the concepts they contain. Technically, the UIMA calls these software robots Analysis Engines, which are constructed from building blocks called Annotators. The Annotators contain the analysis logic that infers the meaning of information; the inferred meaning is then recorded in the document's metadata.
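The relationship between Analysis Engines and Annotators can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the UIMA API: the class names `Annotation`, `Document`, `RegexAnnotator` and `AnalysisEngine` are hypothetical, and a regular expression stands in for whatever analysis logic a real Annotator would contain.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Annotation:
    """A labeled span of text (hypothetical structure, not UIMA's)."""
    begin: int
    end: int
    label: str


@dataclass
class Document:
    """A document plus the metadata that annotators attach to it."""
    text: str
    annotations: list = field(default_factory=list)


class RegexAnnotator:
    """Building block: holds the analysis logic for one kind of annotation.

    A regular expression is used here purely as a placeholder for
    real inference about meaning.
    """

    def __init__(self, label, pattern):
        self.label = label
        self.pattern = re.compile(pattern)

    def process(self, doc):
        for match in self.pattern.finditer(doc.text):
            doc.annotations.append(
                Annotation(match.start(), match.end(), self.label)
            )


class AnalysisEngine:
    """Runs its annotators in order over a document."""

    def __init__(self, annotators):
        self.annotators = list(annotators)

    def process(self, doc):
        for annotator in self.annotators:
            annotator.process(doc)
        return doc


engine = AnalysisEngine([
    RegexAnnotator("Person", r"George W\. Bush"),
    RegexAnnotator("Year", r"\b(?:19|20)\d{2}\b"),
])
doc = engine.process(Document("George W. Bush took office in 2001."))
labels = [(a.label, doc.text[a.begin:a.end]) for a in doc.annotations]
```

The point of the design is that the annotations travel with the document as metadata, so a separate tool can later query them without rerunning the analysis.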
IBM also described, as an example, an online automobile diagnostic system that first used the UIMA. Using a tool from ClearForest, a software robot first extracted facts related to automotive problems by scanning warranty claims, maintenance records, repair requests and call center logs. Then a reporting tool from Cognos used the UIMA to access the metadata mined by ClearForest and discovered previously undetected correlations, enabling faster, cheaper car repairs than were previously possible.
Darpa and IBM claim that, by providing an open-source environment for annotating, searching and sharing metadata, the UIMA enables a new breed of software applications that can uncover relevant information that would ordinarily remain buried within documents.
UIMA is already in pilot use at Carnegie Mellon, Columbia and Stanford universities, the University of Massachusetts, Science Applications International, BBN Technologies, the Mayo Clinic and Mitre.