CHICAGO -- The National Center for Data Mining has launched a new protocol designed to reclaim the Internet's original purpose: sharing data. The Data Space Transfer Protocol (DSTP), developed at the center and at the University of Illinois, is intended to do for standardized data warehousing what the Hypertext Transfer Protocol (HTTP) did for Web browsing.
Already, six major data havens in the United States, England and Australia have reformatted their online databases to demonstrate the power of DSTP's distributed data protocol, sponsors said.
"The Internet is very powerful for sharing text, images and sounds, but it has fallen behind in its original purpose, that is, exchanging raw data between researchers," said Georg Reinhart, a visiting research scientist in mathematics, statistics and computer science at the UI at Chicago. "Our infrastructure, called the Data Space Transfer Protocol, enables the next generation of Web data mining."
The overwhelming success of HTTP has turned the Internet into little more than a means of distributing multimedia documents, Reinhart said. The Web, built almost exclusively on HTTP, now houses massive amounts of online data. But because HTTP does not define a format for data, everyone has invented incompatible Web page formats, making it impossible to share engineering research data transparently.
DSTP, developed with Emory Creel at the National Center for Data Mining, aims "to unify the way data is stored online, thereby enabling multiple separate databases to be mined by researchers without their having to do anything but formulate their queries," said Reinhart.
With more and more researchers gaining access to broadband backbone interconnections, pent-up demand exists for such a specification. Cooperating researchers have always been able to translate data among various formats and share their research results, but only after long downloads and reformatting.
As more researchers put their data online, a common format would enable everyone to query the same worldwide database in real time, without the slow downloads and format translations.
"DSTP standardizes the way raw data is shared, the same way HTTP revolutionized the way hypertext documents are shared," said Reinhart. "Researchers can search, analyze and draw conclusions from multiple databases simultaneously in real time, even if the databases contain different types of data."
He predicts DSTP will motivate widely scattered and unrelated researchers worldwide to post their data in a globally accessible format. A demonstration of the standard can be found at www.dataspaceweb.net, where it is possible to download a free client/server software package (available for Linux, Macintosh, Unix and Windows). In addition, there is an online demonstration using six widely separated databases in DSTP format.
DSTP standardizes the manner in which data is distributed, queried and retrieved by providing a "data column" template permitting correlation studies among databases without relocating or translating them. Data columns that are widely separated on different servers can be temporarily unified with a specific Universal Correlation Key. Like the primary key attribute in a traditional database, this key is unique and thus can unify distributed databases under the direction of a DSTP server.
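The correlation idea can be sketched in a few lines. This is an illustrative mock-up only, not the actual DSTP software or API: the column names, key values and data are invented here to show how a shared Universal Correlation Key can temporarily unify columns held on different servers.

```python
# Illustrative sketch (not the real DSTP client/server): each remote
# "data column" is modeled as a mapping from a Universal Correlation
# Key (UCK) to a value. Names and values below are invented.
server_a_column = {"uck-001": 98.6, "uck-002": 99.1, "uck-003": 97.4}
server_b_column = {"uck-001": 120, "uck-002": 135, "uck-004": 110}

def correlate(col_a, col_b):
    """Temporarily unify two distributed columns on their shared UCKs,
    without moving either column from its server of record."""
    shared_keys = col_a.keys() & col_b.keys()
    return {k: (col_a[k], col_b[k]) for k in sorted(shared_keys)}

joined = correlate(server_a_column, server_b_column)
# Only records whose UCK appears in both columns are correlated.
print(joined)  # {'uck-001': (98.6, 120), 'uck-002': (99.1, 135)}
```

Because the key is unique across servers, like a primary key in a traditional database, the correlation can be computed on demand and discarded, leaving the underlying columns untouched.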
The DSTP's closest kin is perhaps the Network News Transfer Protocol (NNTP), which permits news articles on different servers to be uniformly presented in a common query window. Software marrying the DSTP client to the DSTP server uses commands modeled after NNTP. In particular, a DSTP server links a client program with diverse data columns from different databases without relocating the raw data, which remains in its original location.
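An NNTP-style exchange consists of short text commands answered with numeric status lines. The sketch below imitates that shape; the command names ("LIST COLUMNS", "GET COLUMN") and status codes are invented for illustration and are not taken from the DSTP specification.

```python
# Hypothetical NNTP-style command handler for a data-column server.
# Commands and status codes are invented for this example; real DSTP
# commands may differ.
def handle_command(catalog, line):
    """Answer one text command with a numeric status line, in the
    style of NNTP responses (e.g. '215 list follows')."""
    parts = line.strip().split()
    if parts[:2] == ["LIST", "COLUMNS"]:
        body = "\n".join(sorted(catalog))
        return "215 column list follows\n" + body + "\n."
    if parts[:2] == ["GET", "COLUMN"] and len(parts) == 3:
        name = parts[2]
        if name in catalog:
            # The column's values flow to the client on request; the
            # column itself stays on its home server.
            values = "\n".join(str(v) for v in catalog[name])
            return "220 column follows\n" + values + "\n."
        return "430 no such column"
    return "500 command not recognized"

catalog = {"temperature": [98.6, 99.1], "pressure": [120, 135]}
print(handle_command(catalog, "LIST COLUMNS"))
print(handle_command(catalog, "GET COLUMN temperature"))
```

As with NNTP, a single client dialog can pull column listings from several servers in turn, presenting them in one common query window.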