PORTLAND, Ore. -- If you wonder what the government has done for you lately, take a look at DeepDive. DeepDive is a free counterpart to IBM's Watson, developed in the same Defense Advanced Research Projects Agency (DARPA) program but now available as free, open-source software.
Although it's never been pitted against IBM's Watson, DeepDive has gone up against a more fleshy foe: the human being. Result: DeepDive beat or at least equaled humans in the time it took to complete an arduous cataloging task. These were no ordinary humans, but expert human catalogers tackling the same task as DeepDive -- to read technical journal articles and catalog them by understanding their content.
"We tested DeepDive against humans performing the same tasks, and DeepDive came out ahead or at least equaled the efforts of the humans," professor Shanan Peters, who supervised the testing, told EE Times.
DeepDive is free and open-source, which was the idea of its primary programmer, Christopher Re.
"We started out as part of a machine-reading project funded by DARPA in which Watson also participated," Re, a professor at the University of Wisconsin, told EE Times. "Watson is a question-answering engine (although now it seems to be much bigger). [In contrast] DeepDive's goal is to extract lots of structured data" from unstructured data sources.
DeepDive incorporates probability-based learning algorithms as well as open-source tools such as MADlib and Impala (from Cloudera), and low-level techniques such as Hogwild, some of which have also been included in Microsoft's Adam. To build DeepDive into your application, you should be familiar with SQL and Python.
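To give a feel for why SQL and Python are the prerequisites, here is a minimal sketch of the general idea of turning unstructured sentences into structured rows. It is not DeepDive's API: the sample sentences, the regular expression, the `formation_age` table, and the `extract` and `load` helpers are all hypothetical, and a plain pattern match stands in for the system's learned, probability-based extraction.

```python
import re
import sqlite3

# Hypothetical input sentences, written for this sketch (not from a real corpus).
SENTENCES = [
    "Fossils from the Morrison Formation are Jurassic in age.",
    "Outcrops of the Hell Creek Formation are Cretaceous in age.",
    "No rock unit is named in this sentence.",
]

# Hypothetical pattern: "<Name> Formation ... <Period> in age".
PATTERN = re.compile(
    r"([A-Z][a-z]+(?: [A-Z][a-z]+)*) Formation\b.*?\b([A-Z][a-z]+) in age"
)


def extract(sentences):
    """Return (sentence_id, formation, period) tuples for matching sentences."""
    rows = []
    for idx, sentence in enumerate(sentences):
        match = PATTERN.search(sentence)
        if match:
            rows.append((idx, match.group(1), match.group(2)))
    return rows


def load(rows):
    """Store the extracted tuples in an in-memory SQL table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE formation_age "
        "(sentence_id INTEGER, formation TEXT, period TEXT)"
    )
    conn.executemany("INSERT INTO formation_age VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


conn = load(extract(SENTENCES))
result = conn.execute(
    "SELECT formation, period FROM formation_age ORDER BY sentence_id"
).fetchall()
print(result)
```

Once the facts sit in a table, ordinary SQL queries can filter, join, and count them, which is the practical payoff of extracting structure from free text.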
DeepDive was developed in the same Defense Advanced Research Projects Agency (DARPA) program as IBM's Watson, but is being made available free by its programmers at University of Wisconsin-Madison.
(Image: University of Wisconsin-Madison)
"Underneath the covers, DeepDive is based on a probability model; this is a very principled, academic approach to build these systems, but the question for us was, 'Could it actually scale in practice?' Our biggest innovations in DeepDive have to do with giving it this ability to scale," Re told us.
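One way to picture what "based on a probability model" means is that each candidate fact receives a probability rather than a flat yes or no. The sketch below is a deliberately simplified logistic scoring of one candidate, not DeepDive's actual model or inference procedure. The feature names, the weights, the bias, and the `probability` function are all assumptions made up for illustration.

```python
import math

# Hypothetical feature weights; in a real system these would be learned from data.
WEIGHTS = {
    "phrase_in_age": 2.0,     # the sentence contains an "... in age" phrase
    "same_sentence": 0.5,     # both mentions occur in one sentence
    "negation_nearby": -3.0,  # a negation word appears next to the mention
}


def probability(features, weights=WEIGHTS, bias=-1.0):
    """Map the summed weights of the active features to a value in (0, 1)."""
    score = bias + sum(weights.get(name, 0.0) for name in features)
    return 1.0 / (1.0 + math.exp(-score))


# Two hypothetical candidate facts with different supporting evidence.
p_strong = probability(["phrase_in_age", "same_sentence"])
p_weak = probability(["same_sentence", "negation_nearby"])
print(round(p_strong, 3), round(p_weak, 3))
```

Scoring one candidate this way is cheap; the scaling question Re raises comes from doing it over very large numbers of candidates while learning the weights from data.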
For the future, DeepDive aims to be proven in other domains.
"We hope to have similar results in those domains soon, but it’s too early to be very specific about our plans here," Re told us. "We use a RISC processor right now, we're trying to make a compiler, and we think machine learning will let us make it much easier to program in the next generation of DeepDive. We also plan to get more data types into DeepDive: images, figures, tables, charts, spreadsheets -- a sort of 'Data Omnivore' to borrow a line from Oren Etzioni."
Get all the details in the free download; downloads are running at 10,000 per week.
— R. Colin Johnson, Advanced Technology Editor, EE Times