SAN JOSE, Calif. — The buzz around big data is spawning new algorithms, programming languages, and techniques at the speed of software.
“Neural networks have been around for a long time. What’s new is the large amounts of data we have to run against them and the intensity of engineering around them,” said Inderpal Bhandari, a veteran computer scientist who was named IBM’s first chief data officer.
He described work using generative adversarial networks, which pit two neural nets against each other to create a better one. "This is an engineering idea that leads to more algorithms; there is a lot of that kind of engineering around neural networks now."
In some ways, the algorithms are anticipating tomorrow’s hardware. For example, quantum algorithms are becoming hot because they “allow you to do some of what quantum computers would do if they were available, and these algorithms are coming of age,” said Anthony Scriffignano, chief data scientist for Dun & Bradstreet.
Deep belief networks are another hot emerging approach. Scriffignano describes the approach as "a non-regressive way to modify your goals and objectives while you are still learning; as such, it has characteristics of tomorrow's neuromorphic computers," systems designed to mimic the human brain.
At Stanford, the DeepDive algorithms developed by Chris Ré have been gaining traction. They help computers understand and use unstructured data such as text, tables, and charts as easily as relational databases or spreadsheets, said Stephen Eglash, who heads the university's data science initiative.
“Much of existing data is un- or semi-structured. For example, we can read a datasheet with ease, but it’s hard for a computer to make sense of it.”
An example of a DeepDive program making sense of unstructured data for knowledge base construction. (Image: Ce Zhang, University of Wisconsin)
So far, DeepDive has helped oncologists use computers to interpret photos of tumors. The New York attorney general uses it as a law enforcement tool, and many companies across different domains have also adopted it.
DeepDive is unique in part because “it IDs and labels everything and then uses learning engines and probabilistic techniques to figure out what they mean,” said Eglash.
DeepDive has been successful, but it is just one of many algorithm efforts in academia these days. Others focus on areas such as computer vision or try to identify anomalies in real-time data streams. "We could go on and on," said Eglash.