SAN JOSE, Calif. — You could say that big data got its start when Sergey Brin and Larry Page helped develop an algorithm that found more relevant results on the web than the search engines of their rivals. The lesson of Google continues to ripple through all businesses seeking competitive insights from their data pools, however large or small.
Today, the Internet of Things is opening vast new data sources, expanding big data’s promise to reshape business, technology, and the job of the technologist. Along the way, big data is inspiring new kinds of processor and systems architectures, as well as evolving algorithms and programming techniques.
“The concept of being overwhelmed by data is the new normal,” said Anthony Scriffignano, chief data scientist of Dun & Bradstreet, at a recent event hosted by the Churchill Club.
Inderpal Bhandari, the first chief data officer of IBM, also spoke at the event. The goal of the new role is to “change major processes an enterprise has so that their outcomes are better, so faster and better decisions get made,” said Bhandari.
Some of the largest recent IPOs in tech have been fueled by big data. They include Cloudera and Hortonworks, which helped drive Hadoop, an open-source implementation of Google's core MapReduce algorithm.
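The MapReduce idea behind Hadoop can be illustrated in a few lines: a map step emits key-value pairs, a shuffle step groups them by key, and a reduce step aggregates each group. This is a toy Python sketch of the pattern (here, a word count), not Hadoop's actual Java API:

```python
from collections import defaultdict

def map_phase(documents):
    # Map: emit a (word, 1) pair for every word in every document.
    for doc in documents:
        for word in doc.split():
            yield word, 1

def shuffle(pairs):
    # Shuffle: group values by key, as the framework does between phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: sum the counts for each word.
    return {word: sum(counts) for word, counts in groups.items()}

docs = ["big data", "big ideas", "data pools"]
counts = reduce_phase(shuffle(map_phase(docs)))
# counts["big"] == 2 and counts["data"] == 2
```

In a real Hadoop cluster the same three steps run in parallel across many machines, with the shuffle handled by the framework over the network.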
At Stanford’s Data Science Initiative, researchers are working to put big-data techniques in the hands of the average company.
“Machine learning is impressive but really hard to use. Even the most sophisticated companies might only have a couple of people that can apply those techniques optimally,” said Stephen Eglash, executive director of the initiative. “I can imagine the day when these tools are available in the equivalent of Microsoft Office.”
To get there, Stanford researchers are developing Snorkel, a tool to automate the process of labeling and ingesting big data sets. “It’s far enough along that you can see that it will work,” said Eglash. “We want the domain experts to use these techniques without needing a computer science expert.”
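The core idea behind tools like Snorkel is weak supervision: instead of hand-labeling every example, domain experts write several noisy "labeling functions," whose votes are combined into training labels. The sketch below is a toy illustration of that idea using a simple majority vote; the function names and labels are invented for the example, and Snorkel's real API differs:

```python
# Hypothetical spam/ham labels; ABSTAIN means a function offers no opinion.
ABSTAIN, HAM, SPAM = -1, 0, 1

def lf_has_link(text):
    # Heuristic: messages with links are likely spam.
    return SPAM if "http" in text else ABSTAIN

def lf_has_greeting(text):
    # Heuristic: messages opening with a greeting are likely legitimate.
    return HAM if text.lower().startswith("hi") else ABSTAIN

def lf_all_caps(text):
    # Heuristic: shouting is likely spam.
    return SPAM if text.isupper() else ABSTAIN

LABELING_FUNCTIONS = [lf_has_link, lf_has_greeting, lf_all_caps]

def weak_label(text):
    # Collect the non-abstaining votes and take the majority.
    votes = [lf(text) for lf in LABELING_FUNCTIONS]
    votes = [v for v in votes if v != ABSTAIN]
    if not votes:
        return ABSTAIN
    return max(set(votes), key=votes.count)

print(weak_label("CLICK http://win.example"))  # 1 (spam)
```

Snorkel replaces the naive majority vote with a learned model that weighs each labeling function by its estimated accuracy, which is what lets the labels approach hand-annotated quality.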
The IEEE Big Data Initiative is taking a different approach, making large data sets freely available for research through its Dataport service. So far, they include examples as diverse as real-time feeds of New York City traffic and neuron movements in a human brain.
Commercial big data projects are just as diverse, says Wayne Thompson, chief data scientist at SAS, a data analytics pioneer founded in 1976. “We are working with a semiconductor company to help reduce defects in their chip fab process through improved computer vision. One of our development partners is applying deep learning to help improve soccer players’ performance. We also are applying deep learning to monitor and count endangered wildlife through footprint image analysis and tracking.”
Smaller companies are getting traction, too. Although it has just 150 people, Real-Time Innovations Inc. (RTI) claims more than 1,000 design wins for its novel databus software for real-time monitoring. It uses a publish-and-subscribe model for tracking nodes, typically on sensor networks.
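In a publish-and-subscribe model, nodes that produce data publish to named topics, and nodes that need the data subscribe to those topics; publisher and subscriber never address each other directly. This is a minimal Python sketch of the pattern, with invented topic and field names, not RTI's actual databus API:

```python
from collections import defaultdict

class DataBus:
    """Toy publish-subscribe bus: publishers and subscribers are
    decoupled, knowing only the topic name."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        # Register interest in a topic; the callback fires on each publish.
        self._subscribers[topic].append(callback)

    def publish(self, topic, message):
        # Deliver the message to every subscriber of the topic.
        for callback in self._subscribers[topic]:
            callback(message)

bus = DataBus()
readings = []
bus.subscribe("turbine/temp", readings.append)   # a monitoring node
bus.publish("turbine/temp", {"node": 7, "celsius": 81.5})  # a sensor node
```

The decoupling is what makes the pattern suit sensor networks: sensors and monitors can join or leave without reconfiguring the other side.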
One of its first big deployments was a middleware server installed on the U.S.S. Cole after it was bombed in the Middle East. The software is also used in many hydroelectric plants, including Grand Coulee Dam, hospital instruments built by GE Healthcare, and wind turbine farms operated by Siemens.
The company recently named Sun Microsystems co-founder Scott McNealy to an advisory board that will help it scale. RTI’s business “is the next evolution of what we used to describe as ‘the network is the computer,’” said McNealy. “Today, the network is also the power grid and a lot of other things.”