San Jose, Calif. - About five exabytes of unique analog and digital information were produced worldwide in 2002, twice the amount produced in 1999. That data explosion is equivalent to half a million libraries the size of the Library of Congress' print collection, according to a study from the University of California, Berkeley.
"The surprising thing is growth in unique data was actually slower than we anticipated," said Hal Varian, a professor at the Berkeley School of Information Management and Systems, which conducted the study this summer. "We were anticipating we would see something like 50 percent growth in generating unique information, not the 30 to 33 percent we now estimate. I think that's due to the economic slowdown," Varian said.
Growth in new information generally occurs at a slower rate than growth in the capacity of hard-disk drives, Varian said. Magnetic storage, mainly on hard-disk drives, saw the greatest growth of all the print, film, magnetic and optical sources measured or estimated, capturing about 92 percent of the new information.
The study, now online (see www.sims.berkeley.edu/research/projects/how-much-info-2003), generally measured new, unique information, not copies of data. It is a major update of the group's original 1999 report and was sponsored by EMC, Hewlett-Packard, Intel and Microsoft Research.
The study estimated that audio CDs saw no growth as a medium for storing new information. Optical storage overall was up 28 percent, however, with storage of new data up 57 percent on CD-ROMs and 99 percent on DVDs. The study said growth of data sharing on peer-to-peer services was significant, with one service alone, KaZaa, responsible for sharing as much as 5,000 terabytes in 2002.
The university estimated annual flows of unique data at about 17.9 million terabytes. Voice-only telephone traffic accounted for the vast majority of those flows, at an estimated 17.3 million terabytes a year. By contrast, data flows over the Internet were estimated at about 532,000 terabytes/year, with remaining flows coming from television and radio.
The 17.3 million terabytes of phone traffic breaks down into an estimated 15 million terabytes of wired calls and 2.3 million terabytes of wireless calls. The study estimated 3,785 billion minutes of wired calls, compared with 600 billion minutes of wireless calls.
The 532,000 terabytes of Internet traffic in 2002 consisted mainly of e-mail (about 440,000 terabytes), followed by database queries (91,000 terabytes). The remaining traffic consisted mostly of instant messaging (270 terabytes) and access to static Web pages (167 terabytes).
Internet users were fairly evenly split among Europe (190 million), the Asia-Pacific region (187 million) and the United States and Canada (182 million). The United States, however, produced about 40 percent of all the new data generated in 2002 and about half the new data stored on magnetic media.