PORTLAND, Ore.—Exascale supercomputers running a thousand times faster than today's petaflop machines will require new performance measures, according to Sandia National Laboratories, which announced a 30-member committee effort to define a new standard with Intel, IBM, AMD, NVIDIA, and Oracle. Called Graph 500, the new benchmark already has a preliminary specification available to supercomputer makers for testing.
Graph 500 differs from the traditional Linpack by testing a supercomputer's skill at using graph theory to analyze the output streams from simulations in biological, security, social and similar large-scale problems. Graph 500 not only measures the traditional number-crunching ability of supercomputers, but also their ability to shuttle around the very large data sets generated by future supercomputers, such as those being addressed by the U.S. Department of Energy's exascale supercomputer initiative.
"Today's Linpack benchmark focuses on compute-intensive work, but we are looking at data intensive things," said Sandia researcher Richard Murphy.
The major problem today with very large data sets is automating their analysis, which can sometimes take months longer than the actual collection and processing of the data. The theory is that the analysis of very large simulations requires relatively simple calculations performed on select elements found in sparsely populated arrays representing huge numbers of participants. A three-tiered architecture is being proposed to partition the benchmarking problem into parallel-executing graph structures—one kernel handles linking sets of related members, a second performs parallel searches for the most relevant results from these graphs, and a third kernel is still in the works.
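The search kernel described above amounts to a graph traversal over a sparse structure. As a rough illustration of what that workload looks like—this is a serial Python sketch, not the benchmark's parallel reference implementation, and the `bfs_levels` function and toy graph are purely illustrative—a breadth-first search over an adjacency list touches each vertex through its links rather than through dense arithmetic:

```python
from collections import deque

def bfs_levels(adj, source):
    """Breadth-first search over a sparse adjacency list.

    Returns a dict mapping each reachable vertex to its distance
    (in hops) from the source vertex.
    """
    levels = {source: 0}
    frontier = deque([source])
    while frontier:
        v = frontier.popleft()
        for w in adj.get(v, ()):
            if w not in levels:
                levels[w] = levels[v] + 1
                frontier.append(w)
    return levels

# Tiny example graph: vertex -> list of neighbors.
adj = {0: [1, 2], 1: [3], 2: [3], 3: [4]}
print(bfs_levels(adj, 0))  # {0: 0, 1: 1, 2: 1, 3: 2, 4: 3}
```

Note that almost all the work here is pointer-chasing through memory rather than floating-point math—exactly the data-intensive behavior Graph 500 is meant to stress and Linpack is not.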
[Figure: A synthetic graph generated by a method called Kronecker multiplication. Larger versions of this generator, modeling real-world graphs, are used in the Graph 500 benchmark. Courtesy of Jeremiah Willcock, Indiana University]
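Kronecker multiplication builds a large graph by recursively subdividing the adjacency matrix into quadrants and dropping each edge into one quadrant at a time according to fixed probabilities. The following is a minimal sketch of that idea in the R-MAT style; the function name, edge count, and probability values here are illustrative assumptions, not the benchmark's official generator or its specified parameters:

```python
import random

def kronecker_edges(scale, edgefactor, probs=(0.57, 0.19, 0.19, 0.05), seed=1):
    """Generate edges of a 2**scale-vertex Kronecker-style random graph.

    Each edge is placed by recursively choosing one quadrant of the
    adjacency matrix per bit of the vertex index, weighted by `probs`
    (quadrants a, b, c, d). Probabilities are illustrative.
    """
    rng = random.Random(seed)
    a, b, c, d = probs
    n_edges = edgefactor * (1 << scale)
    edges = []
    for _ in range(n_edges):
        src = dst = 0
        for _ in range(scale):
            r = rng.random()
            src <<= 1
            dst <<= 1
            if r < a:            # top-left quadrant: neither bit set
                pass
            elif r < a + b:      # top-right: destination bit set
                dst |= 1
            elif r < a + b + c:  # bottom-left: source bit set
                src |= 1
            else:                # bottom-right: both bits set
                src |= 1
                dst |= 1
        edges.append((src, dst))
    return edges

edges = kronecker_edges(scale=4, edgefactor=2)
print(len(edges))  # 32 edges on a 16-vertex graph
```

Skewing the quadrant probabilities toward one corner is what gives the generated graphs the heavy-tailed degree distributions seen in real-world networks.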
"The goal is to have some influence over how industry puts together supercomputers," said Murphy. "The shift to multicore processor architectures really is stressing the existing programming model. In ten years we will have a very different way of thinking about how the machine works," he said.
Typical problems amenable to analysis by graph theoretic means include cybersecurity, medical informatics, data enrichment, and social and symbolic networks. For instance, cybersecurity routinely requires full scans of data sets with as many as 15 billion log entries per day. Medical informatics routinely analyzes up to 50 million patient records, resulting in billions of individual data items. Data enrichment applications, such as maritime domain awareness, can include hundreds of millions of individual transponders on tens of thousands of ships and tens of millions of individual cargo containers. Social and symbolic networks include petabyte-sized data sets—for instance, the human cortex has 25 billion neurons each with approximately 7,000 connections.
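The cortex figures quoted above show why these graphs demand sparse representations. A quick back-of-the-envelope calculation—using only the numbers from the article—gives the fraction of possible connections that actually exist:

```python
# Sparsity of a cortex-scale graph, using the figures quoted above:
# 25 billion vertices, each with roughly 7,000 connections.
vertices = 25_000_000_000
edges_per_vertex = 7_000
edges = vertices * edges_per_vertex      # total edges: 1.75e14
density = edges / (vertices * vertices)  # fraction of all possible edges
print(f"{density:.1e}")                  # 2.8e-07
```

Fewer than one in a million possible connections is present, so a dense adjacency matrix would be almost entirely wasted storage—the workload lives or dies on how fast the machine can chase scattered links.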