Thigpen said he has observed a growing gap between rising benchmark performance and the actual work delivered by new machines. For example, NASA's next-generation supercomputer--a 245-Tflops system called Pleiades, based on quad-core Intel Xeon processors--has twice the theoretical performance but handles only 1.5 times the actual work of NASA's current top system, the 89-Tflops Columbia, based on Intel's Itanium CPU.
NASA's problem is similar to that of Oak Ridge: scaling software to a dramatically rising number of cores.
"Communications becomes a bigger part of your work," Thigpen explained. "If you spend increasing time passing information between the processors, the processors are not doing as much work on the real issue."
The good news for NASA is that the more powerful Pleiades system actually costs less than the older Columbia system did, and at about a megawatt, it consumes a little less than half the power.
Instead of looking at synthetic benchmarks such as the Linpack test used for the Top 500 list, NASA ran its own applications on test systems as part of its Pleiades evaluation.
"In the past, processors like IBM's Power and Intel's Itanium ran our codes best," Thigpen said. "It wasn't until we got to the quad-core chips that the X86 [Xeon] started doing better on our workloads."
The Xeon system supplied by SGI "was a significant improvement over its competitors," including an IBM Power6-based system that reached the final runoff.
There's no shortage of demand for computing power. NASA currently plans to deploy its own petaflops system as early as 2009, and may need 10 Pflops by 2012.
"We're busy trying to meet NASA's requirements for computing to design spacecraft and look at ways to mitigate what humans are doing to the Earth, looking at global warming and ocean and earthquake simulations," Thigpen said. "It requires more and more computing to meet these needs."
"We have people who need 40,000 cores to do their global-climate simulation, and that's something we can't offer them right now," Thigpen said. The system must also support design simulations for the Constellation vehicles that will travel to the moon and perhaps Mars.
"They could use our whole system, but they are just one mission we need to support," he said.
Heterogeneous vs. homogeneous
What computer scientists learn from the IBM Roadrunner's use of heterogeneous processor cores may be more important than the fact that the system breaks the petaflops barrier.
In addition to its AMD X86 cores, the Roadrunner has Cell processors with eight vector-processing cores and a PowerPC controller. The system looks like a standard message-passing supercomputer made up of X86 cores, but it can also offload application hot spots to the Cell for acceleration by invoking the parallel libraries IBM provides.
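The offload pattern is conceptually simple even if the tooling is not: the host code keeps its usual structure and calls into an accelerated routine at each hot spot. The C sketch below illustrates that split in generic terms; the accel_run_kernel() function is a hypothetical placeholder stubbed out on the host, not one of IBM's actual Cell libraries.

```c
/* Sketch of the host-plus-accelerator split described above: the x86 side
 * keeps the familiar structure and hands a compute-intensive "hot spot" to
 * the accelerator. accel_run_kernel() is a hypothetical placeholder; a real
 * hybrid system would move the data to the accelerator and launch a tuned
 * kernel there. Here it is stubbed out to run on the host. */
#include <stdio.h>
#include <stddef.h>

static void accel_run_kernel(const double *in, double *out, size_t n) {
    for (size_t i = 0; i < n; i++)
        out[i] = 2.0 * in[i];          /* stand-in for the accelerated hot spot */
}

int main(void) {
    double field[8] = {1, 2, 3, 4, 5, 6, 7, 8}, next[8];

    /* Host code: set up data, offload the hot spot, continue as usual. */
    accel_run_kernel(field, next, 8);

    for (int i = 0; i < 8; i++)
        printf("%.1f ", next[i]);
    printf("\n");
    return 0;
}
```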
IBM is providing software tools and a scaled-down version of the Roadrunner for what it hopes could be dozens of other computer users who want to try a similar approach.
There is still a lively debate as to whether the heterogeneous approach is the best one. Grice of IBM is quick to admit that the industry is still searching for a standard programming model for hybrid systems, one that would be the equivalent of the message-passing interface widely used in today's homogeneous supercomputers.
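MPI, the message-passing interface Grice refers to, gives programmers on homogeneous machines a single, explicit model for moving data between processes. A minimal example of that style, compiled with an MPI wrapper such as mpicc and launched under mpirun with at least two processes:

```c
/* Minimal MPI example of the explicit message passing used on today's
 * homogeneous supercomputers: rank 0 sends a value, rank 1 receives it. */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (size >= 2) {
        if (rank == 0) {
            double value = 3.14;
            MPI_Send(&value, 1, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            double value;
            MPI_Recv(&value, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            printf("rank 1 received %.2f\n", value);
        }
    }

    MPI_Finalize();
    return 0;
}
```

Hybrid machines such as Roadrunner have no equally settled counterpart yet for deciding what runs on the accelerator and how data gets there.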
"We are trying to find the right way to structure the hardware to make the software easier to program," said Grice.
"There is a little bit of extra burden on the programmer to control via software memory flow and caching on the Cell," Grice said. "We need to continue to come up with good parallel libraries and algorithms to stack together into solutions."
IBM worked with Department of Energy specialists and academics for more than a year evaluating the hardware requirements for high-end computing applications. Grice said the major issue is getting application developers to understand that their biggest bottleneck is limited memory bandwidth, something hardware cannot yet address.
"Once you get people to think about building algorithms for systems that are memory-constrained, using heterogeneous cores is not a problem," said Grice.
Oak Ridge's Bland added, "I commend the Roadrunner team for experimenting with a hybrid system, and I think we will learn a lot from it. I think heterogeneous processing really is our future. We think some kind of acceleration function--whether with the Cell or GPUs or something else--will be absolutely necessary."
Several other systems are in the race to break the petaflops barrier, but none were far enough along to have their performance tested for the June rankings. Dongarra said some of the machines might be ready for testing in time for the November rankings.
Japan had the world's most powerful supercomputer, the Earth Simulator, for five iterations of the Top 500 list starting in 2002. But in November 2004, IBM's 70-Tflops BlueGene/L system at Lawrence Livermore National Laboratory leapfrogged the 35-Tflops Earth Simulator. Since then, the BlueGene system has remained the most powerful in the world and is now rated at about 478 Tflops.
Japan has announced a follow-on project called the Life Simulator, targeted at achieving 10 Pflops of sustained performance. But it is not expected to be ready until 2011, Dongarra said.