SAN JOSE, Calif. - Just as clusters of inexpensive systems are hitting new milestones in supercomputing, a group of 20 top researchers has kicked off a program to redefine the benchmarks used to measure high-performance systems. Their work could have a profound impact on how the most powerful computers are designed and purchased.
The High Productivity Computing Systems (HPCS) program under the Defense Advanced Research Projects Agency quietly launched a three-year effort in August to deliver, by 2006, benchmarks that measure multiple hardware and software aspects of a computer's overall capabilities.
The HPCS program's first step comes this week, when it launches the so-called HPCchallenge benchmark of five hardware performance metrics. The benchmark, designed to broaden the Linpack benchmark of raw floating-point operations/second (flops) widely used today to rank the world's top supercomputers, will roll out at the SC2003 supercomputing conference in Phoenix.
"We want a more nuanced answer to the question 'What computer is best?' The goal is to move toward more systems-level design that has an appreciation for software development and execution time, and lets designers try novel things without breaking the benchmarks," said one senior researcher close to the project who asked not to be named.
The HPCS benchmarking effort comes at a time when high-performance computing is on an upswing. Many users and designers are rallying around advances in increasingly inexpensive yet powerful clusters that could expand the sector, even as others call for more federal spending on new architectures to keep pace with performance leaps in Japan.
One still-secret government report advocates more than a doubling of U.S. spending on supercomputing. The High-End Computing Revitalization Task Force's report also recommends that the United States acquire a multihundred-teraflops system as a cornerstone of a new national supercomputing center, EE Times has learned. However, with record federal budget deficits, many are skeptical that a major boost in federal spending on high-end systems will be forthcoming.
The report, now being reviewed by government agencies, seeks to increase annual federal spending on supercomputing from tens of millions to hundreds of millions of dollars, and to ignite more work on diverse architectures beyond the clusters of commodity systems popular today.
"Everybody wants something big, to be more competitive with the [NEC Corp.] Earth Simulator. They can do climatic simulations with finer geographic and temperature resolutions than we can," said Alan Laub, an Energy Department program director and a professor at the University of California, Davis, who co-chairs the task force. "But we are being told not to look for a lot of money. I think the outlook is difficult. It's hard times right now."
Indeed, "in this budget climate, getting any new money out of the system is pretty difficult," said Mark Seager, an assistant director for advanced technologies at Lawrence Livermore National Laboratory (Livermore, Calif.).
Meanwhile, low-cost clusters are all the fashion. Virginia Tech (Blacksburg, Va.) captured the imagination of computer users and designers this fall when it assembled in a month a 10.28-Tflops cluster of 1,105 dual-processor IBM PowerPC 970-based Apple G5 systems for just $5.2 million. The system is expected to rank as the world's third-fastest supercomputer when a new Top 500 supercomputer list is released at the SC2003 conference this week.
"The Virginia Tech system is the newest on the list and the first time Apple has made it on the list that I recall," said Jack Dongarra, professor of computer science at the University of Tennessee, who maintains the Top 500 list. "The trend continues to be that we see more and more clusters on the list, and a higher fraction of systems based on commodity processors and switches."
The Virginia Tech system beat the world's current fastest cluster, a 7.6-Tflops system at Lawrence Livermore that uses 2,304 Intel Xeon processors at an estimated cost of more than $10 million. NEC's Earth Simulator tops the list at about 35 Tflops, but the custom design cost an estimated $400 million to build.
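The price/performance gap behind these figures can be worked out directly. The following is a minimal sketch that uses only the performance and cost estimates quoted above; the function name and the dictionary layout are illustrative choices, and the Livermore cost is treated as a lower bound because it is reported only as "more than $10 million."

```python
# Performance (Tflops) and estimated cost (USD) as quoted in the article.
systems = {
    "Virginia Tech G5 cluster": (10.28, 5.2e6),
    "Lawrence Livermore Xeon cluster": (7.6, 10e6),  # "more than $10 million": lower bound
    "NEC Earth Simulator": (35.0, 400e6),            # "about 35 Tflops", estimated cost
}


def cost_per_tflops(tflops, cost_usd):
    """Return the estimated dollars spent per Tflops of performance."""
    return cost_usd / tflops


for name, (tflops, cost) in systems.items():
    millions = cost_per_tflops(tflops, cost) / 1e6
    print(f"{name}: about ${millions:.2f} million per Tflops")
```

On these estimates the Virginia Tech cluster comes in at roughly $0.5 million per Tflops, the Livermore cluster at no less than about $1.3 million, and the Earth Simulator at more than $11 million. The comparison says nothing about how well each machine runs a given application.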
The Virginia Tech system was reported by media outlets ranging from the BBC to The New York Times, and received user requests from the Pentagon, National Security Agency, NASA and several national labs when it was announced in late September. "We're getting inquiries from all over," said Jason Lockhart, associate director of Virginia Tech's new Terascale Computing Facility.
Lockhart's team chose Infiniband as the clustering interconnect to link the 1,105 systems. While Infiniband is similar in price and latency to proprietary interconnects from companies like Quadrics Ltd. (Bristol, England), it offers 10-Gbit/s links, compared with 2 Gbits/s for Quadrics.
Infiniband chip startup Mellanox Technologies Inc. (Santa Clara, Calif.), which is supplying switches for the Virginia Tech cluster, said the effort marks the beginning of off-the-shelf teraflops systems. The company is working with Intel Corp. to set up a teraflops cluster of 192 systems on the SC2003 show floor. "Now you can do a teraflops system as a booth demo," quipped Dana Krelle, vice president of marketing for Mellanox.
Separately, Mellanox last week announced a next-generation Infiniband switch chip capable of lowering costs of the interconnect from about $700 per port to about $400.
The overall rise of clusters, most using Intel-based systems and Infiniband, "points to a healthy dynamic in the X86 ecosystem moving into the high end. It's commodity microprocessors going right on up to supercomputers," said Seager of Lawrence Livermore.
Nevertheless, support is growing for more diverse supercomputing architectures to set new milestones in systems performance.
"We have seen in the last 18 months the re-emergence of the question of which way we should go in building supercomputers. I don't think there's a clear answer," said Jim McGraw, deputy director of the Institute for Scientific Computing Research at Lawrence Livermore and chairman of SC2003. "Different machines give different cost/benefits to different apps."
"There are certain problems that do not run well on Linux clusters and that will always be the case," said the HPCS source. Specifically, clusters fall down in applications with unpredictable global-memory address patterns, such as large graphing problems in bio-informatics and some simulations with extremely irregular meshes, the source added.
Backers hope the new HPCS benchmarks may show where clusters do and don't fit. They could also help push down costs and make high-performance systems easier to use, opening the door to new users. That's in part because the group hopes to detail currently unknown software development costs, something it will reflect in future benchmarks.
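One way to see why unpredictable global-memory address patterns are a poor fit for clusters is to count how often a node must reach into another node's memory. The following is a rough sketch under stated assumptions, not a model of any real machine or of the HPCchallenge tests: data is split evenly across nodes in contiguous blocks, and every name and size here (`owner`, `remote_fraction`, 100 nodes, one million elements) is hypothetical.

```python
import random


def owner(index, n_elements, n_nodes):
    """Node holding `index` under an assumed contiguous block distribution."""
    block = -(-n_elements // n_nodes)  # ceiling division
    return index // block


def remote_fraction(indices, node, n_elements, n_nodes):
    """Share of accesses issued by `node` that land in another node's memory."""
    remote = sum(1 for i in indices if owner(i, n_elements, n_nodes) != node)
    return remote / len(indices)


n_elements, n_nodes, node = 1_000_000, 100, 0  # hypothetical sizes
block = -(-n_elements // n_nodes)

# Regular pattern: the node sweeps through the block it owns.
regular = range(node * block, (node + 1) * block)

# Irregular pattern: the same number of accesses at unpredictable addresses.
rng = random.Random(0)
irregular = [rng.randrange(n_elements) for _ in range(block)]

print(remote_fraction(regular, node, n_elements, n_nodes))
print(remote_fraction(irregular, node, n_elements, n_nodes))
```

The regular sweep never leaves local memory, while nearly all of the random accesses cross the interconnect. In a cluster each of those crossings costs far more than a local memory reference.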
The HPCS source said clustering programs that use the popular Message Passing Interface (MPI) can generate 30 to 50 percent more code, raising software development costs 50 to 100 percent over traditional serial-programming techniques. Thus, one goal is not to move to ever larger, cheaper clusters but to build new architectures that could marry powerful parallel hardware with simple serial-programming tools.
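The trade-off implied by those percentages can be put in rough numbers. The following is a back-of-the-envelope sketch: the 50 to 100 percent overhead range comes from the source quoted above, while the $200 million serial software budget and the function name are hypothetical, chosen only for illustration.

```python
def breakeven_hardware_premium(serial_software_cost, mpi_overhead):
    """Extra hardware spending that would be offset by avoiding the MPI overhead.

    mpi_overhead is the fractional increase in software development cost
    attributed to message-passing code (0.5 means 50 percent).
    """
    return serial_software_cost * mpi_overhead


serial_cost = 200e6  # hypothetical serial-programming software budget, USD

for overhead in (0.5, 1.0):  # the 50 to 100 percent range cited by the source
    premium = breakeven_hardware_premium(serial_cost, overhead)
    print(f"{overhead:.0%} overhead: up to ${premium / 1e6:.0f} million "
          "extra on hardware would break even")
```

Under these assumed figures, a machine that let programmers keep serial tools could cost $100 million to $200 million more than a cluster before the total bill came out higher.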
"One unproven assumption here is that it could make sense to spend more on the hardware because the software is so expensive. There are many programs where you spend hundreds of millions of dollars on the software," said the HPCS source.
Thus, not only has the HPCS program pitted Cray, IBM and Sun in a race to build the first petaflops computer with novel hardware by 2010, it is also attempting to change the rules for how productivity on systems of that generation will be measured.
"Just as Linpack has had an impact on the way machines have been designed, we hope HPCchallenge and its follow-ons will have a similar impact on architectural decisions," said Robert Graybill, who oversees the HPCS program at Darpa. "We are not only trying to build a petascale computer, but one that's more useful." The HPCchallenge benchmark uses five metrics to show a system's strengths and weaknesses.
Dongarra at the University of Tennessee has been working with one other researcher to craft the HPCchallenge. "We've been racing to complete this by the conference. The goal is to have a test you can download, run and post results. It's part of a broader effort to improve productivity," he said.