San Mateo, Calif. - An interagency task force is urging the U.S. government to more than double its spending on supercomputers, asking White House policymakers for "hundreds of millions" in additional annual spending phased in over five years.
The plan from the High End Computing Revitalization Task Force, now in draft form with the White House Office of Science and Technology Policy, is one of three government reports on supercomputing coming out this month. The National Academy of Sciences issued its interim report last week, and the so-called Jasons group of academic experts is finishing a report of its own.
Together, the trio of documents represents a growing consensus on the need to boost long-term R&D to define new supercomputing architectures at the petaflops level and beyond, paving the way for a host of strategic government and scientific applications.
Researchers also are calling for greater investment in new software-programming models as part of an emerging shift in how performance in high-end computers will be measured. In addition, a consensus is emerging that multiple supercomputing architectures, both commodity and custom, will coexist in the future.
If the White House and Congress approve the task force's recommendations, backers hope the leap in funding will ignite commercial startups and attract more students into high-end scientific computing. "I hope there will be enough new spending to make it viable to launch some small companies that could make it big," said Alan Laub, a Department of Energy program director and a professor at the University of California, Davis, who co-chairs the task force. "We want to make sure the pipeline is full for scientists and engineers in this field."
The government spends an estimated $700 million a year on supercomputer research and equipment purchases as part of its overall $45 billion science and technology budget, Laub said. "It's reasonable to request at least a doubling of the budget. Whether we get it remains to be seen."
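For a rough sense of scale, the arithmetic behind the request works out as follows. This is a back-of-the-envelope sketch using only the figures Laub cites; the reading that "doubling" means roughly another $700 million a year is an assumption, not something the task force has specified.

    # Back-of-the-envelope arithmetic from the figures cited above.
    # Assumption: "doubling" means roughly another $700 million a year.
    current_spend = 700e6  # estimated annual supercomputing spend, USD
    st_budget = 45e9       # overall science and technology budget, USD

    share = current_spend / st_budget
    print(f"Supercomputing share of the S&T budget: {share:.1%}")  # ~1.6%

    # A doubling would add roughly the current base once fully phased in.
    print(f"Implied additional annual spending: ${current_spend / 1e6:.0f} million")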
"There's a growing sense that we've underinvested [in supercomputing] for a few years now. We've been coasting a long time on the government's early work," said Roy Schwitters, a professor of physics at the University of Texas who participated in drafting the recommendations of the Jasons group of academic advisers to the government. Their report is slated for release this month.
Despite the federal budget deficit, the cry for more investment is growing louder in government circles. "There is strong support in Congress for investment in high-end computing. They see that we need a long-term R&D program to redress the shortfall," said Dan Reed, director of the National Center for Supercomputing Applications (NCSA; Champaign, Ill.), who helped feed industry input into the task force's plan.
One concern is that the report is "a bit late for the 2005 fiscal-year budget. Most of that is in bed," said Laub, the task force co-chair. That could mean the plan, which spans 2005 to 2009, will depend on support from the next administration.
COTS vs. custom
The task force's plan does not directly address a debate over whether supercomputer systems should be built with commercial off-the-shelf (COTS) products or with custom technologies such as specialized vector processors. The government's Accelerated Strategic Computing Initiative (ASCI), which buys many of the most powerful systems used in the United States, has long favored supercomputing clusters generally built with COTS scalar microprocessors.
However, the task force plan implies that most of the recommended new funding should go to new custom technologies. "If you read between the lines, we are calling for more custom design. We call it architectural diversity," Laub said.
Researchers in and out of the ASCI program have expressed concern that many government and scientific applications will be difficult to scale to the emerging class of clustered systems, which will use tens of thousands of processors to hit performance beyond 100 teraflops. However, many researchers said commodity clusters have served ASCI and other users well to date, and will continue to play a role.
Striking the right balance between commodity and custom architectures and providing continuity in federal funding are both "essential to the well-being of supercomputing in the United States," according to an interim report on the future of supercomputing released by the National Academy of Sciences (NAS) last week. The group will deliver a full report recommending long-term actions late next year (see www.eetimes.com/story/OEG20030812S0011).
"The message is, there is no single solution. There is a diversity of applications," said Susan L. Graham, computer science professor at the University of California, Berkeley, and co-chair of the NAS report.
Some researchers see the COTS-vs.-custom debate as a red herring, since many systems use mixtures of the two approaches. More critical, they said, is solving problems such as how to help programmers write applications more quickly for complex high-end systems.
"The cycle time from when engineers have an idea to when they have a program ready to run is one of the bottlenecks in high-end computers, and it will only become worse as we develop bigger and bigger machines," said Robert Graybill, who manages the High Productivity Computer Systems (HPCS) program at the Defense Advanced Research Projects Agency (see www.eetimes.com/story/OEG20030714S0007).
The Darpa program is working on new ways of measuring this productivity that could supplant today's teraflops and petaflops metrics for high-end systems. "You get exactly what you measure, and we have been measuring the wrong things," said Graybill.
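To make the idea concrete, a productivity metric of this kind might fold programming time into the measurement alongside raw machine speed. The sketch below is a hypothetical illustration of the principle, not Darpa's actual HPCS metric; all numbers in it are invented.

    # Illustrative "time to solution" measure: total weeks from idea to
    # final result, counting programming effort as well as machine time.
    # A hypothetical sketch; not Darpa's actual HPCS metric.

    def time_to_solution(dev_weeks, run_hours, runs):
        """Development time plus total execution time, in weeks."""
        return dev_weeks + (run_hours * runs) / (24 * 7)

    # A faster machine that is harder to program can lose on this measure:
    hard_to_program = time_to_solution(dev_weeks=26, run_hours=10, runs=50)
    easy_to_program = time_to_solution(dev_weeks=8, run_hours=40, runs=50)
    print(f"fast but hard to program: {hard_to_program:.1f} weeks")    # ~29.0
    print(f"slower but easy to program: {easy_to_program:.1f} weeks")  # ~19.9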
The Jasons report, meanwhile, will add another twist, urging that the ASCI program shift its annual $75 million to $100 million systems budget toward "capacity" machines, high-end systems that can run multiple applications for multiple researchers, rather than the single-user "capability" machines used today. ASCI is now working on a plan to reach a 50/50 split of its budget by 2008, said Mark Seager, an assistant department head for advanced technologies at Lawrence Livermore National Laboratory who works on the ASCI program.
Other long-term challenges include how to handle growing latency that stems from memory devices not keeping pace with processor and interconnect speeds. The industry also must lower the cost of interconnect switches, which "in some cases exceeds the cost of the [computing] nodes [they connect]," said NCSA's Reed. "All [supercomputing] technologies need more investment, including vector computing."
Long-term approach sought
In testimony before the House Science Committee last month, Reed said the U.S. problem is twofold. First, the government needs to buy more supercomputers to handle a backlog of current scientific research across a broad spectrum of applications. Second, it needs a long-term R&D program to address applications for which there is no solution today.
Co-chair Laub said the task force recommended greater government interagency cooperation in supercomputing, but didn't define a long-term R&D program as such. However, he said he expects the government will create such a program as a result of the budgeting plan.
Vendors, briefed on the task force plan in July, generally welcomed the idea of a boost in government spending, Laub said. Industry insiders predict the uptick could favor companies such as Cray Inc. (Seattle), with a history of developing custom high-end systems. However, said a Cray spokesman, "All boats are likely to be lifted if spending increases to this extent."
"We are encouraged to see the recognition of the need for increased funding, particularly for diverse architectures, to correct the imbalance of the last several years that have focused on off-the-shelf technologies," said James E. Rottsolk, president and chief executive officer of Cray.
"This will motivate us to dig into scientific computing even more fervently than we already have," said John Gustafson, a senior scientist at Sun Microsystems Inc. working on the Darpa supercomputer program.
At LinuxWorld earlier this month, IBM Corp.'s Dave Turek said the company launched a Deep Computing unit earlier this year to give IBM a sharper focus on high-performance computing. The group is involved in five government supercomputing projects, including Darpa's, and is targeting high-end needs in new markets such as digital media, said Turek, who is the unit's vice president.
In Japan's shadow
Much of the U.S. impetus for more government investment stems from the announcement last year in Japan of the Earth Simulator, a custom-built NEC Corp. supercomputer that delivers 35.8 Tflops at an estimated cost of as much as $500 million. At its launch in April 2002, the NEC system had about five times the performance of the fastest U.S. supercomputer, a Lawrence Livermore National Laboratory machine. Today a Los Alamos National Laboratory system hits about 13.8 Tflops, still well behind NEC's watermark but enough horsepower to rank as the second most powerful system in the world.
"The Earth Simulator created a tremendous amount of interest in high-performance computing and was a sign the U.S. may have been slipping behind what others were doing," said Jack Dongarra, a professor of computer science at the University of Tennessee who is working on the National Academy of Sciences report.
The Earth Simulator, which was unique at the time of its launch, uses custom vector processors with custom memory and processor interconnects, providing high memory bandwidth for its 5,120 processors.
By contrast, the Los Alamos computer employs 8,000 off-the-shelf processors to get about a third of the performance of the NEC system, in part because it relies on commercial interconnects from Quadrics Ltd. (Bristol, England) rather than a custom network.
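The per-processor gap is stark when worked out from the figures above. This is a quick illustrative calculation using the peak ratings as reported; sustained performance on real applications would differ.

    # Per-processor throughput from the peak figures reported above.
    # Illustrative arithmetic only; sustained performance would differ.
    systems = {
        "Earth Simulator (custom vector)": (35.8, 5_120),  # Tflops, processors
        "Los Alamos (off-the-shelf)": (13.8, 8_000),
    }

    for name, (tflops, cpus) in systems.items():
        gflops_per_cpu = tflops * 1e3 / cpus
        print(f"{name}: {gflops_per_cpu:.1f} Gflops per processor")

    # Earth Simulator: ~7.0 Gflops per processor
    # Los Alamos:      ~1.7 Gflops per processor, roughly a 4x gap
    # favoring the custom vector design.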
"The biggest lesson of the Earth Simulator is they [the Japanese] made a decade-long plan and stuck to it," said Reed of the NCSA.
NEC's achievement notwithstanding, 91 percent of the top 500 supercomputers in the world today were made in the United States, according to the NAS report.
Another investment driver is that the road to a petaflops system and beyond remains unclear at a time when strategic applications in cryptanalysis and many scientific fields are begging for horsepower. "At some point you have to come up with a new architecture," said Seager of Livermore Labs.
Seager said that ASCI needs a petaflops system by about 2010 to handle applications tied to nuclear-weapons simulations. It's not yet clear whether IBM's BlueGene/Light program or the Darpa HPCS program would be able to meet that goal, he added.
Laub, the task force co-chair, noted that the HPCS program ends in 2009. "There is nothing to replace it, and that could be a catastrophe," he said.