PALO ALTO, Calif. Developers mapped out four routes to supercomputing, including one based on experimental work in nanotechnology, in the opening sessions of the annual Hot Chips conference here Monday (Aug. 18).
In a keynote session, Tadashi Watanabe, vice president of high-performance computing at NEC Corp., detailed the company's Earth Simulator, currently the world's most powerful supercomputer. The custom-built system stirred controversy when it was announced in April 2002 as having a 40-teraflop peak performance using 5,120 vector processors, far beyond the capabilities of even the fastest systems in the U.S. Those systems rely on clusters of many more off-the-shelf scalar processors.
Watanabe said the system cost $400 million to build over a five-year project life and incurs about $15 million in annual operating costs. It consumes about eight megawatts of power and is housed in a custom three-story, 3,250-square-meter building in Tokyo, where one floor is dedicated to some 83,200 cables that link its 320 computer cabinets and 128 interconnect systems.
The system is measured at 87.5 percent efficiency, giving 35 teraflops on the Linpack benchmark, but actually delivers 14 to 26 teraflops, or 38 to 66 percent efficiency, in real-world applications, Watanabe reported. That's still far beyond efficiency ratings as low as 15 percent for many of today's supercomputers, he said.
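The efficiency figures are sustained performance divided by peak. As a rough check, here is a minimal sketch that assumes the round 40-teraflop peak quoted above is the denominator; the function name is illustrative only.

```python
PEAK_TFLOPS = 40.0  # peak performance quoted for the Earth Simulator


def efficiency_percent(sustained_tflops, peak_tflops=PEAK_TFLOPS):
    """Return sustained performance as a percentage of peak."""
    return 100.0 * sustained_tflops / peak_tflops


linpack = efficiency_percent(35.0)   # Linpack result
app_low = efficiency_percent(14.0)   # low end of the application range
app_high = efficiency_percent(26.0)  # high end of the application range

print(f"Linpack: {linpack:.1f}%")
print(f"Applications: {app_low:.1f}% to {app_high:.1f}%")
```

Against a round 40-teraflop peak the Linpack figure comes out at 87.5 percent, and the application range at 35 to 65 percent, close to the 38 to 66 percent Watanabe cited. The difference may come from rounding in the quoted teraflop figures.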
The system has error rates that result in downtime for one of its 640 nodes once a week.
By contrast, Cray Inc. reported on its plans to deliver to Sandia National Laboratories by the end of 2004 the Red Storm system, based on a whopping 10,368 off-the-shelf AMD Opteron processors. The system could also hit a peak theoretical performance of 40 teraflops, but will consume just two megawatts of power, fit into a 3,000-square-foot building and take about two years to build.
"We really needed to get going fast so we used existing technology as much as possible," said Robert Alverson, hardware architect for the system.
Cray has crammed most of the system functionality into an ASIC made in a 130-nm IBM process. The Seastar chip includes a seven-port router linking to the system's 3-D mesh, an 800-MHz DDR HyperTransport interconnect linking to the Opteron processors and a PowerPC core for handling message-passing chores.
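One way to picture a seven-port router in a 3-D mesh: a node has up to six mesh neighbors, one in each direction along x, y and z, which would leave one port for the local processor. That port split is an assumption here, not something Alverson spelled out, and the mesh dimensions and the non-wrapping edges in this sketch are made up for illustration.

```python
def mesh_neighbors(node, dims):
    """Return the coordinates adjacent to `node` in a non-wrapping 3-D mesh."""
    neighbors = []
    for axis in range(3):
        for step in (-1, 1):
            coord = list(node)
            coord[axis] += step
            # Nodes on a face, edge or corner have fewer than six neighbors.
            if 0 <= coord[axis] < dims[axis]:
                neighbors.append(tuple(coord))
    return neighbors


dims = (4, 4, 4)  # hypothetical mesh size, not Red Storm's actual layout

print(len(mesh_neighbors((1, 1, 1), dims)))  # interior node: 6
print(len(mesh_neighbors((0, 0, 0), dims)))  # corner node: 3
```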
The Seastar lets Cray cram four compute nodes, consisting of four Opterons, four ASICs and a management controller, on a single card. "This is a step forward in integration, and I think it is the way we will build systems in the future," Alverson said.
To save time and money, Red Storm is not using symmetric multiprocessing features built into the Opteron. However, Alverson said he expects Cray to create systems based on four- or eight-way SMP nodes for future customers.
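Setting the power figures quoted for the two 40-teraflop systems side by side makes the gap concrete. The sketch below uses only the peak and power numbers reported in the talks; the dictionary layout and function name are illustrative.

```python
systems = {
    "Earth Simulator": {"peak_tflops": 40.0, "power_mw": 8.0},
    "Red Storm": {"peak_tflops": 40.0, "power_mw": 2.0},
}


def kilowatts_per_teraflop(peak_tflops, power_mw):
    """Return power draw in kilowatts per peak teraflop."""
    return power_mw * 1000.0 / peak_tflops


for name, spec in systems.items():
    print(f"{name}: {kilowatts_per_teraflop(**spec):.0f} kW per peak teraflop")
```

On these figures the Earth Simulator draws 200 kilowatts per peak teraflop and Red Storm 50, a fourfold difference. Both numbers are based on peak rather than sustained performance.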
Quadrics Ltd. (Bristol, England) detailed its next-generation message-passing interconnect, the QsNet II. It will be used in a new Itanium 2-based system being deployed at Pacific Northwest National Laboratory.
QsNet II is based on an Elan 4 network interface ASIC that supports up to 8,000 command queues that can be used simultaneously by multiple processors. It provides an aggregate bidirectional link bandwidth of 2.6 Gbytes/second.
Quadrics pegs overall latency of the interconnect at about 1.7 microseconds, based on simulations of a 4,000-node machine with 50 meters of cabling. Latency on short message-passing traffic generally ranges from 3 to 5 microseconds depending on message length, typically several microseconds less than latency for InfiniBand, according to Quadrics.
System I/O in the form of the PCI-X bus currently represents the biggest constraint for Quadrics, which foresees next-generation links moving to PCI-X 2.0 and eventually PCI Express. "Sixty percent of the interconnect latency is between the CPU and the PCI-X slot," said Jon Beecroft, head of ASIC development at Quadrics.
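To put Beecroft's figure in absolute terms, the sketch below splits a total latency into a host-side share and the remainder. Applying the 60 percent share to the 1.7-microsecond simulated figure is an assumption made here for illustration; the talk did not say which latency number the percentage refers to.

```python
def split_latency(total_us, host_fraction):
    """Split a total latency into (host-side, remainder), in microseconds."""
    host_us = total_us * host_fraction
    return host_us, total_us - host_us


# Assumed inputs: 1.7 us overall latency, 60 percent between CPU and PCI-X slot.
host_us, rest_us = split_latency(1.7, 0.60)

print(f"CPU to PCI-X slot: {host_us:.2f} us")
print(f"Rest of the path:  {rest_us:.2f} us")
```

Under that assumption, roughly one microsecond is spent between the CPU and the PCI-X slot, leaving about 0.7 microseconds for the rest of the path.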
Looking further into the future, Andre DeHon, an assistant professor of computer science at Caltech, said silicon nanowires could form the basis of future molecular-scale computers. DeHon described how researchers are attempting to build memories and programmable logic arrays with a 10-nm pitch using nanowires just six to eight atoms in diameter.
"We are trying to build interesting size memories out of these over the next few years. You could build interesting devices in this technology in three to five years if you really push it," DeHon said.
DeHon described a chemical vapor deposition process using gold as a catalyst for growing nanowires up to 20 microns in length. Those wires can be doped to create addressable regions and then etched into arrays to create programmable memories or logic devices.
"I am not going to claim this is faster than CMOS, but the ballpark I can say today is we can make these things now running in the 10-GHz range," DeHon said.