San Mateo, Calif. - Cray Inc., IBM Corp. and Sun Microsystems Inc. are heading down separate paths in a race to define a petaflops-class computer. The three companies last week nabbed about $50 million each from a Defense Advanced Research Projects Agency program that aims to deliver working systems with breakthrough performance by 2011.
The money will fund a three-year R&D effort to turn the trio's very different and aggressive paper concepts into realistic implementation plans by 2006. Darpa will then decide on two to fund for building prototypes.
The resulting systems aim to deliver 10 to 40 times the average performance of today's high-performance computing machines but must also be easier to program. Darpa nixed concept proposals from Hewlett-Packard Co. and Silicon Graphics Inc., both of which competed in the first phase of the so-called High Productivity Computing Systems project.
The three remaining contenders are taking diverging routes to their common goal. While each system embraces a broad set of new technologies, IBM's plan centers on cache memory, Sun's on interconnect and Cray's on a novel dual-processor architecture.
IBM will rely heavily on its embedded DRAM technology to build Power processor-based systems with "hundreds of megabytes or a gigabyte of [on-chip] cache," said Mootaz Elnozahy, program manager for the project and a manager of the systems software department at IBM Research.
The CPU will have a special mode allowing applications to treat that on-chip cache effectively as main system memory. In this scenario, traditional main memory acts almost like storage. "The processor will be a system or subsystem on its own with memory to compute independently. Most work will be handled on the processor," said Elnozahy, who manages the current team of 49 engineers on the project.
Memory access speed will remain critical, but processor speed will be varied widely from one application to the next as a way to rein in power consumption, he said.
While the giant cache plays a central role in the design, IBM's concept embraces many new, and mainly still secret, ideas. "We are targeting the entire computing stack from circuit design to chip manufacturing and packaging, computer architecture for processors, and systems and compiler technology," Elnozahy said.
Sun's proposal, meanwhile, revolves around a novel communications fabric to provide high-bandwidth links between both processor cores on a die and CPUs across a system board, lessening the need for large caches and elaborate memory hierarchies. Sun may disclose details of the fabric this fall once its patent applications are in order. "It's one of the biggest secrets in Sun and something that could be a huge technology advantage," said John Gustafson, a Sun senior scientist who is managing the petaflops program.
A related and equally novel aspect of the Sun design is "a scaling technology that solves the same problem as cache coherency does," enabling a very flat, low-latency memory architecture, Gustafson said.
Sun has been more public about its plans to use the asynchronous technology that has been under development for several years in its labs. "We are trying to push in the direction of as asynchronous a machine as possible; we think that's where systems design will go," Gustafson said. "That lets us run everything as fast as possible while saving on power consumption."
The resulting architecture presents the user with a single system image even though it employs multiple processors, each with about the same level of parallelism as Sun's upcoming Niagara processor (see March 3, page 1). Niagara will sport eight cores, each executing up to four threads, for as many as 32 simultaneous threads per chip. Nevertheless, Gustafson said the underlying processor architecture is relatively unimportant in the overall design.
Not so for Cray, which will field a system with not one but two custom processors. A so-called heavyweight processor will tackle jobs where temporal data relationships are prominent, implementing streaming techniques pioneered by William Dally of Stanford. Separate "lightweight" processors will handle jobs where spatial data relationships dominate. Both types will leverage vector and multithreaded techniques found in Cray's current processors.
The Cray system will consist of tens of thousands of nodes, said Burton Smith, chief scientist at Cray. Each node will contain a heavyweight processor that integrates multiple memory controllers and a router linking the chip to multiple lightweight processors, each of which is closely tied, in turn, to external DRAM chips. "We are exploiting both these [processor] ideas to save on system bandwidth and really cut the need for remote data accesses," Smith said.
Cray's node-routing scheme will use a novel topology, and a "fairly significant" amount of the routing will be embedded in the processor hardware, Smith said. Cray will devise a programming environment that automatically generates application code, hiding from users the complexity of the dual-processor architecture and its nonuniform memory access structure.
The tough task ahead for all three companies is finding realistic means of implementing the risky concepts they have articulated. "Now we have to figure out what to build, how much it will cost and what it will look like," Smith said. "It's an all-new operating system and architecture," said Sun's Gustafson. "These are things you really need a Darpa grant to do."
"Regardless of the final outcome, this is a great opportunity for us," said Elnozahy of IBM.
Gurus of computer design are on hand to help. Besides Dally, Cray has recruited for its team Thomas Sterling of Caltech, a longtime researcher in parallel systems. Sun has recruited Rambus co-founder Mark Horowitz of Stanford and RISC pioneer Dave Patterson of Berkeley.