Cray tips its hand in $200M petaflops bid

 

San Jose, Calif. -- Cray Inc. has tipped details of its road map through 2010 as it prepares a strategic bid for a $200+ million government contract to build a petaflops computer. The troubled supercomputer icon's plans shine a light on the future of high-end systems and suggest that Advanced Micro Devices Inc. may keep ahead of Intel Corp. in server CPU design for the rest of the decade.

Cray described Cascade, a cluster-in-a-box planned for 2010 that will deliver a mix of scalar, FPGA and hybrid vector/massively multithreaded processors. The Cascade plan will vie with proposals from IBM Corp. and Sun Microsystems Inc. to win funding from the Defense Advanced Research Projects Agency (Darpa).

The U.S. companies are locked in a global race with Japan's NEC and three Chinese vendors to deliver the first computer to break the petaflops barrier. The Chinese competitors are Lenovo, which last year purchased the PC division of IBM; Galactic Computing (Shenzhen), a startup led by U.S. supercomputer designer Steve Chen; and Dawning Information Industry Co. Ltd. (Beijing).

Proposals are due within weeks from Cray, IBM and Sun for Darpa's High Productivity Computing Systems project. HPCS aims to accelerate development of multipetaflops systems that would be radically easier to program than today's supercomputers.

"I don't know if this is a make-or-break deal for Cray, but it certainly will be critical to their long-term viability," said supercomputer expert Jack Dongarra, a distinguished professor in the Computer Science Department at the University of Tennessee.

High-performance technical computing "is our only focus," said Steve Scott, Cray's chief technology officer. "We are very serious about it. So the HPCS Phase 3 funding is a big deal for our company. It allows us to 'think out of the box' about systems a few years ahead of what we are used to."

AMD's lead
Last summer, Cray undertook a months-long evaluation of whether it would continue with AMD or switch to Intel as its strategic processor supplier through 2010. After reviewing both companies' road maps, Cray judged AMD's technically superior and thus decided to stay put.

"We were serious about switching to Intel if that made more sense, [but] we really like what AMD is doing," said Scott. "We are very happy with the AMD processor cores and systems interfaces. They have been leading Intel for a few years, and we see that likely to continue."

That word comes just as Intel has disclosed its Core microarchitecture in a bid to close the gap with AMD on performance and power (see story, page 1).

From Cray's perspective, one of AMD's crown jewels is HyperTransport, the gigahertz coherent interconnect on AMD's CPUs. Last fall, just a week after Cray made its decision to stick with AMD, Intel announced that introduction of its CSI processor interconnect would be delayed a year (search www.eetimes.com for article ID: 60404677). Though Intel has been sketchy on the details, CSI is believed to refer to Coherent Scalable Interconnect, a HyperTransport-like point-to-point interconnect for directly linking processors that contain embedded memory controllers. Intel had suggested it would use the technology on both its Itanium and Xeon processors in 2007.

Intel still plans to use CSI on its Itanium CPUs, probably next year, said a senior computer engineer who asked not to be identified.

Meanwhile, the HyperTransport Consortium is weeks away from launching HT 3.0, which is expected to at least double the bandwidth of that interconnect while lowering latency and leaving the underlying protocol largely unaltered. "The spec is in progress, and we are absolutely on schedule" for a mid-2006 release, said consortium chairman David Rich.

Other factors beyond HyperTransport figured in AMD's favor when Cray made its decision, Scott said. They included AMD's use of an integrated memory controller, which reduces memory latency to about 53 nanoseconds, and its elimination of the north bridge chip, saving board space, power and cooling.

But Intel CTO Justin Rattner counters that embedding a memory controller creates a more power-hungry CPU. "We come down on the side of keeping the memory controller in the chip set," he said.

In its new architecture, Intel has ratcheted up its front-side processor bus to a surprising 1.33 GHz. "Intel did a better job than I thought with Core," said Nathan Brookwood, principal of market watcher Insight64 (Saratoga, Calif.). "AMD's performance advantage will narrow as the year progresses. They may even lose [it]."

A view of Cascade
A future version of Cray's XT3 "Red Storm" system, based on AMD Opteron CPUs, is the envisioned foundation on which Cray would consolidate technologies from three other systems it sells today: the X1E vector processor, the MTA multithreaded system and the XD1 system, which uses FPGA accelerators.

"Our idea is to take these diverse architectures and integrate them. We plan a couple of iterations of systems that culminate in Cascade," said Scott.

"It looks like Cray is more focused than in the past," said Dongarra, who has been briefed on Cray's plans.

Cascade will take blade computing to a new level by integrating three types of processor boards. Opteron/Linux boards will handle overall systems services and act as applications processors.

A new board will be based on a hybrid ASIC that can shift on the fly between modes for vector processing and massive multithreading. The ASIC will functionally combine the 128-thread processor acquired from Tera Computer in March 2000 with a new version of Cray's existing vector processor.

In addition, Cray expects to design an FPGA accelerator board for Cascade based on its XD1 system. Thus Cascade will be "a high-performance 'data center in a box' that you can optimize for any kind of application--scalar, vector, multithreaded or FPGA," Scott said.

Cascade will house all the boards in a system derived from Cray's Opteron-based XT3 but will feature twice the processor density. The system provides a common interconnect based on HyperTransport 3.0, with globally addressable memory accessible by any blade. Cascade also has a new approach for cooling the air that passes between chassis inside a cabinet.

Cray is essentially borrowing the blade concept from today's commodity servers and tying it to the concept of a hybrid processor, an approach also under consideration by the NEC-led Japanese team. "This style of computing makes a lot of sense for the high end," said Dongarra.

The toughest innovation for Cascade is in developing compiler software that can handle a mix of applications calling for scalar, vector or massively multithreaded processing, all with minimal guidance from the programmer. "The idea is the compiler will do everything automatically, but you can put hints in your code to help it," said Scott. "For instance, a program might indicate an upcoming block of code has no data dependencies and so is vectorizable, or it may make a call to a library of routines for FPGA accelerators."
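The kind of hint Scott describes can be illustrated with features that already exist in standard C and OpenMP. The sketch below is not Cray's compiler syntax, which has not been described; the function name scaled_add is made up for illustration. The C99 restrict qualifier tells the compiler the two arrays do not overlap, and the OpenMP simd pragma marks the loop as a candidate for vectorization. A compiler built without OpenMP support should simply ignore the pragma.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: y[i] = a * x[i] + y[i].
   `restrict` is the programmer's promise that x and y do not overlap,
   so no loop iteration depends on the result of another. */
void scaled_add(size_t n, double a,
                const double *restrict x, double *restrict y)
{
    assert(n == 0 || (x != NULL && y != NULL));

    /* Hint that the loop is safe to vectorize (OpenMP 4.0 or later). */
    #pragma omp simd
    for (size_t i = 0; i < n; i++) {
        y[i] = a * x[i] + y[i];
    }
}
```

Without such hints, a compiler must prove on its own that the arrays never alias before it can safely vectorize, which is often not possible from the code alone.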

That capability is a key focus of Darpa's HPCS program, because big supercomputing centers typically spend more money on writing custom programs tailored for the intricacies of a specific supercomputer than they do on buying the supercomputers themselves. "It's very hard to write these large, scalable high-performance apps, and they are getting more complex as we go," said Scott.

Cascade will also support Universal Parallel C and Co-Array Fortran, new versions of C and Fortran tailored for massively parallel computing. The languages help express the locality of hardware resources in massively parallel systems to ease the job of creating global data structures across widely distributed memory resources.
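What "expressing locality" means can be sketched in plain C. In Universal Parallel C, a shared array declared with a block size is dealt out to threads in round-robin blocks, so the language knows which thread owns each element. The code below mimics that layout rule by hand; the names placement and locate are hypothetical, and it illustrates the general idea rather than how Cascade or any particular compiler implements it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record: where one element of a global array lives. */
typedef struct {
    size_t owner;   /* node (thread) that holds the element */
    size_t offset;  /* index within that node's local storage */
} placement;

/* Block-cyclic layout: blocks of `block` elements are dealt out
   round-robin across `nodes` nodes. */
placement locate(size_t i, size_t block, size_t nodes)
{
    assert(block > 0 && nodes > 0);

    placement p;
    p.owner  = (i / block) % nodes;
    p.offset = (i / (block * nodes)) * block + (i % block);
    return p;
}
```

With two-element blocks spread over three nodes, for instance, element 7 falls in the fourth block, which wraps around to node 0 and sits at local index 3. In a language such as UPC the compiler does this bookkeeping, so the programmer indexes one global array instead of computing owners and offsets by hand.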

"Today people program supercomputers in a style that's been around since the 1970s, using what is basically an assembly language. It's hard enough to write parallel programs, let alone use tools from the '70s to do it, so it's good to see new interest in these languages," said Dongarra.

Both Sun and IBM declined to be interviewed for this article. Previously, Sun had disclosed its HPCS proposal would be based on a novel interconnect, called Proximity, that uses capacitive coupling to create high-performance blocks of silicon processors in a checkerboard arrangement.

Cray badly needs to win the $200 million funding for the next phase of the HPCS program. The company had a net loss of more than $200 million in 2004, and lower-than-expected revenue and margins contributed to a $55 million loss in the first nine months of its 2005 fiscal year. Cray laid off 10 percent of its employees--about 90 people--in June, and for a time had a salary-reduction program in place that affected many of its employees until the program ended late last year.

Cray has been late delivering the Opteron-based Red Storm system that will be the foundation for Cascade. In addition, the MTA and XD1 products obtained through the respective acquisitions of Tera and OctigaBay Systems have not been as successful as Cray had hoped. "The MTA was expensive per CPU, and not many systems shipped, due to the expense and the system's novel OS," said Scott.

As for the XD1, Cray has identified only a few applications where programmers are willing to do the heavy lifting required to write software for its FPGA accelerators. Cray plans no follow-on to the XD1, though it may design FPGA boards for a future Opteron-based system or for Cascade.