Cray, IBM picked for U.S. petaflop computer effort

 
SAN JOSE, Calif. — Cray Inc. and IBM Corp. will split nearly half a billion dollars as part of a government contract announced Tuesday (Nov. 21) to fund development of petaflop-class supercomputers before the end of 2010. A third competitor, Sun Microsystems, was dropped from the program that aims to foster work on computers that are more powerful and easier to program than any in current operation.

The two companies were selected for phase III of the High Productivity Computing Systems (HPCS) program managed by the Defense Advanced Research Projects Agency (Darpa). Cray will receive $250 million and IBM $244 million to develop prototype systems by 2010.

The prototypes will have to show a path to computing one quadrillion floating point operations per second and improve application development time ten-fold compared with 2002, when the HPCS program began. The two winners will also have to show Darpa officials a business plan for how they will develop systems based on the prototypes for use by both government and commercial users.

"High productivity computing contributes substantially to the design and development of advanced vehicles and weapons, planning and execution of operational military scenarios, the intelligence problems of crypto-analysis and image processing, the maintenance of our nuclear stockpile, and is a key enabler for science and discovery in security-related fields," said William Harrod, manager of the Darpa program, in a statement.

It's unclear what impact the loss may have on Sun, which has struggled since 2000 to be profitable and competed aggressively for the contract. Specifically, Sun proposed a novel capacitive-coupling chip-to-chip interconnect called Proximity as well as a high-end parallel programming language called Fortress, which it tried to make into an ad hoc standard.

IBM has been less forthcoming about the details of its proposal. However, the company did disclose that the proposal is based on its Power7 microprocessor, AIX operating system and General Parallel File System.

Cray has been the most candid of the three about the Cascade system it proposed to Darpa planners. Cascade is essentially a cluster-in-a-box that will deliver a mix of scalar, FPGA and hybrid vector/massively multi-threaded processor boards in a single system.

The system is a hybrid design based on a future version of Cray's XT3 "Red Storm" system, which uses AMD Opteron CPUs. It also incorporates technologies from three other systems Cray sells today: the X1E vector processor, the MTA multithreaded system and the XD1 system that uses FPGA accelerators. Cascade will use Opteron/Linux boards to handle overall system services and act as application processors. A new board will be based on a hybrid ASIC that can shift on the fly between vector processing and massively multithreaded modes. In addition, Cray anticipates designing an FPGA accelerator board for Cascade based on its XD1 system.

The toughest innovation for Cascade is developing compiler software that can handle a mix of applications calling for scalar, vector or massively multithreaded processing with minimal guidance from the programmer.

According to a Darpa statement, Cascade software includes novel debugging and performance tuning tools, the Chapel high productivity language, and an operating system designed to scale reliably and efficiently to hundreds of thousands of processors.

Cray badly needed to win the HPCS program. The company had a net loss of more than $200 million in 2004, and a $55 million loss in the first nine months of 2005 due to lower-than-expected revenues and margins. Although Cray made a modest profit of $63 million in 2003, more than two-thirds of it came from a tax benefit, and 2002 saw profits of just $5 million.

At the end of June 2005, Cray laid off ten percent of its employees, about 90 people. Many employees had been under a salary-reduction program until late last year.

"We are very serious about it. So the HPCS Phase 3 funding is a big deal for our company. It allows us to think out-of-the-box about systems a few years ahead of what we are used to," said Steve Scott, chief technology officer for Cray, in an interview in September.