Each processing element comprises a Power-architecture 64-bit RISC CPU, a highly sophisticated direct-memory-access (DMA) controller and up to eight identical streaming processors. The Power CPU, DMA engine and streaming processors all reside on a very fast local bus, and each processing element is connected to its neighbors in the cell by high-speed "highways." Designed by Rambus Inc. with a team from Stanford University, these highways, which are parallel bundles of serial I/O links, operate at 6.4 GHz per link. One of the ISSCC papers describes the link characteristics, as well as the difficulties of developing high-speed analog transceiver circuits in SOI technology.
The streaming processors, described in another paper, are self-contained SIMD units that operate autonomously once they are launched.
They include a 128-kbyte pipelined local SRAM that sits between the stream processor and the local bus, a bank of one hundred twenty-eight 128-bit registers and a bank of four floating-point and four integer execution units, which appear to operate in single-instruction, multiple-data mode from one instruction stream. Software controls data and instruction flow through the processor.
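As a rough illustration of what single-instruction, multiple-data operation means here, the sketch below applies one multiply to four lanes at once. It assumes, hypothetically, that a 128-bit register is treated as four 32-bit values, one per floating-point unit; the papers as described do not confirm that layout, and the function name `simd_multiply` is invented for this example.

```python
from typing import List

# Assumption: one 128-bit register holds four 32-bit lanes,
# one lane per floating-point execution unit.
LANES = 4


def simd_multiply(a: List[float], b: List[float]) -> List[float]:
    """Apply a single multiply instruction to every lane of two registers."""
    if len(a) != LANES or len(b) != LANES:
        raise ValueError(f"each register must hold exactly {LANES} lanes")
    # One instruction, four data elements: every lane gets the same operation.
    return [x * y for x, y in zip(a, b)]


register_a = [1.0, 2.0, 3.0, 4.0]
register_b = [0.5, 0.5, 0.5, 0.5]
print(simd_multiply(register_a, register_b))  # [0.5, 1.0, 1.5, 2.0]
```

In real hardware the four lanes would be computed in the same cycle; the Python loop only models the result, not the timing.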
Another ISSCC paper describes a dynamic Booth double-precision multiplier designed in 90-nm SOI technology.
The processing element's DMA controller appears to be designed so that any chip in a system can access any bank of DRAM in the cell through a band-switching arrangement. This would make all the processing resources appear to be a single pool under control of the system software.
Giving scale to the performance targets for the project, one of the ISSCC papers puts the performance of the streaming-processor SRAM at 4.8 GHz. This suggests the data transfer rate for 128-bit words across the local bus within the processing element. When the Cell alliance was announced in 2001, Sony Computer Entertainment CEO Ken Kutaragi estimated the performance of each Cell processor, apparently a collection of four processing elements in the first implementation, at 1 teraflops.
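To put the SRAM figure in more familiar units, the short calculation below converts a 4.8-GHz clock and a 128-bit word into a peak byte rate. It assumes one full word moves on every clock cycle, which the paper as described does not state, so the result is an upper bound under that assumption and not a published specification. The function name is made up for this sketch.

```python
def peak_transfer_gbytes_per_s(clock_ghz: float, word_bits: int) -> float:
    """Peak rate in Gbytes/s, assuming one word is transferred per clock cycle."""
    bytes_per_word = word_bits / 8
    return clock_ghz * bytes_per_word


# Figures from the article: 4.8-GHz SRAM, 128-bit words.
rate = peak_transfer_gbytes_per_s(clock_ghz=4.8, word_bits=128)
print(f"{rate:.1f} Gbytes/s")  # 76.8 Gbytes/s
```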
But UNC's Zimmons has his doubts. "I believe that while theoretically having a large number of transistors enables teraflops-class performance, the PS3 [PlayStation 3] will not be able to deliver this kind of power to the consumer," he wrote in response to an e-mail query from EE Times. "The PS3 memory is rumored to be able to transfer around 100 Gbytes/second, which would mean it could process new data at roughly 25 Gflops (at 32 bits), far from the 1-Tflops number."
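Zimmons' estimate can be reproduced with simple arithmetic. The sketch below assumes, as his figure implies, that each 32-bit operand fetched from memory supports exactly one floating-point operation; the `flops_per_operand` parameter and the function name are illustrative additions, not part of his e-mail.

```python
def bandwidth_bound_gflops(
    bandwidth_gbytes_per_s: float,
    operand_bits: int = 32,
    flops_per_operand: float = 1.0,
) -> float:
    """Gflops sustainable on fresh data when memory bandwidth is the limit."""
    operand_bytes = operand_bits / 8
    # Billions of new operands that can arrive from memory each second.
    operands_per_s = bandwidth_gbytes_per_s / operand_bytes
    return operands_per_s * flops_per_operand


estimate = bandwidth_bound_gflops(100.0)  # rumored 100 Gbytes/s
print(estimate)  # 25.0

# Gap between the 1-Tflops (1,000-Gflops) claim and the bandwidth-bound figure.
print(1000.0 / estimate)  # 40.0
```

Under this assumption the claimed peak is 40 times what the rumored memory bandwidth could feed with new data. Code that reuses operands already held in registers or local SRAM would raise `flops_per_operand` and narrow that gap.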
Sony's 300-mm fab at Nagasaki, Japan, will run the 65-nm process and IBM Corp.'s fab in East Fishkill, N.Y., the SOI line.