AUSTIN, Texas - A team of processor designers from IBM Corp.'s server group will detail the Power4 CPU at the Hot Chips conference later this month. The processor will be used in both the AS400 and RS/6000 server families, which are slated to hit the market in 2001.
The Power4 is the first IBM processor design to include two processors and an L2 cache on the same die, taking advantage of the transistor densities possible with 0.18-micron design rules.
Carl Anderson, a distinguished engineer at IBM, said one Hot Chips paper from Brad McCredie and Roger Bailey, the processor design team leaders, will describe the Power4 test chip. A second paper will detail a synchronous interface approach, spearheaded by Frank Ferraiolo.
The CPU design is the first by the design team here that will be able to run either the AS400 OS or the AIX version of Unix used in IBM's RS/6000 workstations and servers, including an RS/6000 SP system sold into the supercomputer market. Earlier Power3 designs from IBM's team in Rochester, Minn., were able to run either OS.
High bandwidth
"With a 0.18-micron process and copper interconnects, there are a much larger number of transistors available to the designers. One way to put them to use is to put two processor cores on the same die, and take advantage of the very high bandwidth possible between them," Anderson said.
Though IBM's Rochester design team has created a multithreaded AS400 processor before, the Power4 design under way here is a standard out-of-order processor. Like the Power3, the Power4 has two floating-point units per processor core, or four FPUs per die. It has multiple load and store units, and many of the other architectural features of the Power3, to support a high-bandwidth interface to main memory and back. It operates with a 1.5-V power supply. IBM will discuss the microarchitecture in detail at the Microprocessor Forum, which begins Oct. 4 in San Jose, Calif.
The Power4 can be used to create 32-way systems. (With two processor cores on each die, a 32-way system would use 16 Power4s.) Though Anderson declined to elaborate on the bandwidth or bus specifications, he said the bus would run at greater than 500 MHz.
"The goal is to run the bus at half the processor speed, which is targeted at greater than a gigahertz for the processor. The data and clock are all sent on the data bus, and that approach is best for very high bandwidth systems. We devoted a fair amount of circuits to extract the clock and deskew the data," Anderson said. The bus on the current Power3 CPU runs at 250 MHz.
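The figures Anderson cites can be checked with simple arithmetic. The short Python sketch below is only an illustration: the function names are hypothetical, and the 1,000-MHz processor clock is an assumed round number standing in for IBM's "greater than a gigahertz" target.

```python
def dies_needed(ways: int, cores_per_die: int = 2) -> int:
    """Dies required for an N-way system, rounding up (two cores per Power4 die)."""
    return -(-ways // cores_per_die)


def bus_clock_mhz(cpu_clock_mhz: float, ratio: float = 0.5) -> float:
    """Bus clock when the bus runs at a fixed fraction of the processor clock."""
    return cpu_clock_mhz * ratio


# A 32-way system built from dual-core dies.
print(dies_needed(32))

# Assumed 1,000-MHz processor clock with the bus at half speed.
print(bus_clock_mhz(1000.0))
```

With the processor at 1 GHz, a half-speed bus lands at 500 MHz, which is double the 250-MHz bus on the current Power3.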
The I/O design relies on synchronous transfer of the clock and data, an approach that will take IBM from I/Os running at greater than 500 MHz in the current design to beyond the gigahertz level in the next generation.
Minimizing latencies is particularly important in multiprocessor systems, and in a synchronous approach the latency of the interface can vary over a wide range (in some cases, multiple bus cycles) while still keeping synchronous operation intact. A synchronous approach has the advantage of reducing the timing variations without requiring a more costly process or strict design constraints.
For the Power4, the I/O design supports point-to-point, unidirectional and bidirectional bus types. It is an all-digital design, with low-power, source-terminated drivers and active clamps on the receiving circuits, where a FIFO is placed.
To keep more data valid, the I/O design supports a wide range of arrival times.
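How a receive-side FIFO can absorb differing arrival times is easiest to see in a toy model. The Python sketch below is not IBM's circuit; it is a cycle-level illustration under stated assumptions: each data lane has a fixed arrival delay measured in whole bus cycles, every lane has its own FIFO, and the receiver reads all FIFOs at one fixed target latency. The function and parameter names are hypothetical.

```python
from collections import deque


def realign(lanes, arrival_delay, target_latency, depth):
    """Return the words seen by a receiver that reads every lane's FIFO
    exactly target_latency cycles after the bits were sent.

    lanes          -- one bit sequence per lane, bit i sent on cycle i
    arrival_delay  -- per-lane flight time in whole cycles (assumed fixed)
    target_latency -- cycle offset at which the receiver reads
    depth          -- entries available in each lane's FIFO
    """
    n = len(lanes[0])
    fifos = [deque() for _ in lanes]
    words = []
    for cycle in range(n + target_latency):
        # Bits that arrive this cycle are written into their lane's FIFO.
        for lane, delay, fifo in zip(lanes, arrival_delay, fifos):
            sent = cycle - delay
            if 0 <= sent < n:
                fifo.append(lane[sent])
                if len(fifo) > depth:
                    raise OverflowError("FIFO too shallow for this skew")
        # From the target cycle onward, read one aligned word per cycle.
        if cycle >= target_latency:
            if any(not fifo for fifo in fifos):
                raise RuntimeError("a lane arrived after the target cycle")
            words.append(tuple(fifo.popleft() for fifo in fifos))
    return words


# Three lanes with 0, 2 and 1 cycles of flight time, read at a 2-cycle target.
lanes = [[1, 0, 1, 1], [0, 0, 1, 0], [1, 1, 0, 0]]
aligned = realign(lanes, arrival_delay=[0, 2, 1], target_latency=2, depth=3)
print(aligned == list(zip(*lanes)))
```

In this model the fastest lane holds its bits longest, so the FIFO depth bounds how much skew between lanes can be tolerated, while the target latency must be at least as long as the slowest lane's flight time.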