PALO ALTO, Calif. – IBM has become the first company to ship a commercial microprocessor using transactional memory, a new feature for multicore chips that researchers have studied for years.
The BlueGene/Q processor used in the Sequoia supercomputer IBM is building for Lawrence Livermore National Laboratory will employ the new feature, IBM disclosed in a paper at the Hot Chips event here. Sequoia is expected to deliver 20 petaflops when it is complete in 2012.
When finished, the supercomputer could become one of the most powerful systems in the world. An early version of the system is already ranked as one of the most energy-efficient supercomputers.
Transactional memory groups a set of related memory operations into a single atomic transaction that either completes as a whole or is rolled back. It replaces the current practice of locking data until a complex job is done, an approach that can stall other threads that need the same data.
The former Sun Microsystems, now part of Oracle, implemented transactional memory in its Rock microprocessor aimed at large database computers. However, the Rock chips never shipped because the project was cancelled about the time Sun was acquired by Oracle.
A former Rock engineer said Sun tested the technique and found it had great advantages for some applications, but offered minimal help for others. It required almost no special hardware, he said.
Researchers at Intel and Microsoft have studied transactional memory for several years. A former Silicon Graphics engineer said he considered the technique back when the company was designing its own processors.
IBM implemented transactional memory only within the confines of a single chip, using a tagging scheme on the chip's level-two cache memory. The tags are used to detect any load/store conflicts in data to be used in a so-called atomic transaction scheduled by the computer.
If no conflicts are found, the job can be processed. If conflicts do appear, the chip asks system software to resolve them.
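The detect-then-resolve flow described above can be illustrated with a toy software model. This is only a sketch under stated assumptions, not IBM's hardware design: it is single-threaded, it tags each address with a version number rather than tagging cache lines, it checks only loads for conflicts, and every name in it (TaggedMemory, Transaction, run_atomically) is hypothetical.

```python
# Toy model of optimistic transactions with per-address version tags.
# Hypothetical names throughout; this does not describe the BlueGene/Q L2 cache.

class ConflictError(Exception):
    """Raised when data a transaction loaded was overwritten before commit."""

class TaggedMemory:
    def __init__(self):
        self.data = {}      # address -> committed value
        self.version = {}   # address -> tag, bumped on every committed store

    def begin(self):
        return Transaction(self)

class Transaction:
    def __init__(self, mem):
        self.mem = mem
        self.read_tags = {}  # address -> tag seen at first load
        self.writes = {}     # address -> buffered (speculative) store

    def load(self, addr):
        if addr in self.writes:          # see our own speculative store
            return self.writes[addr]
        self.read_tags.setdefault(addr, self.mem.version.get(addr, 0))
        return self.mem.data.get(addr, 0)

    def store(self, addr, value):
        self.writes[addr] = value        # held back until commit

    def commit(self):
        # Conflict check: did anything we loaded change since we loaded it?
        for addr, tag in self.read_tags.items():
            if self.mem.version.get(addr, 0) != tag:
                raise ConflictError(addr)
        # No conflicts: publish the buffered stores and bump their tags.
        for addr, value in self.writes.items():
            self.mem.data[addr] = value
            self.mem.version[addr] = self.mem.version.get(addr, 0) + 1

def run_atomically(mem, body, max_retries=10):
    """Stand-in for the software fallback: retry the body after a conflict."""
    for _ in range(max_retries):
        tx = mem.begin()
        body(tx)
        try:
            tx.commit()
            return
        except ConflictError:
            continue
    raise RuntimeError("too many conflicts")
```

In this model, two transactions that both load and then store the same address cannot both commit: the first commit bumps the tag, the second sees a stale tag and raises ConflictError, and a caller such as run_atomically simply re-executes it. No lock is held while the transaction body runs, which is the contrast with the locking approach the article mentions.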
Thanks to its use of fast on-chip memory, the IBM approach lowers latency when compared to traditional locking schemes even under conditions where there is high data contention, said Ruud Haring, a senior IBM engineer who worked on the chip and presented a paper describing it.
IBM used its embedded DRAM process to build the chip's 32 MByte L2 cache. The memory banks use "a lot of neat trickery" to create a multi-versioned cache, Haring said.
Engineers are optimistic their work will show real benefits, but they are still tuning the supercomputer's compilers, so they lack performance data. "It feels good," Haring said.
Observers said the IBM work was sound but could not be widely used by other designers. A more useful approach would be to implement transactional memory among a broad group of processors linked in a complex cache-coherent scheme, they said.
Programmers using the IBM supercomputer are some of the most sophisticated software coders in the world, and they use a very limited set of applications. As such, they are good candidates to test out transactional memory, engineers at the conference said.
The IBM chip uses 18 cores, one just to process operating system tasks and another held in reserve as a spare. The cores are a custom circuit design based on the PowerEN core used in an IBM communications chip.
The rest of the BlueGene/Q processor was designed in an ASIC process given expectations for relatively low volume sales. The supercomputer may use as many as 100,000 of the chips. Running at 1.6 GHz, they deliver 204 Gflops at 55W, use 1.47 billion transistors and measure 19 x 19 mm.
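As a rough cross-check, the figures quoted above hang together arithmetically. The short calculation below uses only numbers from the article, plus two labeled assumptions: that 16 of the 18 cores do application work (the other two being the operating system core and the spare), and that each core completes 8 floating-point operations per cycle, a rate the article does not state.

```python
# Figures quoted in the article
clock_ghz = 1.6
chip_gflops = 204
chip_watts = 55
max_chips = 100_000
total_cores = 18

# Assumptions, not stated in the article
compute_cores = total_cores - 2   # excludes the OS core and the spare
flops_per_cycle = 8               # assumed per-core rate

# Peak per chip implied by the assumptions, in Gflops
peak_from_cores = compute_cores * clock_ghz * flops_per_cycle

# Energy efficiency per chip, in Gflops per watt
gflops_per_watt = chip_gflops / chip_watts

# Aggregate peak for the largest quoted configuration, in petaflops
system_pflops = max_chips * chip_gflops / 1e6

print(f"{peak_from_cores:.1f} Gflops per chip (from assumptions)")
print(f"{gflops_per_watt:.2f} Gflops/W")
print(f"{system_pflops:.1f} Pflops across {max_chips} chips")
```

Under those assumptions the per-chip peak comes out at roughly the quoted 204 Gflops, and 100,000 such chips land close to the 20 petaflops expected of Sequoia.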
The BlueGene/Q uses 18 PowerEN-based cores.