SAN JOSE, Calif. -- In a handful of technical papers presented the week of June 11, Intel Corp. is disclosing more details about its research into the future of multi-core processors.
Highlights include work on a new ultra-low-power chip-to-chip interconnect that hits data rates up to 15 Gbits/second, as well as techniques to spread jobs more effectively across multiple cores and their memories. Intel would not say when any of the advances might appear in its commercial products.
"Eventually we will need to get to deliver terabits/s of data to a multi-core die," said Randy Mooney, an Intel Fellow and director of I/O research. But using today's techniques, that could require a whopping 100 W of power, he said.
In a paper at the VLSI Symposium in Japan, Intel engineers described a chip-to-chip link that scales from 5 to 15 Gbits/s at power consumption levels as low as 2.7 mW/Gbit/s. The technology, used in Intel's prototype 80-core Terascale processor, consumes from 14 mW at 5 Gbits/s to 75 mW at 15 Gbits/s.
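The efficiency figures follow from dividing link power by data rate. The short Python sketch below works through that arithmetic for the two operating points quoted above. The function names and the 1 Tbit/s aggregate target are illustrative assumptions, and the sketch assumes bandwidth scales ideally by adding parallel links, which a real design would not achieve for free. Note that 14 mW at 5 Gbits/s works out to about 2.8 mW/Gbit/s, slightly above the quoted 2.7, presumably because the published numbers are rounded.

```python
# Operating points quoted in the article: (data rate in Gbits/s, link power in mW).
QUOTED_POINTS = [(5, 14), (15, 75)]


def energy_per_bit(rate_gbps, power_mw):
    """Link efficiency in mW per Gbit/s, which is numerically equal to pJ per bit."""
    return power_mw / rate_gbps


def power_for_aggregate(rate_gbps, power_mw, target_gbps=1000):
    """Watts needed to reach target_gbps with parallel links at one operating point.

    Assumes ideal scaling: total power = efficiency * aggregate bandwidth.
    The default target of 1,000 Gbits/s (1 Tbit/s) is an assumption for illustration.
    """
    return energy_per_bit(rate_gbps, power_mw) * target_gbps / 1000.0


for rate, power in QUOTED_POINTS:
    print(f"{rate} Gbits/s: {energy_per_bit(rate, power):.1f} mW/Gbit/s, "
          f"{power_for_aggregate(rate, power):.1f} W per Tbit/s")
```

Under those assumptions, a terabit per second of aggregate I/O would cost roughly 2.8 W at the 5 Gbits/s operating point and 5 W at 15 Gbits/s, well below the 100 W Mooney cited for today's techniques.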
Rambus earlier this year announced an I/O technology that hits 2.2 mW/Gbit/s, but it is characterized to work only at 6 Gbits/s. "We are generating fast I/Os with as little as 14 percent of the power used in production interfaces today," said Mooney.
Intel achieved its power levels using multiple techniques. The design dynamically scales the frequency and voltage levels of both transmitter and receiver chips. In addition, Intel used a passive inductor to terminate data lines rather than a resistor.
The chip also saves power by eliminating clock buffers, letting clock information vary as it comes in via a wire with controlled electrical characteristics. The clock variance will not disrupt processing jobs, but the elimination of buffers reduces power significantly, Mooney said.
In a separate paper, Intel disclosed a software implementation of transactional memory that lets a processor eliminate many of the coarse-grained locking mechanisms that stall jobs waiting for memory to become free. By using fine-grained, so-called atomic transactions, Intel was able to show performance improvements that ranged from five percent to 100 percent, depending on the application.
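For readers unfamiliar with the idea, the Python sketch below shows the general shape of an optimistic software transaction: each job buffers its writes, records the version of every value it reads, and commits only if nothing it read has changed in the meantime, retrying otherwise. This is not Intel's implementation, whose details the company did not describe here. The names `TVar`, `Transaction` and `atomically` are hypothetical, and for brevity the sketch guards the short commit step with a single lock, where a production system would likely use per-location metadata.

```python
import threading


class TVar:
    """A transactional variable: a value plus a version counter."""

    def __init__(self, value):
        self.value = value
        self.version = 0


# Held only for snapshots and the brief validate-and-write-back step,
# not for the whole duration of a job (a simplification in this sketch).
_commit_lock = threading.Lock()


class Transaction:
    def __init__(self):
        self.reads = {}   # TVar -> version seen on first read
        self.writes = {}  # TVar -> buffered new value

    def read(self, tvar):
        if tvar in self.writes:          # read-your-own-writes
            return self.writes[tvar]
        with _commit_lock:               # consistent (value, version) snapshot
            value, version = tvar.value, tvar.version
        self.reads.setdefault(tvar, version)
        return value

    def write(self, tvar, value):
        self.writes[tvar] = value        # buffered until commit

    def commit(self):
        with _commit_lock:
            # Abort if any variable we read was changed by another commit.
            if any(t.version != v for t, v in self.reads.items()):
                return False
            for t, value in self.writes.items():
                t.value = value
                t.version += 1
            return True


def atomically(body):
    """Run body(tx) repeatedly until its transaction commits."""
    while True:
        tx = Transaction()
        result = body(tx)
        if tx.commit():
            return result


def transfer(src, dst, amount):
    def body(tx):
        tx.write(src, tx.read(src) - amount)
        tx.write(dst, tx.read(dst) + amount)
    atomically(body)


a, b = TVar(100), TVar(0)
threads = [threading.Thread(target=transfer, args=(a, b, 1)) for _ in range(50)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(a.value, b.value)  # 50 50
```

The point of the design is that jobs touching unrelated data never wait on each other; only a job whose inputs were actually changed by a concurrent commit has to redo its work.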
"There have been a lot of implementations of atomic transactions, but this one can scale well to chips with tens of cores," said Jerry Bautista, co-director of Intel's terascale computing research effort.
Among other advances in the terascale project, Intel disclosed it is using a hardware scheduling unit on the die to better manage the process of assigning threads to cores. The hardware scheduler doubled performance on a simulated 64-core CPU.