SAN JOSE, Calif. – Intel has officially entered the race to build heterogeneous processors, sketching out the first members of its Sandy Bridge family that will ship before April. The 32nm chips will come in versions with two or four dual-threaded x86 cores and one graphics core on a shared ring interconnect.
The first Sandy Bridge parts are aimed at notebooks, desktops and single-socket servers. Versions with more cores aimed at multi-socket servers will follow later in the year or early in 2012.
The Intel CPUs will compete head-on with parts from archrival Advanced Micro Devices such as Ontario, a 40nm CPU using two of AMD's new Bobcat cores and a Microsoft DirectX 11-class graphics core. Ontario is sampling now, and AMD has similar desktop and server chips in the works for the second half of 2011.
AMD will have an edge in graphics. Engineers at the Intel Developer Forum here said their new graphics block will not support the DirectX 11 API. However, Intel outlined a host of enhancements in its processor that will help it compete in other areas.
"There are no exclusive DX11 games out today, and DX11 is around the corner for Intel based products," said Tom Piazza, an Intel fellow who led the graphics core design.
The rivals are not yet releasing the most significant details of the chips, such as performance, data rates and cache sizes. However, both companies' desktop chips are likely to be held to two external DDR3 memory channels, a limit set by PC makers, said Opher Kahn, a senior principal engineer on Sandy Bridge.
Intel's Opher Kahn worked on the Sandy Bridge ring interconnect
Among other external interconnects, Sandy Bridge supports PCI Express Gen2 and DisplayPort.
Engineers packed back-to-back afternoon sessions at IDF describing some of the low-level details of Sandy Bridge. Chief among them is the use of a ring interconnect that could scale to link as many as 20 cores on a die, said Kahn.
Intel re-used much of the electrical design of the rings on its earlier Westmere and Larrabee processors. However, it re-worked much of the higher-layer coherency protocols for Sandy Bridge.
The interconnect is made up of at least four rings: a 32-byte data ring and separate rings for requests, acknowledgements and snooping. The rings are overlaid on the design of the so-called last-level cache.
The cache is broken up into separate units, one per x86 core. Each cache block is responsible for its own coherency in a distributed structure that does not require a central arbiter.
The ring delivers about 96 Gbytes/second per connection at a 3 GHz data rate, as much as four times the on-chip bandwidth available to Intel's previous processor cores. It takes one clock cycle for data to progress one step on the ring. Traversing the full ring could take 26 to 31 clocks, Kahn estimated.
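The quoted figures can be checked with simple arithmetic. The short Python sketch below is illustrative only and is not from Intel: it assumes the 96 Gbytes/second figure is the 32-byte data ring width multiplied by the 3 GHz data rate, and that the ring clock equals that 3 GHz rate when converting Kahn's clock estimates into nanoseconds. The function and constant names are made up for this example.

```python
# Back-of-the-envelope check of the ring figures quoted in the article.
# Assumptions (not confirmed by Intel): bandwidth = ring width x data rate,
# and one ring clock = one cycle at the quoted 3 GHz data rate.

RING_WIDTH_BYTES = 32   # data ring width quoted in the article
RING_RATE_HZ = 3e9      # 3 GHz data rate quoted in the article


def ring_bandwidth_gbytes(width_bytes=RING_WIDTH_BYTES, rate_hz=RING_RATE_HZ):
    """Peak bandwidth per ring connection in Gbytes/second."""
    return width_bytes * rate_hz / 1e9


def clocks_to_ns(clocks, rate_hz=RING_RATE_HZ):
    """Convert a number of ring clocks to nanoseconds at the given rate."""
    return clocks / rate_hz * 1e9


if __name__ == "__main__":
    print(f"Per-connection bandwidth: {ring_bandwidth_gbytes():.0f} Gbytes/s")
    for clocks in (26, 31):
        print(f"{clocks} clocks is about {clocks_to_ns(clocks):.1f} ns")
```

Under those assumptions, the 32-byte ring at 3 GHz works out to the 96 Gbytes/second Intel quoted, and a full traversal of 26 to 31 clocks would take roughly 9 to 10 nanoseconds.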
Intel is far from unique in its use of rings. The latest eight-core network processors from NetLogic Microsystems also use a ring interconnect.
Sandy Bridge links last-level cache blocks and cores on a ring