OTTAWA: On Monday afternoon at ISSCC in San Francisco, AMD disclosed technical details of its accelerated processing unit (APU), known as Llano, for the first time. AMD gave me a telephone briefing on its Llano disclosure at ISSCC (Paper 5.6, "An x86-64 Core Implemented in 32nm SOI CMOS"). The briefing was delivered by AMD Senior Fellow Sam Naffziger, who leads processor design.
Naffziger co-authored the AMD paper, which highlights the x86 side of the design, although there is speculation that much of the challenge of the APU lies in the monolithic fusion of CPU and GPU on silicon.
The AMD PR team told me that the decision to present the x86 design first is not a reflection of the relative difficulty of one over the other. However, they did admit that integrating the GPU on the same silicon as the CPU required some interesting design elements that they will keep close to the vest for now.
Obviously, I can't comment on that, but I have to agree with the marketing folks that so much has been said about GPU integration in the last few years that there was a clear need to keep the x86 core design from getting swamped in the news.
AMD believes it takes more than just scaling up the number of cores to improve performance. In fact, they think that approach will soon reach a limit. In AMD's assessment, we are in the mature stage of multi-core design, but poised to begin the new era of "heterogeneous systems" offering "abundant data parallelism" and "power efficient GPUs."
To summarize the Llano APU, it will contain four x86 cores, each with over 35 million transistors occupying just shy of 10 mm². Each of the four cores gets its own megabyte of L2 cache SRAM (which comes on top of the transistor count and silicon area quoted for each core). AMD targets operation above 3 GHz and supply voltages of 0.8 to 1.3 V.
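To put those figures in perspective, here is a rough back-of-the-envelope sketch. It relies on the standard CMOS approximation that dynamic switching power scales with C·V²·f; the function name and the choice to hold frequency constant are my own assumptions for illustration, not anything AMD presented.

```python
# Rough arithmetic on the quoted Llano core figures.
# Assumption: dynamic power follows the textbook C * V^2 * f relation.
# Leakage power and any AMD-specific behavior are ignored.

CORES = 4
TRANSISTORS_PER_CORE_M = 35   # "over 35 million", so this is a lower bound
AREA_PER_CORE_MM2 = 10        # "just shy of 10 mm2", so this is an upper bound
V_MIN, V_MAX = 0.8, 1.3       # quoted supply voltage range, in volts


def relative_dynamic_power(v, v_ref, f=1.0, f_ref=1.0):
    """Dynamic power at (v, f) relative to (v_ref, f_ref), same capacitance."""
    return (v / v_ref) ** 2 * (f / f_ref)


# Core-only totals; the L2 caches are counted separately in the article.
total_transistors_m = CORES * TRANSISTORS_PER_CORE_M
total_core_area_mm2 = CORES * AREA_PER_CORE_MM2

# Effect of voltage alone across the quoted range (frequency held constant).
voltage_ratio = relative_dynamic_power(V_MAX, V_MIN)

print(f"Core transistors: more than {total_transistors_m} million")
print(f"Core area: just under {total_core_area_mm2} mm^2")
print(f"Dynamic power at {V_MAX} V vs {V_MIN} V: {voltage_ratio:.2f}x")
```

Under that approximation, the voltage range alone accounts for roughly a 2.6x swing in dynamic power, before any change in clock frequency is considered.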
Although AMD's ISSCC presentation today is not about the new Bulldozer or Bobcat architectures planned for next year, the Llano design offers some cool features.
AMD uses a legacy architecture for the device to lessen the risk of the transition to the 32nm process node. But ISSCC is about circuits, and that's what AMD will present. Naffziger and the rest of the design team have wrapped a number of power reduction elements around their existing x86 core design.
Drawing on a solid history of energy efficient CPUs, AMD's processor group presents three power management innovations in its talk this afternoon.
1. Core power gating
This gives Llano a power envelope of 2.5 to 25 W, depending on performance demand. Each core can be independently and completely disconnected from the power supply.
2. Digital on-die temperature measurements
On-die temperature measurements are not new, but AMD claims its digital approach improves the accuracy and repeatability of the die temperature map.
3. Power-aware clock grid design
Taking a close look at the clock grid saved a lot of potentially wasted watts. Rather than just meeting clock specs across the die, the team also set out to improve clock efficiency. The result reduced the metal capacitance in the grid by more than 80% and cut the number of final clock buffers by more than half.
The before and after pictures of the clock metal grid in the AMD presentation are quite striking. And they should be. The new clock design offers big power-performance benefits if you consider that the clock tree can account for up to 30% of total processor power consumption.
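To get a feel for what those numbers could mean at the chip level, here is a minimal sketch. It assumes clock power is proportional to switched capacitance at a fixed voltage and frequency. The fraction of clock power that the grid metal itself accounts for was not part of my briefing, so the `grid_share_of_clock` values below are purely hypothetical placeholders, as is the function name.

```python
# Illustrative estimate of total power saved by a lower-capacitance clock grid.
# Assumptions (not AMD figures): power scales linearly with capacitance at a
# fixed voltage and frequency, and grid_share_of_clock is a made-up parameter
# for the portion of clock power spent charging the grid metal.

CLOCK_SHARE = 0.30         # "up to 30%" of total power goes to the clock tree
GRID_CAP_REDUCTION = 0.80  # "more than 80%" less metal capacitance in the grid


def total_power_saving(clock_share, grid_share_of_clock, grid_cap_reduction):
    """Fraction of total processor power saved by shrinking grid capacitance."""
    return clock_share * grid_share_of_clock * grid_cap_reduction


# Sweep a few hypothetical values for the grid's share of clock power.
for grid_share in (0.25, 0.50, 0.75):
    saving = total_power_saving(CLOCK_SHARE, grid_share, GRID_CAP_REDUCTION)
    print(f"grid = {grid_share:.0%} of clock power -> "
          f"{saving:.1%} of total power saved")
```

Even with the grid at only a quarter of clock power, this simple model suggests a saving of several percent of total processor power. The reduction in final clock buffers would add to that, but it is not modeled here.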