SAN FRANCISCO - A perennial topic at the International Solid State Circuits Conference is the debate between what have been termed the architects and the speed demons. These two camps, one emphasizing simple logic circuitry to increase clock frequency and the other pushing complex circuitry to increase the amount of work done in a clock cycle, have been fighting since the days of the first RISC CPUs over the best approach to increasing processor performance.
This year's incarnation of the debate, held here at an evening session on Tuesday (Feb. 17), took the title "Processors and Performance: When do GHz Hurt?"
With so few viable processor design teams remaining in the industry, there is always the risk that such a panel will devolve into a round of Intel-bashing, and that's pretty much what happened this time.
Panel chair Shannon Morton, staff engineer at Icera Semiconductor, observed that the increase in clock frequency that has led us to 3 GHz CPUs had not come just from process scaling. There has also been a consistent and aggressive reduction in the number of fan-out-of-four gate delays per logic stage. This has contributed a significant portion of the reduction in cycle times.
Morton then framed the discussion with two broad questions. First, what is the end customer's perception of performance? Second, should we continue to focus on clock frequency in our pursuit of the customer's dollars?
Marc Tremblay, VP and fellow at Sun Microsystems, pointed out that clock frequency is a primary lever for setting performance only if the application is entirely, or nearly, cache-resident. Failing that, memory latency and bandwidth become the limiting factors. Tremblay said that the applications that primarily concern Sun in the server world are never cache-resident.
Instead, there is a growing move to make them behave as if they were, through intelligent, software-controlled prefetching. Tremblay hinted that the most promising development he had seen in some time was the blending of software-based prefetching with multithreading technology.
Philip Emma, manager of systems technology and microarchitecture at IBM, agreed that clock frequency was losing its traction as a performance driver, and that memory was becoming far more critical. He added that the cost, in terms of complexity and power consumption, of pursuing higher frequencies was getting too great anyway.
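Tremblay's point about cache residency can be illustrated with a toy run-time model. The sketch below is not from the panel: the function names and every number in it are hypothetical, and it assumes that compute time scales with the clock while memory stall time is fixed in seconds.

```python
def run_time_s(instructions, cpi_core, clock_hz, misses, miss_latency_s):
    """Toy model: compute time shrinks with clock rate, stall time does not."""
    compute = instructions * cpi_core / clock_hz  # seconds spent executing
    stall = misses * miss_latency_s               # seconds spent waiting on memory
    return compute + stall


def speedup_from_clock(clock_ratio, **workload):
    """Speedup obtained by multiplying the clock rate by clock_ratio."""
    faster = dict(workload, clock_hz=workload["clock_hz"] * clock_ratio)
    return run_time_s(**workload) / run_time_s(**faster)


# Hypothetical cache-resident workload: no misses, so the clock is the whole story.
resident = dict(instructions=1e9, cpi_core=1.0, clock_hz=2e9,
                misses=0, miss_latency_s=100e-9)

# Hypothetical server-style workload: 1e7 misses at 100 ns each dominate run time.
server = dict(instructions=1e9, cpi_core=1.0, clock_hz=2e9,
              misses=1e7, miss_latency_s=100e-9)

print(speedup_from_clock(2.0, **resident))  # doubling the clock doubles performance
print(speedup_from_clock(2.0, **server))    # doubling the clock gains only about 20%
```

Under these assumed numbers, prefetching that hides most of the miss latency would move the second workload back toward the first, which is the behavior Tremblay described.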
Emma predicted that the next big performance boosts would come from the likes of 3D packaging and optical interconnect, not from faster clocks.
Emma proclaimed himself the "whipping boy" for high clock speeds. If designers considered only interconnect flight time, and took care to avoid resonance in the wire lengths, clock speeds as high as 60 GHz could be obtained, he theorized. But would on-chip wires need to be terminated just to avoid those resonances? Such a device would be a consummate power eater, he concluded. The cost of computing machinery would be a function of the power it consumed, he said, not its clock rate.
Consumer demand for "big iron" would not likely track the rise in clock speeds, advised Alisa Scherer, a technology fellow with Advanced Micro Devices. This meant that marketing dollars would need to track or exceed engineering dollars to ensure that consumers would be enticed by fast-clocking PCs, she said.
Multithreading could multiply the amount of work performed by the microprocessor with each clock cycle, said Sun's Tremblay. In principle, a 256-thread machine could achieve terahertz clock rates, with each thread running a GHz race through the machine.
"If threading takes over, the GHz required is much lower," agreed Doug Carmean, a principal architect with Intel Corp.
"But no one knows how to program a machine with more than two threads," protested an audience questioner from MIT. And compiler technology would likely not keep up with the requirements of multithreading, Sun's Tremblay conceded.
But none of the panelists doubted that there would be fabrication processes in place that would support multi-GHz processor designs. (An aggressive roadmap shown at the Intel Developer Forum, IDF, across the street from ISSCC here, suggested putting technology shifts on a two-year cycle, culminating with a 25-nm manufacturing process in 2009-2010.) Clock estimates given in response to an audience challenge ranged from 7 GHz to 10 GHz.
Power consumption, a function of gate fan-out loading within the individual processor's design, would be the limiting factor, Alisa Scherer reminded the audience. Fan-out loading on the order of 20 to 22 gates would consume much more power at high clock rates than designs loaded by 16 or 18 gates, she said. Complexity would inevitably favor higher fan-out loading.
"We could do 10 GHz," Scherer postulated. "But a 5 GHz clock was a much better target." Even then, the processor is likely to consume a couple of hundred watts - and there will be few PC applications likely to support that, she concluded.
Scherer claimed that consumers could, like enterprise buyers, be educated about the difference between clock frequency and throughput, and that increasingly the PC industry would focus on I/O bandwidth, multiprocessing/multithreading and even non-performance features such as portability and security.
Intel's Carmean said that on 3D rendering tasks, higher clock frequency translated directly into faster completion times. He then generalized these results by arguing that many of the tasks real users cared about were still single-threaded, unparallelized and big, and that a CPU designer could not turn his back on them, even though it really did hurt.
Showing a chart of steadily decreasing fan-out-of-four gate delays per stage over time, Carmean said, "At about 10 gate delays per stage, let's say it gets tingly. Below 10, no doubt about it. It hurts."
AMD's Scherer replied in the same metaphor: "Pain is not necessary for performance." She said that not only was higher clock frequency too costly to the architect, but end users were no longer willing to pay the price in increased power consumption either. "No one wants something that sounds like a hydrofoil under their desk," she stated.
Hisashige Ando, CTO of the server systems group at Fujitsu, claimed that while GHz may matter to the marketing department, the real issue was the three "Ps": performance, power and price. He then demonstrated the theoretical advantage of an array of small, slower processors over a single large, faster processor, assuming the application was suitable for parallel execution.
Finally, Mark Horowitz, Yahoo professor of electrical engineering and computer science at Stanford, observed that the last speaker on a panel finds all the good points already taken. He suggested the pursuit of performance was a global optimization problem, in which one seeks the point at which the incremental increases in performance divided by the incremental costs in other quantities are approximately equal.
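The dependence of Ando's argument on parallelism can be sketched with Amdahl's law. This is not a reconstruction of his demonstration: the function name, core count, relative core speed and parallel fractions below are all hypothetical, and the model ignores the power and price terms he emphasized.

```python
def array_speedup(n_cores, relative_core_speed, parallel_fraction):
    """Speedup of an array of slower cores over one fast core (Amdahl's law).

    relative_core_speed is each small core's speed as a fraction of the
    single fast core; parallel_fraction is the share of the work that can
    be spread across all cores.
    """
    # Serial part runs on one slow core.
    serial_time = (1.0 - parallel_fraction) / relative_core_speed
    # Parallel part is divided evenly among all slow cores.
    parallel_time = parallel_fraction / (relative_core_speed * n_cores)
    return 1.0 / (serial_time + parallel_time)


# Hypothetical array: 8 cores, each half as fast as the single large core.
for p in (1.0, 0.95, 0.5):
    print(p, round(array_speedup(8, 0.5, p), 3))
```

With these assumed figures the array wins comfortably when nearly all of the work is parallel, and loses to the single fast processor when only half of it is, which is the caveat attached to Ando's claim.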
This had led him to predict some time ago that the decrease in gate-delays per stage would soon stop. He then stated his objective for the evening was to convince the "crazies at Intel to give up before they proved me wrong."