SAN MATEO, Calif. - In the long war of trade-offs between power consumption and performance, performance has won the bulk of the battles. But with CPU designs on a path to surpass a billion transistors in less than 10 years, microprocessor vendors have issued a call to arms to cap power in future transistors, circuits and microarchitectures.
Early salvos in the stepped-up campaign will be fired at this week's International Solid-State Circuits Conference (ISSCC) and the Windows Embedded Developers' Conference, when Alchemy, Hitachi, Intel, NEC and others describe products and techniques that scrimp on power.
Among those in the vanguard is Intel Corp., whose latest Pentium 4 is specified at 55 watts at 1.5 GHz. Patrick Gelsinger, vice president and chief technology officer for Intel, will deliver a keynote address at ISSCC that will underscore the need for multiple off-ramps from what he calls "the curve that's leading to thousands of watts for a CPU."
For Intel, one development that raised the red flag on the power problem was the series of die revisions needed to rein in the Pentium 4's size and power appetite. In the days of the 486 and the first Pentiums, power was not a substantial design driver, said Intel fellow Glenn Hinton. "In the Pentium 4 it was an important factor but not the most important." But in the future, Hinton said, "we'll have to change the way we design the microarchitecture for low power."
The alternative is frightening. If the status quo holds, the power required to drive a CPU will be 600 W by 2010. Certain measures can be taken to cool the processor, such as heat-pipe advances or self-contained liquid cooling techniques, but ultimately the problem has to be solved through better design, said Gelsinger.
Ideas being bandied about by Intel include using multithreading instead of power-inefficient, speculative execution; putting two CPUs on the same die; and relying more on power-efficient application-specific hardware, Gelsinger said.
Intel is also looking to bulk up on embedded memory, which is probably the path of least resistance. Adding successively larger L2 caches is attractive because memory chews up 10 times less power than logic circuits while boosting performance, Gelsinger said.
When adding cache, one of the goals will be to make it dense. Intel officials in the past have hinted at a preference for embedded DRAM, but it's too early to say whether the choice will ultimately be DRAM or some form of SRAM. "The trend I'm suggesting is more memory on die. I think that's pretty certain and needs to be pursued," Gelsinger said.
As on-chip memory expands, it accounts for a larger slice of the chip's thermal budget; hence the need to keep its power under control. Already, cache memory consumes about 30 percent of a CPU's power, said Kenichi Osada of Hitachi's System LSI Research Department.
To beat back power consumption, Hitachi has devised a low-power 32-kbyte four-way-set-associative embedded cache, which will be explained at ISSCC. Fabricated on a 0.18-micron CMOS process, the cache consumes 1.7 mW on a 0.65-V power supply while running at 120 MHz. It can be scaled up to 530 mW at 2 V and 1.04 GHz, Osada said.
At lower voltages, designers have found that a smaller threshold voltage is best for high-speed operation. But to reduce leakage current and maintain sufficient static-noise margins, the threshold voltage of the memory cells must be slightly higher than that of the peripheral circuits. To prevent that voltage differential from causing activation failures on the sense amplifiers, Hitachi developed a voltage-adapted timing-generation scheme using 12 dummy cells. A timing pulse is used for activating the sense amplifier and for resetting word lines.
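The two operating points quoted for the Hitachi cache can be put on a common footing by converting them to energy per clock cycle. The sketch below does that arithmetic and, purely as an illustration, compares the reported high-end figure with what the textbook dynamic-power approximation (power proportional to C x V^2 x f) would predict from the low-end figure. That model is an assumption introduced here, not something the story attributes to Hitachi, and the function and variable names are hypothetical.

```python
# Figures quoted for the Hitachi cache: power in watts, clock in hertz, supply in volts.
LOW = {"power_w": 1.7e-3, "freq_hz": 120e6, "vdd_v": 0.65}
HIGH = {"power_w": 530e-3, "freq_hz": 1.04e9, "vdd_v": 2.0}


def energy_per_cycle_pj(point):
    """Average energy per clock cycle in picojoules (power divided by frequency)."""
    return point["power_w"] / point["freq_hz"] * 1e12


def cv2f_prediction_w(ref, target):
    """Scale the reference power to the target point.

    Assumes P ~ C * V^2 * f with constant switched capacitance and no leakage.
    This is a simplification chosen for illustration only.
    """
    v_ratio = target["vdd_v"] / ref["vdd_v"]
    f_ratio = target["freq_hz"] / ref["freq_hz"]
    return ref["power_w"] * v_ratio ** 2 * f_ratio


if __name__ == "__main__":
    print(f"low point:  {energy_per_cycle_pj(LOW):.1f} pJ per cycle")
    print(f"high point: {energy_per_cycle_pj(HIGH):.1f} pJ per cycle")
    predicted = cv2f_prediction_w(LOW, HIGH)
    print(f"C*V^2*f prediction at 2 V, 1.04 GHz: {predicted * 1e3:.0f} mW")
    print(f"reported at 2 V, 1.04 GHz: {HIGH['power_w'] * 1e3:.0f} mW")
```

On these numbers the cache spends about 14 pJ per cycle at the low point and about 510 pJ at the high point. The simple model predicts roughly 140 mW at the high point against the 530 mW reported, so it should be read only as a first-order guide.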
While bigger caches are on the horizon, chip architects are also looking at replacing general-purpose logic circuits with more efficient special-function logic blocks, such as single-instruction, multiple-data (SIMD) instruction units. These circuits are inherently better suited for "human interface" applications, such as voice processing and communications, Gelsinger said.
Special-purpose logic
There will soon be a strong need for special-purpose logic that executes communications algorithms, he said. "We anticipate platforms in the future as natively comms-enabled, such as multi-wireless-algorithm capability not only in mobile devices but in laptops and desktops."
Designers of high-end microprocessors may have to look to embedded processors for some guidance on power consumption. In embedded, Mips per watt is a widely used yardstick, and designers continue to look at novel ways to simplify their designs to keep a lid on wattage.
One company working to drive down power consumption is startup Alchemy Semiconductor Inc., which will show its first 32-bit MIPS-based design this week at Microsoft's Windows Embedded Developers' Conference. Using voltage scaling and a low-voltage 0.18-micron process technology from Taiwan Semiconductor Manufacturing Co., the Au1000 processor is said to dissipate less than 200 mW running at 200 MHz and 900 mW at 500 MHz.
"We're talking about power for the whole chip, not just core plus cache," said Phil Pompa, co-founder and vice president of sales and marketing for Alchemy.
Variable voltage
During active operation, the Au1000 can quickly deactivate segments of the device while, for example, waiting for a joystick interrupt. It can also shift down the voltage and phase-locked loop clock so that the device is largely idle, thereby dropping power dissipation from 500 mW to 120 mW. The voltage shifts can vary depending on the firmware, Pompa said.
The technique mirrors Intel's mobile voltage positioning method. Used in Intel's mobile processors, the technique changes the voltage depending on processor activity.
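To see why lowering voltage along with clock speed pays off, it helps to look at energy per clock cycle rather than raw power. The sketch below uses the two Au1000 operating points quoted in this story (200 mW at 200 MHz and 900 mW at 500 MHz) and, under the assumed first-order model that energy per cycle scales with the square of the supply voltage, estimates the voltage ratio between them. The model, the helper names and the treatment of "less than 200 mW" as exactly 200 mW are all assumptions made for illustration; Alchemy's actual voltages are not given here.

```python
import math


def energy_per_cycle_nj(power_mw, freq_mhz):
    """Energy per clock cycle in nanojoules (mW divided by MHz gives nJ)."""
    return power_mw / freq_mhz


def implied_voltage_ratio(low, high):
    """Supply-voltage ratio (high over low) implied by an assumed E ~ C * V^2 model."""
    return math.sqrt(energy_per_cycle_nj(*high) / energy_per_cycle_nj(*low))


# (power in mW, clock in MHz). The story says "less than 200 mW" at 200 MHz;
# 200 mW is used here as an upper bound, so the ratio below is a lower bound.
AU1000_LOW = (200.0, 200.0)
AU1000_HIGH = (900.0, 500.0)

if __name__ == "__main__":
    print(f"low point:  {energy_per_cycle_nj(*AU1000_LOW):.2f} nJ per cycle")
    print(f"high point: {energy_per_cycle_nj(*AU1000_HIGH):.2f} nJ per cycle")
    ratio = implied_voltage_ratio(AU1000_LOW, AU1000_HIGH)
    print(f"implied voltage ratio: {ratio:.2f}")
```

Clock scaling alone would leave energy per cycle unchanged. The drop from about 1.8 nJ to about 1.0 nJ per cycle is what the accompanying voltage reduction buys.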
Another option is to add more gradations of chip deactivation, as NEC Corp. did with its MIPS-based 64-bit VR4131 processor, also slated to debut at the Windows embedded conference. The device includes a suspend mode that shuts down everything except the crystal clock, including the PLL.
It's an unusual move. "In many devices people keep the PLL running because once you stop it and start it up again, it takes a certain time to stabilize and drive the other clocks," said Arnold Estep, senior marketing manager for NEC Electronics (Santa Clara, Calif.).
The NEC design takes 40 microseconds to emerge from suspend mode, but while it's asleep the processor consumes only 1 mW. If both the PLL and crystal are still active, consumption ratchets up to 5 mW. The device also uses individual clocks and other power management features so that it consumes 220 mW while running at 200 MHz, which is equivalent to 1,545 Mips/W.
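The Mips-per-watt yardstick is simple arithmetic, and the NEC figures above can be worked backward to see what throughput they imply. The sketch below is illustrative only: the helper names are hypothetical, and the implied Mips value is derived here from the quoted 1,545 Mips/W and 220 mW rather than stated by NEC in this story.

```python
def mips_per_watt(mips, power_mw):
    """Efficiency yardstick: throughput in Mips divided by power in watts."""
    return mips / (power_mw / 1000.0)


def implied_mips(efficiency_mips_per_w, power_mw):
    """Throughput implied by a quoted Mips/W figure at a given power in mW."""
    return efficiency_mips_per_w * power_mw / 1000.0


# Figures quoted for the VR4131.
VR4131_POWER_MW = 220.0
VR4131_CLOCK_MHZ = 200.0
VR4131_MIPS_PER_W = 1545.0

if __name__ == "__main__":
    mips = implied_mips(VR4131_MIPS_PER_W, VR4131_POWER_MW)
    print(f"implied throughput: {mips:.1f} Mips")
    print(f"implied Mips per MHz: {mips / VR4131_CLOCK_MHZ:.2f}")
    print(f"round trip: {mips_per_watt(mips, VR4131_POWER_MW):.0f} Mips/W")
```

The quoted numbers imply roughly 340 Mips at 200 MHz, or about 1.7 Mips per MHz.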
At the microarchitecture level, Alchemy engineers tried to keep their design's five-stage pipeline simple and without out-of-order execution. The processor design takes advantage of the MIPS32 branch delay slot and early condition code generation to turn branches without any latency so that no power is wasted doing branch prediction, said Gregory Hoeppner, vice president of engineering for Alchemy.
Insufficient gain
Other observers agreed that speculative execution, used extensively over the past decade to wring every possible drop of performance from an architecture, will have to be scaled back because the resultant performance gain is no longer sufficient to justify the hit on power. In looking at two instruction branches in speculative instruction execution, Gelsinger said, "you're consuming power without yielding benefit to the user."
Linley Gwennap, chief analyst of the Linley Group, said CPU designers no longer have the luxury of leaning on speculative execution. "You're doing everything you can just because there's some slim chance that you can do something useful," he said. "From a practical standpoint, it's going to chew up more power."
Multithreading is one option that should be examined, said Intel's Gelsinger, because it keeps the hardware busy even during a cache miss. It adds 10 percent more logic to the CPU, boosting power by almost as much, but raises throughput 30 percent. Intel isn't using multithreading in its PC processors but has designed it into network processors.
Sun Microsystems, meanwhile, uses multithreading in its dual 32-bit MAJC architecture, which will be described at ISSCC.
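Gelsinger's multithreading figures translate directly into a performance-per-watt gain. The sketch below works through that arithmetic, taking the 30 percent throughput increase as quoted and treating the power increase as a full 10 percent. Since the story says power rises by "almost as much" as the added logic, 10 percent is an assumed upper bound, and the function name is hypothetical.

```python
def perf_per_watt_gain(throughput_gain, power_gain):
    """Fractional change in throughput per watt.

    Both arguments are fractional increases, e.g. 0.30 for 30 percent.
    """
    return (1.0 + throughput_gain) / (1.0 + power_gain) - 1.0


if __name__ == "__main__":
    # 30 percent more throughput for an assumed 10 percent more power.
    gain = perf_per_watt_gain(0.30, 0.10)
    print(f"performance per watt improves by {gain * 100:.1f} percent")
```

With these inputs, throughput per watt improves by about 18 percent. If the real power increase is below 10 percent, the gain is somewhat larger.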
Gwennap said Intel most likely will have to adopt multithreading in its PC processors. Such emerging areas as natural-language recognition will likely exploit multithreading, he said.
"It's not about running the spreadsheet faster," Gwennap said.
Jerry Ascierto contributed to this story.