How best to reduce power on future ICs
R Colin Johnson
2/21/2012 11:46 AM EST
Excessive power consumption has become the chief roadblock to further scaling of semiconductors, threatening to stall advancement in all electronics sectors—everything from further miniaturizing mobile devices to revving supercomputers.
While the causes are rooted in the immutable laws of physics and chemistry, engineers have devised a novel set of innovations that are mitigating the problem today and that promise to reinvigorate the chip industry tomorrow.
Here are the top five ways to reduce power on future ICs. They are already in development, and collectively they hold the promise of solving the problem for good within the decade.
Embrace co-design
Electronic design automation tools can optimize for low power by enabling teams to co-design for it from the very beginning. In fact, the developers of the lowest-power processors and systems-on-chip in the industry achieved their advantage not only by optimizing architectures and materials, but also by co-designing packaging, power sources, RF circuitry and software to minimize power without diminishing performance or inflating cost.
"Building low power requires a holistic approach across technology, design methodology, chip architecture and software," said David Greenhill, director of design technology and EDA at Texas Instruments (Dallas).
TI has set the bar for low-power devices by optimizing each subsystem using pioneering techniques, such as building its own process technologies to balance off-mode leakage with active-current performance, or using voltage and frequency scaling to define a variety of power-saving operating modes.
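As a rough illustration of why voltage and frequency scaling yields distinct power-saving modes, the sketch below uses the textbook approximation that CMOS switching power scales as C x V^2 x f. The mode names, voltages, frequencies and the effective capacitance are hypothetical values chosen for illustration; they are not TI figures.

```python
def dynamic_power(c_eff_farads, vdd_volts, freq_hz, activity=1.0):
    """Textbook CMOS switching power: P = activity * C * V^2 * f."""
    return activity * c_eff_farads * vdd_volts ** 2 * freq_hz


# Hypothetical operating modes: (name, supply voltage in V, clock in Hz).
MODES = [
    ("full speed", 1.1, 1.0e9),
    ("balanced", 0.9, 600e6),
    ("low power", 0.7, 200e6),
]

C_EFF = 1e-9  # assumed effective switched capacitance (1 nF), illustrative only


def mode_table():
    """Return (name, watts, fraction of full-speed power) for each mode."""
    base = dynamic_power(C_EFF, MODES[0][1], MODES[0][2])
    rows = []
    for name, vdd, freq in MODES:
        watts = dynamic_power(C_EFF, vdd, freq)
        rows.append((name, watts, watts / base))
    return rows


for name, watts, rel in mode_table():
    print(f"{name:>10}: {watts:.3f} W ({rel:.0%} of full speed)")
```

Because voltage enters squared, lowering supply voltage together with frequency saves far more than lowering frequency alone, which is the reason the two are usually scaled as a pair. Leakage is ignored in this sketch.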
"The first step is knowing the goal of the product from a performance and power perspective. Once those goals are determined, the process can be designed to provide the required performance without exceeding the device's power budget," said Randy Hollingsworth, 28-nanometer platform manager at TI.
EDA tools have been key to consistently achieving these lower-power goals, but sometimes they require a few iterations around the design loop, since estimates of power consumption with conventional EDA tools are only accurate near the end of the design cycle. For future ICs, power consumption estimates need to be accurate as early as possible in the design cycle.
Providers of a few specialized tools have picked up that baton. Atrenta Inc. (San Jose, Calif.), for instance, makes a tool called Spyglass Power that performs power consumption estimation, reduction and verification using the standard register-transfer level (RTL) descriptions that are available from every major EDA tool very early in the design cycle.
"Today, engineers want to estimate power very early in the design process," said Peter Suaris, Atrenta's senior director of engineering. "You can no longer wait until the end of the design cycle to estimate power consumption; you need to co-design for power at the RTL level, and make changes in your design to conserve power right from the beginning."
Atrenta reckons that its specialized power conservation tools can estimate the final power budget within 20 percent, while its power reduction tools can shave up to 50 percent off the energy consumed by the final design.
Figure: Atrenta's tool can estimate power consumption very early, here pinpointing potential hot spots before the beginning of the design cycle. Source: Atrenta
junko.yoshida
2/21/2012 2:14 PM EST
What have we missed? Tell us your preferred method of reducing power on future ICs; or give us your wish list for tools that would enable it.
linpaws
2/22/2012 3:48 AM EST
Revising the communication standards would go a long way toward reducing interconnect power. They date back a long time and are revised only for speed, not power. The interfaces need a complete overhaul for overall power reduction.
R_Colin_Johnson
2/22/2012 2:48 PM EST
The biggest omission is integrated voltage regulators on the bottom of 3D chip stacks, which IBM and SRC demonstrated today at ISSCC. Read about it here: http://bit.ly/xzdYS1
peterh123
2/26/2012 10:26 AM EST
One big area that is missing is addressing the large power budget consumed by clock tree buffers for clock distribution. Companies like Cyclos Semi are working on LC resonant tank implementations which can reduce clock distribution power by 80%, and overall power by 15-20%, in GHz clock CPUs and SoCs.
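The two percentages in this comment imply how large the clock network's share of total power must be. A minimal sketch of that arithmetic, assuming the overall saving comes only from the clock network and nothing else on the chip changes (the function name is hypothetical):

```python
def implied_clock_share(overall_saving, clock_saving):
    """Share of total power the clock network must draw so that cutting
    clock power by clock_saving reduces total power by overall_saving."""
    return overall_saving / clock_saving


# Figures quoted in the comment: 80% clock saving, 15-20% overall saving.
low = implied_clock_share(0.15, 0.80)
high = implied_clock_share(0.20, 0.80)
print(f"implied clock share of total power: {low:.1%} to {high:.1%}")
```

Under that assumption, clock distribution would account for roughly a fifth to a quarter of chip power.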
DickH
2/21/2012 4:52 PM EST
junk 64bit. We need processors that have:
1) efficient addressing/fetch of 1-byte, 2-byte, 4-byte and 8-byte locations.
memory that loads and stores all these sizes equally efficiently. I now have a 64b operating system, and with it came the need to double my memory size to get the same performance (that's considered an improvement?).
2) 32b data
3) no more than 48b address bus.
Paul A. Clayton
2/22/2012 12:57 PM EST
Your first point is not practical. On-chip caches require ECC. If ECC coverage is over 8 bits, there is 62.5% overhead; if ECC coverage is over 64 bits, there is 12.5% overhead. (One could use per-byte parity and use a write-through L1 cache with ECC in L2, but this also has power trade-offs.) You might also note that current PC processors support 128-bit wide loads and stores efficiently to provide decent performance for their SIMD extensions.
Most 64-bit systems define 'int' as 32 bits, so only pointers double in size (and even x86-64 does not double code size; MIPS and Power add no code size increase for full use of 64-bit features).
Even the use of 64-bit pointers is not required for a 64-bit processor, though the software interfaces supporting the use of 32-bit pointers are not broadly used.
For PCs with largish memory (between 3 and perhaps 16 GiB), using a 64-bit OS with support for 32-bit applications may be a sweet spot--allowing the OS to map all of memory in a flat space while allowing applications that do not need large amounts of memory to use smaller pointers.
For your second point, the data size (in memory) is software controlled and 'int' tends to be defined as 32 bits (for PCs).
Also, while there would be some power savings from penalizing the performance of 64-bit operations, for PC processors this would be a small relative savings (e.g., support for aggressive out-of-order execution uses much more power).
Your third point also seems debatable. Few processors support full 64-bit physical address spaces.
While right-sizing processors is attractive, there are issues of design complexity and validation when a broad range of applications is targeted--and targeting a broad range of applications increases production volume which reduces the impact of fixed costs and moves the product up the learning curve faster.
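The ECC overhead figures at the start of this reply can be reproduced with a short calculation. The sketch below assumes a single-error-correct, double-error-detect (SECDED) extended Hamming code; the reply does not name the code, but this assumption matches both quoted percentages.

```python
def secded_check_bits(data_bits):
    """Check bits for an extended Hamming (SECDED) code.

    A Hamming code needs the smallest r with 2**r >= data_bits + r + 1;
    one extra overall parity bit adds double-error detection.
    """
    r = 0
    while (1 << r) < data_bits + r + 1:
        r += 1
    return r + 1


def ecc_overhead(data_bits):
    """Check bits as a fraction of the protected data bits."""
    return secded_check_bits(data_bits) / data_bits


for width in (8, 16, 32, 64):
    bits = secded_check_bits(width)
    print(f"{width:2d} data bits: {bits} check bits, {ecc_overhead(width):.1%} overhead")
```

Protecting each byte separately costs 5 check bits per 8 data bits, while protecting a 64-bit word costs 8 check bits, which is why narrow stores into an ECC-protected cache are comparatively expensive.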
przemek
2/24/2012 3:54 PM EST
So how do you do 48b addresses with a 32b ISA? Surely we can agree that banking and PAE-like schemes are ugly.
sharps_eng
2/21/2012 6:08 PM EST
In theory a tiny 8-bit micro could handle the keystrokes I am typing. In theory a 16-bit processor could handle the wysiwyg display/menu interface efficiently, and in practice we use at least 32 bits to handle the live spell-check dictionary lookup and 128 and more for graphics engines' text-to-pixel formatting. I haven't mentioned the mass of the Internet that will get involved when I submit this entry.
So because we use heavy tackle to do the fancy graphics and mass lookups, we don't bother much with smaller devices to handle small tasks. There are exceptions, and some stuff happens in bytewide hardware, whatever we imagine is going on on top.
Variable-width buses on ARM allow some tweaking of power, but it is heavily application-specific and there is a cost to switching modes.
Systems that only use power when actually doing something are commonly used now, but they still have to withstand the dissipation when full speed is demanded.
I think the answers will also evolve at user level as smart ideas and friendlier devices. Maybe my laptop's webcam could decide if I am not looking at it and dim the screen; it works on a dumb timer at the moment, which means it is always switching off when I want it!
For lower-power we could sense an RF tag printed on spectacle frames.
Frank Eory
2/21/2012 8:32 PM EST
Excellent article. I think about the only area you didn't cover is the software side of it -- power-efficient software design.
MeirG
2/22/2012 3:18 AM EST
Regarding interconnect: is anybody aware of the state of on-chip networks?
If interconnect is so wasteful, why not "time share" the interconnect to save real estate and power at an acceptable penalty in performance? As the ratio of chip size to feature size gets higher and higher, we encounter the problem of synchronizing distant sub-systems. A network is a good step toward solving this.
3D Guy
2/22/2012 7:33 AM EST
Today's micron-scale TSVs tackle the chip-to-chip interconnect problem and reduce its power. To reduce the on-chip interconnect power, one needs monolithic 3D technology. Check out www.monolithic3d.com
Paul A. Clayton
2/22/2012 1:22 PM EST
Dark silicon has a few variants for special purpose hardware: loosely-coupled auxiliary processors, tightly-coupled coprocessors or functional units, and "conservation cores"--which seem to fit in between the other two, being relatively tightly coupled to the general purpose processor but using less of the GP processor's functionality to perform its function.
There is also specialization of performance using performance-asymmetric chip multiprocessors (like ARM's big.LITTLE).
Configurable memory hierarchies have also been explored academically. E.g., reducing size or associativity to reduce power use when such does not have a significant impact on performance. Cache placement and replacement policies can also impact energy efficiency. While most of the academic research in this seems to have been on improving performance, concepts like Non-Uniform Cache Architecture can also apply to power-saving goals.
There may be some benefit to tuning the performance of particular components to avoid stalls. E.g., if application performance is limited by memory bandwidth, it may be more energy efficient to run the processor at a constant lower frequency than to use light sleep modes between bursts of memory activity.
Inexpensive fast persistent memories may also provide significant power savings by allowing power to be removed from the memory without losing state while avoiding data transfers to and from a separate persistent storage.
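The point above about memory-bound workloads can be sketched with a simple energy model. It compares a burst at high voltage and frequency followed by light sleep against running at a constant lower frequency for the whole period. All numbers (cycle count, period, voltages, static and sleep power, effective capacitance) are hypothetical, and the model ignores mode-transition costs.

```python
def energy_burst_then_sleep(cycles, period_s, c_eff, vdd, freq_hz,
                            static_w, sleep_w):
    """Energy for one period: compute in a fast burst, then light sleep."""
    active_s = cycles / freq_hz
    if active_s > period_s:
        raise ValueError("work does not fit in the period at this frequency")
    dynamic_j = c_eff * vdd ** 2 * cycles  # C * V^2 per cycle
    return dynamic_j + static_w * active_s + sleep_w * (period_s - active_s)


def energy_constant_low(cycles, period_s, c_eff, vdd, static_w):
    """Energy for one period when the clock is slowed so the same work
    just fills the period, allowing a lower supply voltage."""
    dynamic_j = c_eff * vdd ** 2 * cycles
    return dynamic_j + static_w * period_s


# Hypothetical workload: 1e6 cycles of work per 5 ms memory-limited period.
CYCLES, PERIOD, C_EFF = 1_000_000, 5e-3, 1e-9

burst = energy_burst_then_sleep(CYCLES, PERIOD, C_EFF, vdd=1.1,
                                freq_hz=1.0e9, static_w=0.2, sleep_w=0.05)
slow = energy_constant_low(CYCLES, PERIOD, C_EFF, vdd=0.7, static_w=0.06)
print(f"burst then light sleep: {burst * 1e3:.2f} mJ")
print(f"constant low frequency: {slow * 1e3:.2f} mJ")
```

With these assumed values the constant lower frequency wins because the voltage reduction shrinks the energy per cycle. A deep sleep state with near-zero power, or high leakage in the slow mode, could reverse the result.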
Oscar Law
2/22/2012 2:14 PM EST
To reduce the chip power, I have a few suggestions as follows:
- Apply top level H-tree high drive clock buffer structure to minimize the buffer usage
- Use the intermediate metal layers for clock tree routing
- Redesign special DFF to reduce the clock toggle power, it may require additional P&R support
- Apply gated clock latch approach which normally replaces about 40%~60% clock buffer
- Use the correct DFM guideline to reduce design margin for redundant logic elimination
Paul A. Clayton
2/23/2012 11:24 AM EST
Along the lines of interconnects, greater integration and appropriate placement of components can reduce the cost of communication. Processor-in-memory (e.g., Intelligent RAM and recently Venray Technologies proposals) and processor-near-memory (e.g., on DIMM or in a logic chip of something like Micron's Hybrid Memory Cube) are usually proposed for improved performance but can also improve energy efficiency.
Even reducing on-chip communication can have an impact on energy efficiency. Placing communicating components close together can not only reduce the energy per communication but also reduce the latency of communication (which may reduce the duration of computation--facilitating a longer period of deeper sleep) and the unpredictability of communication (which may allow tighter scheduling of activity when chip-internal network congestion issues are not a concern--knowledge is power, or at least facilitates power-saving optimizations).
Clever and limited use of clocking can also reduce power. I think one of the grid layout many-core vendors uses a simple left-to-right (rather than tree) clocking because clock skew only matters locally. Asynchronous design at various granularities has been considered for power saving. Although variation may limit its application, there may still be some place for wave pipelining even in synchronous designs.
Paul A. Clayton
2/23/2012 11:24 AM EST
Approximate computation is another technique for power saving. This can take the form of limited-precision computation or approximate arithmetic. Approximate arithmetic can be hardwired or can result from reduced voltage, and inaccurate results can be simply tolerated or corrected (where the energy cost of correction is less than the cost of always-correct arithmetic if the approximate answer is sufficiently accurate). Carry prediction could be an example of such.
Approximation can be used for predictive functions (e.g., branch prediction and motion estimation [in video compression]) and for approximate results (e.g., output to humans).
Approximation can also apply to storage. Not only might the accuracy of least significant bits be sacrificed but also predictive and caching structures could lose accuracy. Obviously in the predictive case, the loss of accuracy must not hurt performance so much that the power savings in predictor storage are more than lost by the extra power from misspeculation. Analog storage and computation have been proposed for some uses (like perceptron branch predictors)--mainly for performance reasons, but such techniques may also have energy efficiency benefits.
Improved prediction, early misspeculation detection, and pre-determination (applied to branches, cache way selection, prefetching, and other areas) can increase energy efficiency by reducing unnecessary work.
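To make the idea of hardwired approximate arithmetic concrete, here is a software model of one published style of approximate adder, commonly called a lower-part-OR adder. The comment does not name a specific circuit, so this choice is an assumption made for illustration: the low bits are combined with a bitwise OR instead of a carry chain, and a single predicted carry feeds the exact upper part.

```python
def loa_add(a, b, width=16, approx_bits=4):
    """Behavioral model of a lower-part-OR adder for unsigned inputs.

    The low approx_bits are ORed (no carry chain); the remaining upper
    bits are added exactly. The carry into the upper part is predicted
    from the top bit of the two lower parts.
    """
    if not (0 <= a < (1 << width) and 0 <= b < (1 << width)):
        raise ValueError("inputs must fit in `width` unsigned bits")
    low_mask = (1 << approx_bits) - 1
    low = (a | b) & low_mask
    if approx_bits > 0:
        carry = ((a & b) >> (approx_bits - 1)) & 1
    else:
        carry = 0
    high = (a >> approx_bits) + (b >> approx_bits) + carry
    return (high << approx_bits) | low


def worst_case_error(width=8, approx_bits=4):
    """Exhaustively measure the largest absolute error for small widths."""
    worst = 0
    for a in range(1 << width):
        for b in range(1 << width):
            err = abs(loa_add(a, b, width, approx_bits) - (a + b))
            worst = max(worst, err)
    return worst


print(loa_add(1000, 2000), "vs exact", 1000 + 2000)
print("worst-case error, 8-bit inputs, 4 approximate bits:", worst_case_error())
```

In this model the error never exceeds 2^(approx_bits - 1), so the number of approximated bits is the knob that trades accuracy against the shorter carry chain.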
Adele.Hars
2/23/2012 12:07 PM EST
Planar, fully-depleted silicon-on-insulator (FD-SOI). As I just blogged (see http://bit.ly/xse0uI), the SOI Consortium's most recent results get a 40% power reduction on 28nm complex circuits including ARM cores and DDR3 memory controllers. It lets you run all digital device designs, including SRAMs, at very low Vdd (e.g., 0.6 volt). And see Steve Leibson's blog (http://bit.ly/wG22yL) in which IBM shows you get a 10x reduction in leakage power with back biasing on planar FD-SOI. Also, FinFETs (the vertical flavor of fully-depleted) on SOI are even lower power than FinFETs on bulk. Lots of info on www.soiconsortium.org.
elctrnx_lyf
2/23/2012 12:24 PM EST
Surely we will need stacking technologies to package multiple chips in a single package. Just as Xilinx did, we will see more and more such chips coming out in the future.
ndancer
2/23/2012 1:25 PM EST
One way to handle leakage is temperature control. One may not need to go all the way to liquid nitrogen temperatures to get benefits. Of course, this is probably more than ten years out, and cooling adds overall power as well. Sigh.
kinnar
2/23/2012 2:56 PM EST
The power consumption of a semiconductor chip depends largely on the application in which it is used, the voltage level and the frequency at which it runs, and the list goes on. It is therefore very hard to consolidate in a single article, and a book would be a better way to explain it. Still, the article is written smartly enough that it covers all the different angles.