Excessive power consumption has become the chief roadblock to further scaling of semiconductors, threatening to stall advancement in all electronics sectors—everything from further miniaturizing mobile devices to revving supercomputers.
While the causes are rooted in the immutable laws of physics and chemistry, engineers have devised a novel set of innovations that are mitigating the problem today and that promise to reinvigorate the chip industry tomorrow.
Here are the top five ways to reduce power on future ICs. They are already in development, and collectively they hold the promise of solving the problem for good within the decade.
Electronic design automation tools can optimize for low power by enabling teams to co-design for it from the very beginning. In fact, the developers of the lowest-power processors and systems-on-chip in the industry achieved their advantage not only by optimizing architectures and materials, but also by co-designing packaging, power sources, RF circuitry and software to minimize power without diminishing performance or inflating cost.
"Building low power requires a holistic approach across technology, design methodology, chip architecture and software," said David Greenhill, director of design technology and EDA at Texas Instruments (Dallas).
TI has set the bar for low-power devices by optimizing each subsystem using pioneering techniques, such as building its own process technologies to balance off-mode leakage with active-current performance, or using voltage and frequency scaling to define a variety of power-saving operating modes.
"The first step is knowing the goal of the product from a performance and power perspective. Once those goals are determined, the process can be designed to provide the required performance without exceeding the device's power budget," said Randy Hollingsworth, 28-nanometer platform manager at TI.
EDA tools have been key to consistently hitting these lower-power goals, but they sometimes require a few iterations around the design loop, since conventional EDA tools produce accurate power estimates only near the end of the design cycle. For future ICs, power consumption estimates need to be accurate as early as possible in the design cycle.
Providers of a few specialized tools have picked up that baton. Atrenta Inc. (San Jose, Calif.), for instance, makes a tool called Spyglass Power that performs power consumption estimation, reduction and verification using the standard register-transfer level (RTL) descriptions that are available from every major EDA tool very early in the design cycle.
"Today, engineers want to estimate power very early in the design process," said Peter Suaris, Atrenta's senior director of engineering. "You can no longer wait until the end of the design cycle to estimate power consumption; you need to co-design for power at the RTL level, and make changes in your design to conserve power right from the beginning."
Atrenta reckons that its specialized power conservation tools can estimate the final power budget within 20 percent, while its power reduction tools can shave up to 50 percent off the energy consumed by the final design.
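To make the idea concrete, here is a minimal sketch of how an RTL-stage estimate can be rolled up from per-block switching activity, switched capacitance and leakage. The block names and figures are assumptions for illustration, and this is not a description of how Spyglass Power works internally.

```python
# Minimal RTL-stage power roll-up: per-block dynamic power from switching
# activity and switched capacitance, plus a leakage term. All figures are
# illustrative assumptions, not output from any real tool.
VDD = 0.9      # volts
FREQ = 1.0e9   # hertz

# block: (toggle activity, switched capacitance in F, leakage in W)
rtl_blocks = {
    "alu":        (0.25, 200e-12, 0.8e-3),
    "reg_file":   (0.10, 300e-12, 1.2e-3),
    "clock_tree": (1.00, 80e-12,  0.1e-3),
    "sram_ctrl":  (0.05, 150e-12, 2.0e-3),
}

report = []
for name, (alpha, cap, leak) in rtl_blocks.items():
    dyn = alpha * cap * VDD ** 2 * FREQ
    report.append((name, dyn + leak))

total = sum(p for _, p in report)
for name, p in sorted(report, key=lambda r: -r[1]):
    flag = "  <- potential hot spot" if p / total > 0.3 else ""
    print(f"{name:10s} {p * 1e3:6.2f} mW ({p / total:5.1%}){flag}")
print(f"{'total':10s} {total * 1e3:6.2f} mW")
```

Ranking blocks this way at the RTL stage is what lets a team flag hot spots and rework the design long before gate-level numbers are available.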
Atrenta's tool can estimate power consumption very early, here pinpointing potential hot spots at the start of the design cycle. Source: Atrenta
One big area still missing is the large power budget consumed by the clock tree buffers used for clock distribution. Companies like Cyclos Semiconductor are working on LC resonant tank implementations, which can reduce clock-distribution power by 80 percent, and overall power by 15 to 20 percent, in gigahertz-class CPUs and SoCs.
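For intuition, the resonant-clock idea is that an on-chip inductor forms an LC tank with the clock network's capacitance, so the clock energy sloshes between inductor and capacitance instead of being dissipated in buffers every cycle. The sketch below just sizes a tank for an assumed clock load and applies an assumed recycling fraction; the numbers are illustrative, not Cyclos data.

```python
import math

# Resonant-clock sizing sketch: pick L so that the target clock frequency is
# near f = 1 / (2 * pi * sqrt(L * C)), then apply an assumed recycling
# fraction to the conventional C * Vdd^2 * f clock power. Illustrative only.
C_CLOCK = 200e-12   # total clock-network capacitance (F), assumed
F_TARGET = 2.0e9    # target clock frequency (Hz)
VDD = 0.9           # volts

# Inductance that resonates with the clock load at the target frequency.
L_NEEDED = 1.0 / ((2 * math.pi * F_TARGET) ** 2 * C_CLOCK)

p_conventional = C_CLOCK * VDD ** 2 * F_TARGET   # clock toggles every cycle
RECYCLE_FRACTION = 0.8                           # assumed 80% energy recycled
p_resonant = p_conventional * (1 - RECYCLE_FRACTION)

print(f"tank inductance needed   ~ {L_NEEDED * 1e12:.0f} pH")
print(f"conventional clock power ~ {p_conventional * 1e3:.0f} mW")
print(f"resonant clock power     ~ {p_resonant * 1e3:.0f} mW")
```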
Power consumption of a semiconductor largely depends on the application where the chip is used, the voltage at which it operates, the frequency at which it runs, and more, so it is very hard to cover the topic in a single article; a book would be a better vehicle. Still, the article is written very smartly and covers all the different angles.
One way to handle leakage is temperature control. One may not need to go all the way to liquid nitrogen temperatures to get benefits. Of course, this is probably more than ten years out, and cooling adds overall power as well. Sigh.
Planar, fully-depleted silicon-on-insulator (FD-SOI). As I just blogged (see http://bit.ly/xse0uI), the SOI Consortium's most recent results get a 40% power reduction on 28nm complex circuits including ARM cores and DDR3 memory controllers. It lets you run all digital device designs, including SRAMs, at very low Vdd (e.g., 0.6 volt). And see Steve Leibson's blog (http://bit.ly/wG22yL), in which IBM shows you get a 10x reduction in leakage power with back biasing on planar FD-SOI. Also, FinFETs (the vertical flavor of fully-depleted) on SOI are even lower power than FinFETs on bulk. Lots of info on www.soiconsortium.org.
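A rough model shows why back biasing can move leakage by an order of magnitude: subthreshold leakage falls roughly exponentially with threshold voltage. The sketch below assumes a subthreshold slope factor and threshold shifts of tens of millivolts from reverse back bias; those values are illustrative assumptions, not IBM's or the SOI Consortium's measurements.

```python
import math

# Subthreshold leakage falls roughly exponentially with threshold voltage:
# I_leak ~ exp(-Vth / (n * kT/q)). The slope factor and threshold shifts
# below are illustrative assumptions.
KT_Q = 0.026   # thermal voltage at room temperature (V)
N = 1.3        # subthreshold slope factor (assumed)

def leakage_ratio(delta_vth):
    """Leakage relative to the unbiased device for a given Vth increase."""
    return math.exp(-delta_vth / (N * KT_Q))

# Assume reverse back bias raises Vth by tens of millivolts on an FD device.
for shift_mv in (50, 80, 100):
    r = leakage_ratio(shift_mv * 1e-3)
    print(f"Vth shift of {shift_mv:3d} mV -> leakage roughly {1 / r:4.1f}x lower")
```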
Approximate computation is another technique for power saving. This can take the form of limited-precision computation or approximate arithmetic. Approximate arithmetic can be hardwired or can result from reducing the voltage, and the inaccurate results can simply be tolerated or corrected (where, if the approximate answer is sufficiently accurate, the energy cost of correction is less than the cost of always-correct arithmetic). Carry prediction is one example.
Approximation can be used for predictive functions (e.g., branch prediction and motion estimation [in video compression]) and for approximate results (e.g., output to humans).
Approximation can also apply to storage. Not only might the accuracy of least significant bits be sacrificed but also predictive and caching structures could lose accuracy. Obviously in the predictive case, the loss of accuracy must not hurt performance so much that the power savings in predictor storage are more than lost by the extra power from misspeculation. Analog storage and computation have been proposed for some uses (like perceptron branch predictors)--mainly for performance reasons, but such techniques may also have energy efficiency benefits.
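As a toy example of the limited-precision idea above, the sketch below truncates the low-order bits of both operands before adding, which in hardware would shorten the carry chain, and then measures how the worst-case error grows with the number of bits dropped. The operand width and the implied energy savings are assumptions for illustration only.

```python
import random

# Approximate addition by zeroing low-order bits of both operands; in
# hardware this shortens the carry chain at the cost of a bounded error.
# Operand width and the implied energy savings are assumptions.
def approx_add(a, b, dropped_bits):
    mask = ~((1 << dropped_bits) - 1)
    return (a & mask) + (b & mask)

random.seed(0)
samples = [(random.randrange(1 << 16), random.randrange(1 << 16))
           for _ in range(10000)]

for dropped in (0, 2, 4, 8):
    worst = max(abs((a + b) - approx_add(a, b, dropped)) for a, b in samples)
    print(f"dropping {dropped} LSBs -> worst-case error {worst} "
          f"({worst / (1 << 16):.3%} of operand range)")
```

The error stays small relative to the operand range even when several bits are dropped, which is exactly the regime where tolerating the approximation is cheaper than always computing exactly.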
Improved prediction, early misspeculation detection, and pre-determination (applied to branches, cache way selection, prefetching, and other areas) can increase energy efficiency by reducing unnecessary work.
Along the lines of interconnects, greater integration and appropriate placement of components can reduce the cost of communication. Processor-in-memory (e.g., Intelligent RAM and, more recently, Venray Technologies' proposals) and processor-near-memory (e.g., on a DIMM or in the logic chip of something like Micron's Hybrid Memory Cube) are usually proposed for improved performance but can also improve energy efficiency.
Even reducing on-chip communication can have an impact on energy efficiency. Placing communicating components close together can not only reduce the energy per communication but also reduce the latency of communication (which may reduce the duration of computation--facilitating a longer period of deeper sleep) and the unpredictability of communication (which may allow tighter scheduling of activity when chip-internal network congestion issues are not a concern--knowledge is power, or at least facilitates power-saving optimizations).
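The placement argument can be put in rough numbers: the dynamic energy of an on-chip link scales with its wire capacitance, which grows roughly linearly with length, so moving two chatty blocks closer together cuts the link power nearly in proportion. The per-millimeter capacitance and traffic figures in the sketch below are assumptions, not data from any particular process.

```python
# Energy per bit on an on-chip wire grows linearly with length, so moving
# communicating blocks closer together cuts link power almost in proportion.
# Per-millimeter capacitance and traffic figures are assumptions.
WIRE_CAP_PER_MM = 0.2e-12   # farads per millimeter of routed wire (assumed)
VDD = 0.9                   # volts
TOGGLES_PER_SEC = 5e9       # bit transitions per second on the link (assumed)

def link_power(length_mm):
    """Dynamic power of a point-to-point on-chip link of the given length."""
    return WIRE_CAP_PER_MM * length_mm * VDD ** 2 * TOGGLES_PER_SEC

for length_mm in (8.0, 4.0, 1.0):
    print(f"{length_mm:4.1f} mm link: {link_power(length_mm) * 1e3:5.2f} mW")
```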
Clever and limited use of clocking can also reduce power. I think one of the grid layout many-core vendors uses a simple left-to-right (rather than tree) clocking because clock skew only matters locally. Asynchronous design at various granularities has been considered for power saving. Although variation may limit its application, there may still be some place for wave pipelining even in synchronous designs.
To reduce chip power, I have a few suggestions:
- Apply a top-level H-tree, high-drive clock buffer structure to minimize buffer usage
- Use the intermediate metal layers for clock tree routing
- Redesign special DFFs to reduce clock toggle power; this may require additional P&R support
- Apply a gated-clock latch approach, which normally replaces about 40 to 60 percent of the clock buffers (see the sketch after this list)
- Use the correct DFM guidelines to reduce design margin for redundant logic elimination
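Here is a rough sketch of the arithmetic behind the clock-gating item above: if gating removes a given fraction of the clock buffers and the gated branches only toggle when their enables are active, clock-network power falls with both factors. The 40 to 60 percent range comes from the comment; the baseline power and enable-activity values are assumptions.

```python
# Clock power after latch-based clock gating: buffers that survive keep
# toggling every cycle, while gated branches only toggle when their enable
# is active. Baseline power and enable-activity values are assumptions;
# the 40-60% buffer-replacement range is the figure from the list above.
P_CLOCK_TREE_MW = 100.0   # baseline clock-tree power in mW (assumed)

def gated_clock_power(buffer_replacement, enable_activity):
    surviving = (1.0 - buffer_replacement) * P_CLOCK_TREE_MW
    gated = buffer_replacement * P_CLOCK_TREE_MW * enable_activity
    return surviving + gated

for replacement in (0.4, 0.5, 0.6):   # fraction of clock buffers gated away
    for activity in (0.1, 0.3):       # fraction of cycles gated flops are live
        p = gated_clock_power(replacement, activity)
        print(f"replace {replacement:.0%} of buffers, enables active {activity:.0%} "
              f"of cycles -> {p:5.1f} mW ({1 - p / P_CLOCK_TREE_MW:.0%} saved)")
```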