New EDA Tools Improve Low Power Design
Dave Allen, Atrenta
2/19/2007 5:41 AM EST
Fortunately, a new generation of EDA tools and techniques is beginning to change that picture, greatly enhancing designers' ability to estimate power dissipation and achieve power goals. This article describes these new tools and techniques, as well as some promising capabilities that could be delivered in future EDA offerings. One common feature of these solutions is that they let designers tune power characteristics more effectively early in the design flow, when the cost of such optimizations is lowest and their impact greatest. Designing for power up front not only saves days or weeks of subsequent design iterations but also allows a degree of optimization that is difficult or impossible to achieve through late-stage changes.
Power-Sensitive Design Today: Difficulties and Shortcomings
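Dynamic (switching) power is calculated from the activity of the circuit, the switched capacitance, and the supply voltage, using the formula $P = CV^2f$, where $f$ is the effective switching frequency. To reduce power, you therefore have several variables to manipulate, and because power depends on the square of the voltage, lowering the supply is especially effective. This formula covers dynamic power only; leakage (static) power adds to it and is the target of techniques such as power shut-off and multi-Vt cells.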
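As a quick illustration of the formula, the Python sketch below evaluates $P = CV^2f$ for a hypothetical block. The 1 nF switched capacitance, 100 MHz frequency, and 1.0 V / 0.8 V supplies are made-up values, and the calculation assumes the block still meets timing at the lower voltage.

```python
def dynamic_power_w(c_farads: float, v_volts: float, f_hz: float) -> float:
    """Dynamic (switching) power P = C * V^2 * f, in watts.

    f is the effective switching frequency, so it already reflects activity."""
    return c_farads * v_volts ** 2 * f_hz


# Hypothetical block: 1 nF of effectively switched capacitance at 100 MHz.
C_SWITCHED = 1e-9
F_CLOCK = 100e6

p_nominal = dynamic_power_w(C_SWITCHED, 1.0, F_CLOCK)
p_low_v = dynamic_power_w(C_SWITCHED, 0.8, F_CLOCK)  # assumes timing is still met at 0.8 V

print(f"1.0 V: {p_nominal * 1e3:.1f} mW")
print(f"0.8 V: {p_low_v * 1e3:.1f} mW")
print(f"saving: {100 * (1 - p_low_v / p_nominal):.0f}%")
```

Lowering the supply by 20% cuts dynamic power by $1 - 0.8^2 = 0.36$, or 36%, whereas reducing $C$ or $f$ by 20% saves only 20%.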
Four main techniques are used throughout the design flow. First, lowering the supply voltage for parts of the design, at the expense of performance, reduces power; this introduces voltage domains into the design. Second, completely shutting off power to parts of the design reduces power consumption; this introduces power domains that can be switched off. Third, reducing the toggle activity of logic whose computational results are not being "listened to" reduces power; this technique introduces clock gating into the design. Finally, selectively trading off power against performance reduces power: fast but leaky low threshold voltage (low-Vt) cells are used only where circuit timing is critical, and slower, low-leakage high-Vt cells everywhere else. This technique introduces multi-threshold-voltage (multi-Vt) cell substitution.
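The multi-Vt trade-off can be sketched as a simple greedy assignment. In this hypothetical Python model, every cell starts as a high-Vt cell. On each path that misses the clock period, cells are swapped to low-Vt, largest delay gain first, until the path meets timing. The cell names, delays, leakage values and path list are invented for illustration. Production tools work from static timing analysis and characterized libraries and use far more sophisticated optimization.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    delay_hvt: float   # ns, high-Vt variant (slower, low leakage)
    delay_lvt: float   # ns, low-Vt variant (faster, leakier)
    leak_hvt: float    # nW
    leak_lvt: float    # nW
    use_lvt: bool = False

    @property
    def delay(self) -> float:
        return self.delay_lvt if self.use_lvt else self.delay_hvt

    @property
    def leakage(self) -> float:
        return self.leak_lvt if self.use_lvt else self.leak_hvt


def path_delay(path, cells) -> float:
    return sum(cells[name].delay for name in path)


def assign_vt(paths, cells, clock_period_ns) -> bool:
    """Greedy sketch: start all-high-Vt; on each failing path, swap the cells with
    the largest delay gain to low-Vt until the path fits the clock period.
    Returns True if every path meets timing afterwards."""
    for path in paths:
        candidates = sorted(
            (n for n in path if not cells[n].use_lvt),
            key=lambda n: cells[n].delay_hvt - cells[n].delay_lvt,
            reverse=True,
        )
        for name in candidates:
            if path_delay(path, cells) <= clock_period_ns:
                break
            cells[name].use_lvt = True
    return all(path_delay(p, cells) <= clock_period_ns for p in paths)


# Hypothetical netlist: path a->b is critical, path c->d has slack.
cells = {
    "a": Cell(0.6, 0.4, 1.0, 10.0),
    "b": Cell(0.5, 0.35, 1.0, 10.0),
    "c": Cell(0.3, 0.2, 1.0, 10.0),
    "d": Cell(0.3, 0.2, 1.0, 10.0),
}
met = assign_vt([["a", "b"], ["c", "d"]], cells, clock_period_ns=1.0)
hvt_share = sum(not c.use_lvt for c in cells.values()) / len(cells)
print(f"timing met: {met}, high-Vt share: {hvt_share:.0%}, "
      f"leakage: {sum(c.leakage for c in cells.values()):.1f} nW")
```

Only the cell with the largest delay gain on the critical path becomes low-Vt. The high-Vt share this reports is the kind of figure a what-if question about high-Vt usage in a block depends on.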
Figure 1 illustrates a typical power-sensitive design flow where these techniques are applied, though design teams may have variations on this flow depending on their design methodology and EDA tools.
- During the architectural stage, the design team decides where voltage and power domains are needed. Creating such domains involves significant trade-offs among power savings, design complexity and performance, and complexity grows faster than linearly with each additional domain. Assessing these trade-offs is essential to choosing the optimal number of domains, a decision that must be made at the architecture level because it drives implementation later in the design cycle. Yet today there is no good way to gauge these trade-offs accurately; designers typically resort to best-guess scenarios and back-of-the-envelope calculations.
- After the architecture is defined and RTL development starts, designers plan for block-level clock gating. By this stage, voltage and power domains are frozen, and clock gating represents the next-biggest potential source of power savings. But in deciding where to insert clock gate enables, designers have been hampered by the lack of tools for measuring the impact of such choices on power consumption, that is, tools for accurately estimating power at the early RTL stage. Again, designers have had to resort to back-of-the-envelope calculations, and as a result clock gating has been error prone and inexact. Typically, this leads to time-consuming late-stage iterations to adjust clock gating during physical implementation (step 5, below).
- Once the RTL has reached the point where the design can be simulated, some design teams perform their first power estimates. If the estimates show that power budgets will not be met, further analysis and design modifications are required. Power estimation may be done with a simple spreadsheet or with commercial or home-grown tools. Power estimation tools typically allow evaluation of what-if scenarios, which can help guide designers' power trade-offs. Examples of what-if questions include:
  - If we add pipeline stages to this datapath to reduce activity, does the clock power increase outweigh the savings?
  - If implementation is able to use 70% high-Vt cells in this block, how much static power will be saved?
  - If this block is turned off with a power domain instead of via global clock gating, how much static and dynamic power will be saved?
The problem is that RTL power estimates have traditionally been inaccurate, limiting the effectiveness of design modifications at this stage. Other design teams wait until after gate-level implementation (step 5) to estimate power, but by then the RTL is frozen, and it is months too late to make significant RTL changes.
- As the RTL design progresses, a number of power-related verification tasks are required. For example, some teams insert level shifters (to shift signal voltage levels between voltage domains) and isolation logic (placed between power domains to hold a known logic value on signals exiting a powered-off domain) at this stage, and these changes must be verified against the design intent. For designs with multiple power domains, formal verification techniques can be used to ensure proper power-up and power-down sequencing of the domains. Some design teams use power-aware simulation to make sure that the X (unknown) values introduced when a block powers down do not cause incorrect functional behavior.
Difficulties arise at this stage because the commonly used methods of inserting level shifters and isolation logic (manually or with scripts) tend to be error prone, complicating the verification task.
- The implementation team generates a gate-level design representation and creates a physical design that captures the design intent. At older process nodes (130 nm and above), power distribution via a single chip-wide network was usually sufficient. In today's designs, upwards of twenty different power networks may be needed, with appropriate level shifters and isolation logic between the domains.
The challenge at this stage is that synthesis and place-and-route tools have not matured sufficiently in their comprehension of complex power management schemes and cannot be relied upon to choose the best physical implementation automatically. For example, physical design tools attempt to automate the implementation of clock gating, but such tools operate only on explicit enables, ones the designer has already specified. Many of those explicit enables achieve only small power savings. In some cases, depending on the activity statistics, adding a clock gate may even increase power, because the gating cell and its enable logic consume power of their own. Implementation tools often "swamp" clock tree synthesis (CTS) with a large number of ineffective enables. Saving significant power with clock gating therefore typically requires substantial manual intervention by physical designers, yet most teams simply run an automatic gating mechanism blindly and accept whatever results it generates.
Some power management techniques can only be applied at the gate level during the implementation phase. Most technology libraries include cells at two or more Vt levels to allow trading off performance against leakage power, and swapping cells between Vt levels is typically done late in the implementation process. The technique can be generalized to any gate resizing, including aggressive resizing for power recovery. Advanced implementation techniques such as MTCMOS (multi-threshold CMOS) and SRPG (state retention power gating) are also emerging for designs where standby power is a key concern.
- Once implementation is complete, final verification of power management and distribution can be done at this level. Power verification covers not only level shifters and isolation logic but also ensuring that every design instance is connected to the correct supply; similar verification can be performed for MTCMOS and SRPG implementations. It is critical to compare the implementation against the original design intent captured during the architecture and design stages, yet automated tools typically have no way to deduce that intent. Some attempt to "reverse engineer" the intent by looking only at the implementation, but this validates only consistency, not correctness against the original intent. As a result, successful verification requires significant manual effort. And the stakes are high: any design flaw missed at this stage, just prior to tape-out, can be exceedingly expensive.
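The difference between checking against intent and reverse-engineering it can be illustrated with a toy checker. In this Python sketch, the intent is a hypothetical, simplified table of domains and instance assignments, not any standard power-intent format. Each domain records its nominal voltage and whether it can be switched off. The implementation side lists the domain crossings, the protection cells actually found on them, and each instance's connected supply. The rules are:

- a crossing between different voltages needs a level shifter;
- a signal leaving a switchable domain needs isolation;
- every instance must sit on the supply its intent specifies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Domain:
    voltage: float     # nominal supply, volts
    switchable: bool   # True if the domain can be powered off


@dataclass(frozen=True)
class Crossing:
    signal: str
    src: str           # driving domain
    dst: str           # receiving domain
    has_level_shifter: bool
    has_isolation: bool


def check_crossings(domains, crossings):
    """Check inserted level shifters / isolation cells against the domain intent."""
    errors = []
    for x in crossings:
        src, dst = domains[x.src], domains[x.dst]
        needs_ls = src.voltage != dst.voltage
        # Simplification: every output of a switchable domain is isolated,
        # even if the receiver is always off at the same time.
        needs_iso = src.switchable
        if needs_ls and not x.has_level_shifter:
            errors.append(f"{x.signal}: missing level shifter {x.src}->{x.dst}")
        if not needs_ls and x.has_level_shifter:
            errors.append(f"{x.signal}: unnecessary level shifter {x.src}->{x.dst}")
        if needs_iso and not x.has_isolation:
            errors.append(f"{x.signal}: missing isolation {x.src}->{x.dst}")
    return errors


def check_supplies(intended, connected):
    """intended/connected: instance name -> supply domain name."""
    return [f"{inst}: on {connected.get(inst)}, intent says {dom}"
            for inst, dom in intended.items() if connected.get(inst) != dom]


domains = {
    "TOP": Domain(1.2, False),
    "CPU": Domain(1.0, False),
    "RADIO": Domain(1.2, True),
}
crossings = [
    Crossing("cpu_req", "CPU", "TOP", has_level_shifter=True, has_isolation=False),
    Crossing("radio_irq", "RADIO", "TOP", has_level_shifter=False, has_isolation=False),
    Crossing("top_cfg", "TOP", "CPU", has_level_shifter=False, has_isolation=False),
]
problems = check_crossings(domains, crossings)
problems += check_supplies({"u_alu": "CPU", "u_mac": "RADIO"},
                           {"u_alu": "CPU", "u_mac": "TOP"})
for p in problems:
    print(p)
```

Because the expected cells are derived from the intent rather than from the netlist, a missing isolation cell is reported as an error. A reverse-engineering approach would instead silently absorb it into an inferred intent.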
Voltage and Power Domain Creation
Until recently, there have been no effective tools to help architects weigh power and voltage domain trade-offs. The situation has improved, however, with the emergence of new tools that perform accurate power estimation at the early RTL stage, along with new RTL prototyping tools. Used in combination, such capabilities will allow designers to gauge the impact of different domain scenarios and optimize the design accordingly. Using RTL prototyping, for example, architects can explore different block placements that pack together blocks able to operate at a lower voltage. Architects can also use power estimation to analyze curves of supply voltage against maximum operating frequency. As the operating voltage decreases, so does the maximum achievable frequency. Blocks with lower acceptable operating voltages can therefore be grouped together into a lower-voltage island, as shown in Figure 2.
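The grouping step can be sketched in a few lines of Python. The voltage-versus-maximum-frequency table and the block clock requirements below are hypothetical; in practice they would come from library characterization and the architecture specification. Each block gets the lowest supply that still meets its clock requirement, and blocks sharing a supply form a candidate island.

```python
from collections import defaultdict

# Hypothetical characterization: supply voltage (V) -> maximum frequency (MHz).
VF_CURVE = {0.9: 150, 1.0: 250, 1.1: 350, 1.2: 450}


def min_voltage(required_mhz: float) -> float:
    """Lowest supply whose maximum frequency still meets the requirement."""
    for v in sorted(VF_CURVE):
        if VF_CURVE[v] >= required_mhz:
            return v
    raise ValueError(f"{required_mhz} MHz is not reachable at any supply")


def group_into_islands(blocks):
    """blocks: name -> required MHz. Returns voltage -> sorted block names."""
    islands = defaultdict(list)
    for name, mhz in blocks.items():
        islands[min_voltage(mhz)].append(name)
    return {v: sorted(names) for v, names in sorted(islands.items())}


blocks = {"cpu": 400, "dsp": 300, "dma": 200, "usb": 60, "uart": 10}
for v, names in group_into_islands(blocks).items():
    # Relative dynamic power versus running at 1.2 V, from P = C V^2 f.
    print(f"{v:.1f} V island: {names}, ~{(v / 1.2) ** 2:.0%} of 1.2 V dynamic power")
```

This naive grouping yields one island per distinct voltage. Because each added domain raises complexity, an architect would typically merge neighboring islands when the extra saving does not justify another domain, for example by running the DMA block at 1.1 V alongside the DSP.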
For automatic power domain creation, several academic papers address establishing the on-time/off-time ratio of functional blocks. In general, blocks with short off-times are poor choices for power gating because of the penalty of the rush current associated with frequently re-powering the block. By combining multiple RTL analysis engines, including simulation and formal verification, it will be possible to identify blocks with long off-times, or low on-time/off-time ratios, and recommend them as better candidates for power gating.
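The off-time argument can be made concrete with a break-even calculation. The Python sketch below rests on a simplifying assumption: each power-down/power-up cycle costs a fixed energy overhead, covering rush current and any state save/restore, and leakage is the only power saved while the block is off. The interval traces and power numbers are hypothetical stand-ins for what simulation would report. Gating pays off only when the average off-time exceeds the break-even time, which is the overhead divided by the leakage power.

```python
from statistics import mean


def evaluate_gating(off_intervals_ms, leakage_mw, overhead_uj):
    """Return (break-even off-time in ms, net energy saved in uJ).

    Units: 1 mW sustained for 1 ms is 1 uJ, so overhead_uj / leakage_mw is in ms."""
    breakeven_ms = overhead_uj / leakage_mw
    net_uj = sum(leakage_mw * t - overhead_uj for t in off_intervals_ms)
    return breakeven_ms, net_uj


# Hypothetical idle-interval traces (ms), leakage (mW), per-cycle overhead (uJ).
traces = {
    "video": ([50.0, 80.0, 120.0], 2.0, 10.0),
    "bus_ctrl": ([0.5, 1.0, 0.2], 1.0, 5.0),
}
for name, (intervals, leak, overhead) in traces.items():
    breakeven, net = evaluate_gating(intervals, leak, overhead)
    verdict = "candidate" if mean(intervals) > breakeven else "poor candidate"
    print(f"{name}: mean off {mean(intervals):.1f} ms, break-even {breakeven:.1f} ms, "
          f"net {net:+.1f} uJ -> {verdict}")
```

The block that sleeps for tens of milliseconds at a time easily amortizes its wake-up cost. The one that idles for fractions of a millisecond would lose energy on every power cycle. Rush current also causes supply noise, which this energy-only model ignores.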
Effective Clock Gating
In the past, clock gating has been a trial-and-error process, with RTL designers taking a best-guess approach to specifying explicit enables. The clock gating is later refined and corrected during implementation, when power consumption can be accurately measured. Often multiple iterations are needed between the design and implementation teams, a process both costly and time consuming. With the recent advent of accurate early RTL power estimation tools, RTL designers can now plan clock gating much more reliably, leading to much better implementation and greatly reducing the need for later redesign cycles.
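Early RTL estimates let designers screen candidate enables before handing them to implementation. The first-order model below uses $P = CV^2f$ with entirely hypothetical capacitances to compare two cases:

- an ungated register bank, whose clock pins toggle every cycle;
- a gated bank, which clocks the flops only when the enable is active but adds a clock-gating cell that toggles every cycle.

Enable logic and glitch power are ignored. The sketch shows how a wide, rarely enabled register saves power, while a narrow, almost always enabled one loses it.

```python
def clock_gate_saving_nw(n_flops, enable_duty, v=1.0, f_mhz=100.0,
                         c_clk_pin_ff=1.0, c_icg_ff=4.0):
    """Estimated clock-power saving (nW) from gating one register bank.

    Uses P = C * V^2 * f with C in fF and f in MHz, which yields nW.
    c_clk_pin_ff and c_icg_ff are hypothetical per-flop and gating-cell loads."""
    per_ff = v ** 2 * f_mhz
    ungated = n_flops * c_clk_pin_ff * per_ff
    gated = c_icg_ff * per_ff + enable_duty * n_flops * c_clk_pin_ff * per_ff
    return ungated - gated


# Hypothetical enables: (register width, fraction of cycles the enable is active).
enables = {"dma_addr": (32, 0.1), "fsm_state": (2, 0.9), "cfg_reg": (16, 0.01)}
for name, (width, duty) in enables.items():
    saving = clock_gate_saving_nw(width, duty)
    print(f"{name}: {saving:+.0f} nW -> {'gate' if saving > 0 else 'do not gate'}")
```

Under these assumptions, gating the 2-bit, 90%-active register costs more than it saves. This is the effect that makes blindly gating every explicit enable counterproductive.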
One very exciting area for future tool development is detecting clock gating opportunities and introducing enables automatically. Using multiple RTL analysis engines, it is possible to identify specific new enables that could be added to a design. Although the full application of this technique is complex, the simple example in Figure 3 illustrates the idea. In this figure, a register without an enable (R2) is downstream from a register with an enable (R1). By delaying R1's enable by one clock cycle, a new enable can be created for R2. Since R2's input changes only in the cycle after R1 loads new data, R2 is not needed when R1 is not sending data and can be gated the rest of the time to save power.
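The Figure 3 transformation can be checked with a small cycle-based model. This Python sketch is a behavioral illustration, not RTL. R1 loads new data only when its enable is high, and R2 captures a function of R1 every cycle. A gated copy of R2 is clocked only when the enable, delayed by one cycle, is high. The combinational function, bit width, enable rate and reset values are hypothetical. The reset state is chosen so that R2 already equals f(R1). A random simulation only samples behavior, whereas a formal tool would prove the equivalence.

```python
import random

MASK = 0xFF  # hypothetical 8-bit registers


def f(x):
    """Hypothetical combinational logic between R1 and R2."""
    return (x + 1) & MASK


def simulate(cycles, en_rate=0.2, seed=1):
    """Compare an always-clocked R2 with an R2 gated by en delayed one cycle.

    Returns (mismatch count, number of cycles the gated R2 was clocked)."""
    rng = random.Random(seed)
    r1 = 0
    r2_free = r2_gated = f(r1)        # reset state consistent with R2 == f(R1)
    en_d = False                      # one extra flop holding last cycle's enable
    mismatches = edges = 0
    for _ in range(cycles):
        en = rng.random() < en_rate
        data_in = rng.randrange(MASK + 1)
        # Next-state values are computed from the current (pre-edge) state.
        next_r1 = data_in if en else r1
        next_r2_free = f(r1)                          # ungated: clocked every cycle
        next_r2_gated = f(r1) if en_d else r2_gated   # gated by the delayed enable
        edges += en_d
        # Clock edge: every register updates at once.
        r1, r2_free, r2_gated, en_d = next_r1, next_r2_free, next_r2_gated, en
        mismatches += r2_free != r2_gated
    return mismatches, edges


mismatches, edges = simulate(10_000)
print(f"mismatches: {mismatches}, gated R2 clocked on {edges} of 10000 cycles")
```

The gated copy tracks the free-running R2 exactly while being clocked only on roughly the fraction of cycles in which R1 was enabled. The cost is one extra flop to hold the delayed enable.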

Conclusion
IC power budgets have tightened dramatically in recent years, and EDA solutions haven't always kept pace. The inevitable result has been more work for designers and more time spent tracking down and fixing bugs. But things are improving. A number of key techniques and capabilities can speed and improve the quality of low-power design, and these capabilities are making their way into automated tools. Some of these solutions are already available. Recent offerings, for instance, provide the ability to more accurately estimate power consumption at early RTL, greatly enhancing designers' ability to optimize clock gating and other power management schemes. Other tools can now speedily and reliably perform traditionally problematic tasks like insertion of level shifters, isolation logic and clock gate enables. In the future, advanced prototyping tools will help guide definition of power and voltage domains. Thanks to such new and emerging solutions, even the most demanding power budgets can now be achieved successfully and cost-effectively.
About the Author:
Dave Allen is the product director for power at Atrenta. He holds a master's degree in computer engineering and a bachelor's degree in computer science, both from Rensselaer Polytechnic Institute. The author can be reached at: davea@atrenta.com



