Design for power methodology
William Ruby
4/20/2012 10:22 AM EDT
RTL power refinement: automated RTL power reduction
At this phase of a DFP methodology, the focus is on reducing power consumption at the RTL operand, or ‘function’, level. Because of the fine-grained level of detail needed in this phase, the task is very amenable to automation. However, it is important to point out that RTL power reduction is not optimization. Power reduction techniques available at the RT level of abstraction do not have detailed knowledge of timing, and therefore cannot optimize the design to achieve a specific target. Instead, such techniques are best used in a guided fashion: automated but not automatic. In an automated approach, the power reduction tool presents an opportunity to reduce power, but the designer makes the ultimate decision on whether and how an RTL change should be implemented.
Another important aspect of RTL power reduction is that it must be based on RTL power analysis (i.e., you must measure power before you attempt to reduce it). Analysis-driven RTL power reduction will show that there is often a cost or impact on the design, in terms of leakage power or area, as additional logic gates are used to shut down activity.
A comprehensive set of RTL power reduction techniques must cover all areas of the design: clock / registers, memories, and datapath / control logic. For registers, power reduction is achieved by finding enable conditions for registers not already enabled in the design, as well as by improving the efficiency of (or strengthening) existing enable conditions. There are both combinational and sequential techniques for clock enable generation and strengthening, including XOR-gating (self-gating), forward enable propagation, and observability don’t care (ODC).
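As an illustration of the XOR-gating (self-gating) idea, here is a minimal behavioral sketch in Python. It assumes the enable is simply the XOR of the register input and its current output, so the register is clocked only when its value would change. The function name and the sample data are hypothetical, and the sketch counts clock events only; it does not model the power of the XOR comparator or the clock-gating cell, which a real analysis would have to include.

```python
def count_clock_events(d_values, q_init=0):
    """Return (ungated, gated) clock-event counts for one register.

    Self-gating model (an assumption of this sketch): the register is
    clocked only in cycles where the input D differs from the output Q.
    """
    q = q_init
    gated = 0
    for d in d_values:
        enable = d ^ q          # non-zero only when the stored value would change
        if enable:
            gated += 1          # clock pulse reaches the register
            q = d               # register captures the new value
    ungated = len(d_values)     # without gating, every cycle clocks the register
    return ungated, gated


# Hypothetical input sequence: the data changes in only two of eight cycles.
data = [0, 0, 1, 1, 1, 0, 0, 0]
ungated, gated = count_clock_events(data)
print(f"clock events without gating: {ungated}, with self-gating: {gated}")
```

Whether the removed clock events translate into a net saving depends on the overhead of the added gating logic, which is why the result should feed a power analysis rather than be accepted on activity alone.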
For memories, power reduction is achieved by detecting and eliminating redundant memory cycles. Often, when memories in the design are selected, the address and read / write control signals are not changing, and yet the memory is continuously clocked. Even though the memory outputs do not change, since the same address location is being read every clock cycle, significant power is wasted in the internal circuitry of the memory.
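A redundant memory cycle of the kind described above can be spotted in a simulation trace. The following is a minimal sketch, assuming a single-port memory whose per-cycle select, write, and address values have been extracted into a list. The `MemCycle` record and the rule used (a selected read is redundant if the previous cycle was a selected read of the same address) are assumptions made for illustration, not a description of any particular tool.

```python
from typing import NamedTuple


class MemCycle(NamedTuple):
    select: bool   # memory is selected (enabled) in this cycle
    write: bool    # True for a write access, False for a read
    addr: int      # address presented to the memory


def count_redundant_reads(trace):
    """Count selected read cycles that repeat the previous cycle's read."""
    redundant = 0
    prev = None
    for cyc in trace:
        if (prev is not None
                and cyc.select and prev.select
                and not cyc.write and not prev.write
                and cyc.addr == prev.addr):
            redundant += 1      # output already holds this data
        prev = cyc
    return redundant


# Hypothetical trace: three back-to-back reads of address 4, a write,
# a fresh read after the write, an idle cycle, and another read.
trace = [
    MemCycle(True, False, 4),
    MemCycle(True, False, 4),
    MemCycle(True, False, 4),
    MemCycle(True, True, 4),
    MemCycle(True, False, 4),
    MemCycle(False, False, 4),
    MemCycle(True, False, 4),
]
redundant_reads = count_redundant_reads(trace)
print(f"redundant read cycles: {redundant_reads} of {len(trace)}")
```

The rule is deliberately conservative: a read that follows a write or an idle cycle is never counted, even if the output might still be valid.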
In the control logic / datapath area, it is possible to detect active clouds of logic that are converging on mux inputs, but are not being selected by the mux select signal. It may be feasible to control these logic clouds using the mux select signal, so it is important to identify how much power is wasted in the logic due to this redundant computation.
It is absolutely essential that RTL power reduction techniques are applied only after an accurate computation of power consumption and power savings. Because reducing power consumption often involves adding logic to turn off logic signals or shut down clocks, the power overhead of this additional logic, both dynamic and static, must be considered. It is not sufficient to make RTL power reduction decisions based solely on signal frequencies or activities and duty cycles. Using “blind automation” or a simplistic push-button approach to RTL power reduction may result in a large area and / or timing impact with no actual power advantage.
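The accounting argued for above can be made concrete with a small sketch. It assumes each candidate change comes with three estimates from RTL power analysis: the dynamic power it saves, and the dynamic and leakage power of the logic it adds. The candidate names and all numbers are invented for illustration only.

```python
def net_power_saving_mw(saved_dynamic_mw, overhead_dynamic_mw, overhead_leakage_mw):
    """Net saving = power removed minus the power of the added gating logic."""
    return saved_dynamic_mw - (overhead_dynamic_mw + overhead_leakage_mw)


# Hypothetical candidates: (saved dynamic, overhead dynamic, overhead leakage), in mW.
candidates = {
    "wide_bus_reg": (0.120, 0.015, 0.004),  # wide register bank, rarely updated
    "status_flag": (0.006, 0.008, 0.002),   # single flop; gating costs more than it saves
}

# Keep only the changes that reduce total power once overhead is included.
accepted = [name for name, est in candidates.items()
            if net_power_saving_mw(*est) > 0]
print("changes worth presenting to the designer:", accepted)
```

In this made-up data the single-flop candidate would look attractive on activity alone, yet it is rejected once the overhead of the added logic is counted.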
RTL power regressions
After the designer completes RTL coding, the design is put through a functional verification cycle. During functional verification, while functional engineering change orders (ECOs) are implemented or functional bugs are fixed, power consumption bugs can be introduced. Because traditional functional verification methods do not catch power consumption bugs, and power bugs are not fixed by downstream synthesis or physical implementation flows, rigorous tracking of power consumption is required as the design project evolves. This ‘power regression’ tracking, similar to functional regressions, is shown in Figure 3.

Figure 3: Example of Power Regressions Showing a Power Bug
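A power regression check of this kind can be sketched in a few lines. The example assumes that each regression run yields one total power number and that a run is suspicious when its power exceeds the previous run's by more than a chosen tolerance. The function name, the 5% default, the revision labels, and the power values are all hypothetical.

```python
def find_power_regressions(history, tolerance=0.05):
    """Flag runs whose power rose by more than `tolerance` over the previous run.

    history: list of (run_label, power_mw) tuples in chronological order.
    Returns a list of (run_label, increase_mw) for the flagged runs.
    """
    flagged = []
    for (_, prev_mw), (label, mw) in zip(history, history[1:]):
        if mw > prev_mw * (1 + tolerance):
            flagged.append((label, mw - prev_mw))
    return flagged


# Hypothetical per-revision results from RTL power analysis, in mW.
history = [
    ("rev_101", 250.0),
    ("rev_102", 252.0),
    ("rev_103", 310.0),   # a functional fix that accidentally raised power
    ("rev_104", 308.0),
]
flagged = find_power_regressions(history)
for label, increase in flagged:
    print(f"possible power bug introduced in {label}: +{increase:.1f} mW")
```

Comparing against the previous run catches the revision that introduced the jump; a real flow might also compare against a fixed budget so that a regression that persists is not silently accepted.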
In order to enable an efficient power regression flow, verification engineers must have access to detailed power data at all times for every run, without repeating the entire power analysis flow. This is possible only if the RTL power analysis tool has a database infrastructure with a user applications programming interface (API) that allows the design team to obtain specific power data and generate custom reports.
Early power integrity and package analysis
With shrinking product life cycles driven by the consumer electronics market, the ability to have as much analysis of the SoC and package / system as possible early in the design flow is critical. It is no longer practical for design teams to wait until the final stages of physical design implementation in order to perform power integrity analysis and select the proper package. Moreover, power-gating switching events can cause a large gradient in the power supply voltage, leading to noise-induced design failures, as illustrated in Figure 4.

Figure 4: Power Supply Gradient Caused by a Power Gating Event
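One simple way to locate events like the one in Figure 4 in a per-cycle power trace is to rank cycles by how much the power changes from the previous cycle. The sketch below assumes such a trace is available as a list of per-cycle power values. The function name, the ranking rule, and the trace values are assumptions for illustration, not the selection algorithm of any specific tool.

```python
def top_gradient_cycles(power_mw, k=3):
    """Return indices of the k cycles with the largest power change.

    The change is the absolute difference from the previous cycle.
    Results are ordered from largest change to smallest; ties are
    broken by the earlier cycle index.
    """
    deltas = [(abs(power_mw[i] - power_mw[i - 1]), i)
              for i in range(1, len(power_mw))]
    deltas.sort(key=lambda item: (-item[0], item[1]))
    return [index for _, index in deltas[:k]]


# Hypothetical trace in mW: a block powers up at cycle 3 and down at cycle 6.
trace = [10.0, 10.5, 10.2, 48.0, 47.5, 47.8, 12.0, 11.8]
critical = top_gradient_cycles(trace, k=2)
print("cycles with the steepest power change:", critical)
```

Only the selected cycles would then need detailed power integrity analysis, rather than the full simulation.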
Analyzing power integrity early in the design process is made possible with reliable, accurate, and consistent RTL power analysis technology. This enables design teams to perform early analysis, when RTL simulations that represent functional behavior of the SoC are available. The key challenge with using RTL simulations is that they typically cover tens of thousands, or millions, of clock cycles, when in reality only a few critical cycles are needed for power integrity analysis. So having a fast critical cycle selection algorithm is essential. Together with RTL power analysis data and estimated parasitic values, this cycle selection is encapsulated into a compact model representation, which is then analyzed by a sign-off power integrity tool. Early power integrity analysis with RTL power models enables power delivery network planning and estimation of package characteristics, eliminating guesswork and error-prone spreadsheet analysis.
Synthesis and place & route
Synthesis and Place & Route (SP&R) tools transform the design from RTL to gates, perform physical layout, and optimize the design to primarily meet timing and area goals. There are also several power optimization techniques that are available in SP&R tools. Synthesis performs clock-gating at the register level, leveraging register enable conditions. Clock-gating during synthesis is typically done with RTL as input, and produces a fully mapped gate-level netlist. In addition to clock-gating, synthesis tools can perform optimizations at the gate-level for incremental, additional power savings using techniques such as technology mapping and pin swapping. During Place & Route, additional power optimization is done mainly through down-sizing cells or substituting low leakage cells in timing paths where positive slack margin is available. The latter is also known as mixed-Vt (threshold voltage) cell swapping and is an effective technique for leakage power reduction.
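The slack-driven cell swapping described above can be sketched as a simple selection rule. This example assumes each cell has a known positive or negative slack, a known delay penalty for moving to a high-Vt (low-leakage) variant, and a known leakage saving, and it treats every cell independently. Real place-and-route tools must account for slack shared by cells on the same path, so this is only an illustration; the `Cell` record, the margin, and all values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    name: str
    slack_ps: float               # timing slack available at this cell
    hvt_delay_penalty_ps: float   # extra delay if swapped to high-Vt
    leakage_saving_nw: float      # leakage saved by the swap


def select_hvt_swaps(cells, margin_ps=5.0):
    """Pick cells whose slack still exceeds the margin after the swap."""
    chosen = [c for c in cells
              if c.slack_ps - c.hvt_delay_penalty_ps >= margin_ps]
    total_saving_nw = sum(c.leakage_saving_nw for c in chosen)
    return [c.name for c in chosen], total_saving_nw


# Hypothetical cells: only u1 and u4 have enough slack to absorb the penalty.
cells = [
    Cell("u1", slack_ps=40.0, hvt_delay_penalty_ps=12.0, leakage_saving_nw=30.0),
    Cell("u2", slack_ps=5.0, hvt_delay_penalty_ps=12.0, leakage_saving_nw=30.0),
    Cell("u3", slack_ps=-3.0, hvt_delay_penalty_ps=10.0, leakage_saving_nw=25.0),
    Cell("u4", slack_ps=30.0, hvt_delay_penalty_ps=15.0, leakage_saving_nw=10.0),
]
swaps, saving_nw = select_hvt_swaps(cells)
print(f"swap to high-Vt: {swaps}, leakage saved: {saving_nw} nW")
```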
Power consumption and power integrity sign-off
The final step in DFP methodology is to ensure that power integrity targets such as dynamic voltage drop are met, that the power intent of the SoC design is preserved, and that the target power consumption is achieved. A highly accurate power integrity and power consumption analysis tool is required for SoC power sign-off. Another key requirement for this step is flexibility of sign-off analysis for various applications. Power delivery network stress tests can effectively employ a vectorless algorithm, while power consumption analysis and specific power integrity analysis for selected conditions can be driven by gate-level or RTL simulations. Gate-level simulations are time-consuming, but the ability to use RTL simulation data for sign-off analysis gives accurate results with fast turnaround times. The sign-off tool of choice must also deliver excellent accuracy against actual silicon measurements.
Conclusions
Power is a formidable challenge for SoC designs, spanning both power consumption and power integrity considerations. In order to tackle this challenge, power must be taken into account very early in the design process, starting with architectural trade-offs and RTL design, then continuing all the way to sign-off. Designing for low-power and power integrity is not an automatic process – there is no “low-power” button. Instead, power must be considered, analyzed, and managed in every step of the design flow. By following a design-for-power methodology, engineering teams can ensure power is managed in a predictable and consistent fashion, enabling design success.
About the author
William Ruby is the Senior Director of Product Engineering for RTL Power products at Apache Design, Inc., a subsidiary of ANSYS. Mr. Ruby has over 20 years of experience in the EDA and semiconductor industry with broad expertise in low-power design. He has served in executive and senior engineering positions at Sequence, Synopsys, Intel, and Siemens. Mr. Ruby holds a B.A. in Physics from University of California at Berkeley, an M.S. in Electrical Engineering from University of Southern California, and an M.B.A. from San Jose State University. He was also awarded a patent in high performance memory design.
This posting is part of the EDA Designline power series and is archived and updated. The root is accessible here. Please send me any updates, additions, references, white papers or other materials that should be associated with this posting. Thank you for making this a success - Brian Bailey.
Dr DSP
4/20/2012 5:33 PM EDT
Power is one more area for a possible design iteration loop that will bust schedules. Starting at the top and working down seems like a good technique, but wouldn't it be better to automate as much as possible? Remember when you had to do timing optimization manually? Let's hope soon power can use a much more automated approach.
patrickg42
4/27/2012 3:22 AM EDT
The "Architectural and hardware/software trade-offs" paragraph states:
"Traditionally, these trade-offs have been performed using spreadsheets and other ad-hoc approaches. While these methods do have a certain amount of utility, a more structured and deterministic solution is needed in this area."
The solution exists and is already used by major mobile wireless players: It's ACEPlorer from Docea Power.