Building predictability into your low-power design flow
Pete Hardee and Buda Leung, Cadence Design Systems
4/16/2012 10:19 AM EDT
4. PSO implementation
In addition to MSV, at this stage we applied PSO with state retention to both MACs. We coded each MAC as a separate power domain, which creates four possible power modes, but we compare power in only two of them: both MACs running, or both MACs idle and consequently shut off. We specified the domains and modes in a Common Power Format (CPF) file, where we also coded the state retention, level shifter, and isolation requirements. The CPF code to implement this low-power architecture is shown below:
set_design dma_mac
########################################
### Create 3 power domains:
### default PD: PDcore 1.08 volts
### PDmac1: 0.9 volts, can be shut off
### PDmac2: 0.9 volts, can be shut off
########################################
### PDcore
create_power_domain -name PDcore -default
### PDmac1
create_power_domain -name PDmac1 -instances ethernet_mac_1 \
    -active_state_conditions { low@!pse[0] } \
    -shutoff_condition {pcm_inst/pse[0]} -base_domains {PDcore}
### PDmac2
create_power_domain -name PDmac2 -instances ethernet_mac_2 \
    -active_state_conditions { low@!pse[1] } \
    -shutoff_condition {pcm_inst/pse[1]} -base_domains {PDcore}
### Define power modes
create_power_mode -name PMmacon -domain_conditions {PDcore@high PDmac1@low PDmac2@low} -default
create_power_mode -name PMmacoff -domain_conditions {PDcore@high PDmac1@off PDmac2@off}
create_power_mode -name PMmac1 -domain_conditions {PDcore@high PDmac1@off PDmac2@low}
create_power_mode -name PMmac2 -domain_conditions {PDcore@high PDmac1@low PDmac2@off}
### State retention rules
create_state_retention_rule -name SRPG_rule1 -domain PDmac1 \
    -restore_edge {!pcm_inst/psr[0]} -save_edge {pcm_inst/psr[0]}
create_state_retention_rule -name SRPG_rule2 -domain PDmac2 \
    -restore_edge {!pcm_inst/psr[1]} -save_edge {pcm_inst/psr[1]}
### Level shifter rules
create_level_shifter_rule -name shifter_rule1 -from PDmac1 -to PDcore
update_level_shifter_rules -names shifter_rule1 -location to -prefix ls_mac1_
create_level_shifter_rule -name shifter_rule2 -from PDmac2 -to PDcore
update_level_shifter_rules -names shifter_rule2 -location to -prefix ls_mac2_
create_level_shifter_rule -name shifter_rule3 -from PDcore -to PDmac1
update_level_shifter_rules -names shifter_rule3 -location to
create_level_shifter_rule -name shifter_rule4 -from PDcore -to PDmac2
update_level_shifter_rules -names shifter_rule4 -location to
### Isolation rules
# All low iso for MAC1
create_isolation_rule -name iso_rule1 -from PDmac1 -isolation_condition {!pcm_inst/pice[0]} -isolation_output low
update_isolation_rules -names iso_rule1 -location to -prefix iso_mac1_
# All low iso for MAC2
create_isolation_rule -name iso_rule2 -from PDmac2 -isolation_condition {!pcm_inst/pice[1]} -isolation_output low
update_isolation_rules -names iso_rule2 -location to -prefix iso_mac2_
end_design
To start power-aware verification, we first ran the RTL, cell library, and CPF file through a quality check using a static verification tool, Conformal® Low Power. This finds syntax errors and design object mismatches (such as power intent rules applied to design objects that don’t exist in the RTL) more easily and quickly than power-aware simulation would, and it also checks that the cells needed to support the specified rules actually exist in the target cell library.
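The design object mismatch that a static check looks for can be pictured with a toy script. This is a minimal sketch in Python, not a description of how Conformal Low Power works internally. The domain and instance names come from the CPF listing, while the `rtl_instances` set and the `find_missing_instances` helper are hypothetical and exist only for illustration.

```python
# Toy illustration of a "design object mismatch" check.
# rtl_instances is a hypothetical stand-in for the instance names found in the RTL.
rtl_instances = {"ethernet_mac_1", "ethernet_mac_2", "pcm_inst"}

# Domain-to-instance mapping as written in the CPF listing.
cpf_domains = {
    "PDmac1": ["ethernet_mac_1"],
    "PDmac2": ["ethernet_mac_2"],
}


def find_missing_instances(domains, instances):
    """Return (domain, instance) pairs whose instance is absent from the RTL."""
    return [
        (domain, inst)
        for domain, insts in sorted(domains.items())
        for inst in insts
        if inst not in instances
    ]


# An empty list means every instance named in the power intent exists.
print(find_missing_instances(cpf_domains, rtl_instances))
```

A misspelled instance name in the power intent, for example `ethernet_mac1`, would show up in the returned list long before a simulation run could expose it.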
Next, we simulated the design again, modeling PSO, isolation, and state retention from the CPF file. We enabled the simulator’s automatic low-power assertion generation and specified the creation of a power-aware verification plan, which we load later to check coverage. We used power-aware dynamic simulation to confirm the correct behavior of the power domains in the specified power modes: that each domain transitioned through the power states correctly, that isolation prevented corruption from propagating from off-domains to on-domains (signal values are corrupted to ‘X’ in the off-state), and that state retention was correctly implemented.
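The three behaviors checked here (corruption, isolation, and retention) can be sketched with a small behavioral model. This is a simplified illustration in Python, not the simulator’s implementation. The `SwitchableDomain` class and `isolate` function are hypothetical names, and the model assumes a single state register per domain and an isolation clamp to low, as in the isolation rules of the CPF file.

```python
X = "X"  # unknown value driven by a powered-down domain


class SwitchableDomain:
    """Toy model of one switchable domain with a single state register."""

    def __init__(self, state=0):
        self.powered = True
        self.state = state
        self.retained = None  # value held by the retention register

    def save(self):
        """Copy the live state into the retention register."""
        self.retained = self.state

    def power_off(self):
        """Shut the domain off: live state is corrupted to X."""
        self.powered = False
        self.state = X

    def power_on(self):
        """Power the domain back up: state stays X until restored."""
        self.powered = True

    def restore(self):
        """Copy the retained value back, if the domain is on and a value was saved."""
        if self.powered and self.retained is not None:
            self.state = self.retained

    def output(self):
        return self.state


def isolate(value, iso_enable, clamp=0):
    """Isolation cell: drive the clamp value while isolation is enabled."""
    return clamp if iso_enable else value


mac = SwitchableDomain(state=1)
mac.save()
mac.power_off()
print("raw output while off:     ", mac.output())
print("isolated output while off:", isolate(mac.output(), iso_enable=True))
mac.power_on()
mac.restore()
print("output after restore:     ", isolate(mac.output(), iso_enable=False))
```

Without the `isolate` call, the ‘X’ from the off-domain would reach the on-domain, which is the failure that power-aware simulation is meant to expose.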
The automatically generated assertions check that events occur in the correct order during transitions between modes. The simulator also records metrics at the power domain level and coverage on all control signals related to the CPF rules. These are basic checks that the control signals have toggled.
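To make the idea of an ordering assertion concrete, here is a minimal sketch in Python. It assumes a commonly used power-down order (isolate, then save state, then shut off) and the reverse on power-up; the article does not list the exact events the simulator checks, so the event names and both helper functions are hypothetical.

```python
# Assumed event orders; the real assertions are generated by the simulator.
POWER_DOWN = ["isolate", "save", "shutoff"]
POWER_UP = ["power_on", "restore", "release_isolation"]


def follows_sequence(events, expected):
    """True if `expected` occurs within `events` in order (other events may interleave)."""
    remaining = iter(events)
    # `step in remaining` consumes the iterator up to the first match,
    # so each step must be found after the previous one.
    return all(step in remaining for step in expected)


def toggled(trace):
    """Basic toggle coverage: the control signal was observed at both 0 and 1."""
    return {0, 1} <= set(trace)


good = ["isolate", "save", "shutoff", "power_on", "restore", "release_isolation"]
bad = ["save", "shutoff", "isolate"]  # isolation asserted too late

print(follows_sequence(good, POWER_DOWN), follows_sequence(good, POWER_UP))
print(follows_sequence(bad, POWER_DOWN))
print(toggled([0, 0, 1, 0]), toggled([0, 0, 0]))
```

The toggle check corresponds to the basic coverage described above: it says only that a control signal changed value, not that the design behaved correctly when it did.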
The Cadence Incisive® Enterprise Simulator supports CPF natively, rather than interpreting the power modes via a separate module communicating across the PLI as some other power-aware simulators do. As a result, execution performance in power-aware mode is comparable to regular functional simulation. The combination of high performance, power-aware assertions, power-aware coverage metrics, and special debug capabilities for low power reduces the chances of power-related bug escapes.
Having verified the CPF with PSO in simulation, we used RTL synthesis once again to look at the new power estimates, shown in Table 4. As expected, PSO gives significant savings in idle mode and little change in fully active mode, and combining MSV and PSO gives our best result. Remember that all of this power architecture exploration was done at RTL, without going through exhaustive synthesis runs.
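A rough back-of-the-envelope model shows why the two techniques complement each other. The sketch below is in Python and assumes the usual first-order relation that dynamic power scales with the square of the supply voltage at fixed frequency and activity. The 1.08 V and 0.9 V values are the domain voltages from the CPF file; the power numbers passed to `average_power` are hypothetical, in arbitrary units, and are not the figures from Table 4.

```python
def dynamic_power_ratio(v_new, v_old):
    """Relative dynamic power after voltage scaling, assuming P_dyn ~ C * V^2 * f."""
    return (v_new / v_old) ** 2


def average_power(p_active, p_idle, active_fraction):
    """Time-weighted average power for a block that is active part of the time."""
    return active_fraction * p_active + (1.0 - active_fraction) * p_idle


# MSV: running the MACs at 0.9 V instead of 1.08 V.
ratio = dynamic_power_ratio(0.9, 1.08)
print(f"dynamic power at 0.9 V relative to 1.08 V: {ratio:.3f}")

# PSO: hypothetical numbers, arbitrary units.
P_ACTIVE = 10.0         # power while running
P_IDLE_NO_PSO = 2.0     # idle power with the domain left on
P_IDLE_WITH_PSO = 0.2   # idle power with the domain shut off

for active_fraction in (0.9, 0.1):
    without = average_power(P_ACTIVE, P_IDLE_NO_PSO, active_fraction)
    with_pso = average_power(P_ACTIVE, P_IDLE_WITH_PSO, active_fraction)
    print(f"active {active_fraction:.0%}: {without:.2f} without PSO, {with_pso:.2f} with PSO")
```

Under these assumptions the voltage reduction cuts dynamic power by roughly 30% whenever the block is switching, while the benefit of PSO grows with the share of time the block spends idle.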

Table 4: Estimate with MSV and PSO
Keep in mind that MSV helps more when the circuit has high activity and can rarely be shut off, while PSO provides the greatest benefit for functions within the design that have relatively long periods of inactivity.
5. Full synthesis and MVt optimization
At this stage, we committed our final power architecture to a synthesized netlist. Our synthesis tool implemented the isolators and state retention registers, and applied advanced clock gating and MVt techniques, accurately implementing the same power intent we simulated as well as applying state-of-the-art power optimizations. In making these optimizations, however, synthesis tools have to be fully power domain-aware, because cross-domain logic optimizations that make sense in the power-on state can introduce problems in other power states. The synthesis algorithms have to respect power domain boundaries, power modes, and details such as isolation values. The structural problems that can be introduced by incorrect power cell insertion, and by subsequent non-power-aware optimizations in synthesis and physical design tools, are best detected using power-aware formal verification tools that can carry out structural checks and full power-aware equivalence checking.
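The flavor of a structural check on domain crossings can be sketched as follows. This is a toy illustration in Python, not the algorithm used by any formal tool. The domain names and voltages follow the CPF file, while the `Crossing` record, the `check_crossing` helper, and the two rules it applies (isolation on outputs of switchable domains, level shifters between domains at different voltages) are simplifying assumptions.

```python
from dataclasses import dataclass

# Domain voltages as given in the comments of the CPF file.
VOLTAGE = {"PDcore": 1.08, "PDmac1": 0.9, "PDmac2": 0.9}
SWITCHABLE = {"PDmac1", "PDmac2"}


@dataclass
class Crossing:
    """One signal crossing a power domain boundary in the netlist (hypothetical record)."""
    src: str
    dst: str
    has_isolation: bool
    has_level_shifter: bool


def check_crossing(c):
    """Return a list of structural problems found on one domain crossing."""
    problems = []
    if c.src in SWITCHABLE and not c.has_isolation:
        problems.append("missing isolation")
    if VOLTAGE[c.src] != VOLTAGE[c.dst] and not c.has_level_shifter:
        problems.append("missing level shifter")
    return problems


crossings = [
    Crossing("PDmac1", "PDcore", has_isolation=True, has_level_shifter=True),
    Crossing("PDmac2", "PDcore", has_isolation=False, has_level_shifter=True),
    Crossing("PDcore", "PDmac1", has_isolation=False, has_level_shifter=False),
]
for c in crossings:
    print(c.src, "->", c.dst, check_crossing(c) or "ok")
```

A logic optimization that merges or moves cells across a boundary can silently remove one of these protection cells, which is why such checks are rerun on the netlist after synthesis and after physical design.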
Table 5 shows our final synthesized results. Here the synthesis tool has implemented isolation, state retention, and level shifters in the correct locations based on its power domain-aware engine. We also decided to use MVt libraries. A multi-objective synthesis engine was able to optimize for power, timing, and area, and, where possible, to trade spare timing slack on non-critical paths for lower leakage by moving those paths to HVt cells. Providing the synthesis tool with a rich selection of MVt libraries allows the mapping algorithms to do an even better job at structuring and optimization, yielding a more power-efficient design. The numbers track our RTL estimates very closely.
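The principle behind moving non-critical paths to HVt cells can be illustrated with a deliberately simple greedy rule. The Python sketch below is not how a production synthesis engine works: it treats the slack of each cell as independent, which real timing analysis does not, and the cell names, slack values, and delay penalty are all hypothetical.

```python
def assign_hvt(cell_slack_ns, hvt_delay_penalty_ns):
    """Greedy sketch: use an HVt cell wherever the slack covers the extra delay.

    cell_slack_ns maps a cell name to its timing slack in nanoseconds.
    Cells without enough slack stay on the standard-Vt (SVt) library.
    """
    return {
        name: "HVt" if slack >= hvt_delay_penalty_ns else "SVt"
        for name, slack in cell_slack_ns.items()
    }


# Hypothetical slack values for three cells.
slack = {"u_crc": 0.40, "u_fifo_ctl": 0.05, "u_tx_fsm": 0.00}
print(assign_hvt(slack, hvt_delay_penalty_ns=0.10))
```

Cells on the critical path keep the faster library, so timing is preserved while leakage drops on everything that can afford the slower, lower-leakage cell.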

Table 5: Post-Synthesis Estimate with MSV, PSO, and MVt
Compare the post-synthesis numbers, now using MVt libraries, with our RTL estimates. The post-synthesis results are very close to the RTL power estimates. This shows that we were able to analyze multiple power architectures without full synthesis runs, yet with confidence that these decisions have predictable results.
6. Correlation with signoff power analysis
Finally, we read the synthesized netlist into the Encounter Digital Implementation System. After initial placement, we were able to use the signoff-quality power analysis engine, Encounter Power System (EPS), to confirm the power estimates. The final results are shown in Table 6.
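Two small calculations are useful when reading correlation results of this kind. The Python sketch below is an illustration only: the power values are hypothetical, not the numbers from Table 6, and the 30% clock tree share is taken from the low end of the 30-40% range quoted in the note under Table 6.

```python
def percent_deviation(estimate, reference):
    """Signed deviation of an earlier estimate from a reference value, in percent."""
    return 100.0 * (estimate - reference) / reference


def with_clock_tree(logic_dynamic_power, clock_fraction):
    """Gross up logic-only dynamic power when the clock tree is a given share of the total."""
    if not 0.0 <= clock_fraction < 1.0:
        raise ValueError("clock_fraction must be in [0, 1)")
    return logic_dynamic_power / (1.0 - clock_fraction)


# Hypothetical values in arbitrary units.
rtl_estimate = 10.5
signoff_value = 10.0
print(f"RTL estimate vs. signoff: {percent_deviation(rtl_estimate, signoff_value):+.1f}%")

# If the logic accounts for 7.0 units and the clock tree is 30% of total dynamic power:
print(f"full-chip dynamic power: {with_clock_tree(7.0, 0.30):.1f}")
```

The second function makes explicit why a logic-only estimate understates full-chip power: a clock tree that accounts for 30% of the total adds roughly 43% on top of the logic figure.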

Table 6: Correlation with Signoff Power Analysis
We did not go through full physical design including routing and clock tree synthesis. Bear in mind that at the full-chip level, the clock tree can be a big contributor to power: it can easily account for 30-40% of the total dynamic power. So, to get a full-chip power estimate, you should account for the clock tree power, which we have not done in this case but which can be achieved with this solution. Routing and buffering in physical design also make a difference, so the close correlation we see here is better than you would expect to see in practice on a fully placed and routed design. However, the result is useful to show the correlation of the logic functions throughout the flow, and that the process we have followed fully represents the true effects of the design decisions we were making, resulting in a predictable, convergent outcome.
Authors: Pete Hardee and Buda Leung
Pete Hardee is a Director of Solutions Marketing, responsible for the Low-Power Solution at Cadence. He has 17 years of experience in the EDA and silicon IP industries. He has a BSc (Eng) degree in Electrical Engineering from Imperial College, London, and an MBA from Warwick Business School.
Buda Leung is a Solutions Engineer with the Silicon Realization Group at Cadence. He is currently responsible for front-end methodology development and deployment, including work in SystemC™ and RTL synthesis, formal verification, and implementation. He holds a BSEE from the University of California, Riverside.
Acknowledgements
This article would not have been possible without the extensive contributions from Paul Weil, who ran many of the experiments in this case study. Additional assistance came from John Decker and Mickey Rodriguez.
This posting is part of the EDA Designline power series and is archived and updated. The root is accessible here. Please send me any updates, additions, references, white papers or other materials that should be associated with this posting. Thank you for making this a success - Brian Bailey.