Once the models were built, simulations were started. PowerPro uses
switching activity interchange format (SAIF) files to track gate
efficiency and gather temporal information about the design. AMD devised
a way for the simulations to enable SAIF tracking, starting and ending
at specific instruction counts. This was much preferred over starting
and ending at a given simulation cycle number or at a given time because
it allowed the key instruction sequence to be analyzed from week to
week even as designers made improvements that caused the test to execute
more instructions per clock cycle.
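
To illustrate why an instruction-count window is more stable than a cycle window, here is a minimal Python sketch. It is not AMD's mechanism: the function `cycle_window` and the per-cycle list of cumulative retired-instruction counts are hypothetical, introduced only to show how the same instruction range maps to different cycles as instructions per clock improves.

```python
from bisect import bisect_left

def cycle_window(retired_by_cycle, start_instr, end_instr):
    """Map an instruction-count window onto simulation cycles.

    retired_by_cycle[i] is the (hypothetical) cumulative number of
    instructions retired by the end of cycle i; it must be non-decreasing.
    Returns the first cycle at which each threshold has been reached.
    """
    first = bisect_left(retired_by_cycle, start_instr)
    last = bisect_left(retired_by_cycle, end_instr)
    return first, last

# Same test, same instruction window (3 to 7), two design revisions.
older_design = list(range(1, 11))   # assumed 1 instruction per cycle
newer_design = [2, 4, 6, 8, 10]     # assumed 2 instructions per cycle

print(cycle_window(older_design, 3, 7))
print(cycle_window(newer_design, 3, 7))
```

The two calls return different cycle ranges for the same instruction sequence, which is why a fixed cycle or time window would drift off the sequence of interest from one week's build to the next.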
After simulations completed,
39 test cases were run on PowerPro. Among the most important were
the halt test, the virus test, and the 17 tests in the AppTyp group. Synthetic
and “special” groups for clock-gating analysis of certain modes of
interest were also deployed.
After PowerPro completed these
tests, a final in-house script parsed all the report files and
tabulated the results. The tables were used to track progress and
identify further opportunities for power improvements.
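
The in-house script itself is not shown, and the PowerPro report format is not described here. As a rough sketch of the roll-up idea only, the Python below assumes a made-up CSV layout (the columns `test`, `block`, `flops`, `cycles`, and `clocked_flop_cycles` are hypothetical, as are all the numbers) and computes the percentage of flop-cycles that were actually clocked for each test.

```python
import csv
import io
from collections import defaultdict

def summarize(report_text):
    """Aggregate per-block rows into a clocked-flop percentage per test.

    The CSV columns are an assumed, illustrative format, not PowerPro's.
    """
    # test name -> [clocked flop-cycles, total possible flop-cycles]
    totals = defaultdict(lambda: [0, 0])
    for row in csv.DictReader(io.StringIO(report_text)):
        possible = int(row["flops"]) * int(row["cycles"])
        totals[row["test"]][0] += int(row["clocked_flop_cycles"])
        totals[row["test"]][1] += possible
    return {
        test: 100.0 * clocked / possible
        for test, (clocked, possible) in totals.items()
    }

# Invented sample data, for illustration only.
SAMPLE = """test,block,flops,cycles,clocked_flop_cycles
halt,fpu,1000,100,1000
halt,lsu,1000,100,3000
virus,fpu,1000,100,25000
virus,lsu,1000,100,15000
"""

for test, pct in summarize(SAMPLE).items():
    print(f"{test:8s} {pct:5.1f}% of flop-cycles clocked")
```

Running the same roll-up week after week on the same tests would give the kind of trend table the article describes, with a falling percentage indicating better clock gating.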
The goal was to reduce the
number of flops that are clocked. Reducing the number of flops or
improving clock gating manifests as an improvement in the PowerPro
analysis. The PowerPro analyses, and the RTL recommendations suggested
therein, could be used as an alternative to PTPX roll-ups. Further, the
PowerPro results revealed how efficient the design could be. For
example, when PowerPro suggested a way to eliminate 20% of the gated
activity, engineers could review the pre-optimization report and see
which flops were less efficient. Or they could look at the
post-optimization report after the PowerPro recommended improvements
were made; if PowerPro determined that the design could use 10% fewer
clocked flops, the engineer could examine the design to see which 10%
The result was that the actual power, as shown
in the monthly or bi-monthly PTPX reports, was lower because designers
were making design adjustments based on the week-to-week reports from
PowerPro.
Indeed, it's interesting that even our max power virus pattern only needs 15% of the flops clocked. There was a lot of designer work optimizing clock gating, but Calypto's SLEC methodology helped show what could be done too.
Tools evolve and designers gain experience, leading to ever lower active flop counts.
:-) About 10 years ago we did some serious looking into a clockless X86 design for deep low power, but the toolsets for efficient timing closure weren't there. And providing sufficiently robust async timing for state machines eats into perceived benefits. I think clock trees and meshes with optimized gating strategies will be with us for a while.
GMN had a good reply on SAIF usage; we used the Calypto-recommended flow for that technical decision.
For PowerPro to PTPX, we were not using their newer version, which estimates actual power; we were looking at clock gating efficiency. However, there was useful correlation there. As per table 2, we achieved about a 25% reduction in flop activity rate from one design to the next, and that correlated with about 25% lower dynamic power for typical applications. (As a short point of interest, we did check how much power tended to be used per active flop on one of our early runs, but it varied a fair bit from block to block. As one would expect, blocks with lots of combinational logic, like floating point, had more gate fanout capacitance per flop than other blocks.)
PowerPro does use VCD and FSDB for more accurate analysis. However, if you are primarily concerned about clock gating efficiency, and not looking for peak power analysis, then SAIF is faster and more efficient.
I find it amazing that after all these years of using clock gating to reduce power, the tools & methodologies continue to improve to such a degree that these types of large power reductions are still possible.
As you mentioned, the correlation between silicon and PTPX is about +/-10%. From your experience, what's the correlation between PowerPro and PTPX? And between PowerPro and silicon?
Also, in your flow, the input to PowerPro is SAIF. Why don't you use a real waveform, like VCD or FSDB? Thanks.