Reducing power in AMD processor core with RTL clock gating analysis

Steve Kommrusch - AMD, Inc.

2/4/2013 10:45 AM EST

Starting the simulation
Once the models were built, simulations were started. PowerPro uses Switching Activity Interchange Format (SAIF) files to track clock-gating efficiency and gather temporal information about the design. AMD devised a way for the simulations to enable SAIF tracking starting and ending at specific instruction counts. This was strongly preferred over starting and ending at a given simulation cycle number or simulation time, because it allowed the same key instruction sequence to be analyzed from week to week, even as designers made improvements that caused the test to execute more instructions per clock cycle.
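To illustrate why an instruction-count window is more stable than a cycle window, here is a minimal Python sketch. It is not AMD's simulation hook; the function name `saif_window` and the per-cycle retirement trace are hypothetical stand-ins chosen for illustration. It finds the cycles at which tracking would turn on and off for a fixed instruction range.

```python
def saif_window(retired_per_cycle, start_instr, end_instr):
    """Return (start_cycle, end_cycle) for an instruction-count window.

    retired_per_cycle: hypothetical trace, instructions retired in each cycle.
    Tracking starts in the cycle where the running total first reaches
    start_instr and stops in the cycle where it first reaches end_instr.
    """
    if start_instr > end_instr:
        raise ValueError("start_instr must not exceed end_instr")
    retired_total = 0
    start_cycle = None
    for cycle, retired in enumerate(retired_per_cycle):
        retired_total += retired
        if start_cycle is None and retired_total >= start_instr:
            start_cycle = cycle
        if retired_total >= end_instr:
            return start_cycle, cycle
    raise ValueError("trace ended before end_instr instructions retired")


# Same instruction window, two design revisions with different throughput.
older_design = [1] * 16   # assumed: 1 instruction retired per cycle
newer_design = [2] * 8    # assumed: 2 instructions retired per cycle
print(saif_window(older_design, 4, 12))
print(saif_window(newer_design, 4, 12))
```

The two revisions cover the same instructions in different cycle ranges, so a fixed cycle window would have captured a different part of the test after the throughput improvement.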

After the simulations completed, 39 test cases were run on PowerPro. The most important of these were the halt test, the virus test, and the 17 tests in the AppTyp group. Synthetic and “special” groups were also used for clock-gating analysis of certain modes of interest.

After PowerPro completed these tests, a final in-house script was executed that parsed through all the report files and created tables. The script processed the PowerPro reports and presented the results to track progress and identify further opportunities for power improvements.
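A roll-up script of this kind might look like the following Python sketch. The article does not show PowerPro's report layout or AMD's script, so the three-column text format used here (block name, flop count, percentage of flops clocked) and the names `parse_report` and `summarize` are assumptions made purely for illustration.

```python
import re

# Assumed report line: "<block> <flop_count> <clocked_percent>"
LINE = re.compile(r"^(?P<block>\w+)\s+(?P<flops>\d+)\s+(?P<pct>\d+(?:\.\d+)?)$")


def parse_report(text):
    """Parse one hypothetical report into {block: (flops, clocked_pct)}."""
    rows = {}
    for line in text.splitlines():
        m = LINE.match(line.strip())
        if m:  # header and blank lines simply fail to match
            rows[m["block"]] = (int(m["flops"]), float(m["pct"]))
    return rows


def summarize(reports):
    """Map each test name to its flop-weighted percentage of clocked flops."""
    table = {}
    for test, text in reports.items():
        rows = parse_report(text)
        total = sum(flops for flops, _ in rows.values())
        clocked = sum(flops * pct / 100 for flops, pct in rows.values())
        table[test] = 100 * clocked / total if total else 0.0
    return table


# Made-up numbers, not measurements from the article.
reports = {
    "halt": "block flops clocked_pct\nfetch 1000 2.0\nfpu 3000 1.0\n",
    "virus": "block flops clocked_pct\nfetch 1000 20.0\nfpu 3000 10.0\n",
}
for test, pct in summarize(reports).items():
    print(f"{test:<8}{pct:6.2f}% of flops clocked")
```

Weighting each block by its flop count keeps a small block with poor gating from dominating the design-level figure.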

The goal was to reduce the number of flops that are clocked. Reducing the number of flops or improving clock gating shows up as an improvement in the PowerPro analysis. The PowerPro analyses, and the RTL recommendations suggested therein, could be used as an alternative to PTPX roll-ups. Further, the PowerPro results revealed how efficient the design could be. For example, when PowerPro suggested a way to eliminate 20% of the gated activity, engineers could review the pre-optimization report and see which flops were less efficient. Or they could look at the post-optimization report after the PowerPro-recommended improvements were made; if PowerPro determined that the design could use 10% fewer clocked flops, the engineer could examine the design to see which 10% were eliminated.
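The pre- versus post-optimization comparison described above amounts to a set difference. The Python sketch below assumes each report can be reduced to a list of clocked flop names; the function `clocked_flop_delta` and the flop names are hypothetical, and the reduction figure assumes the post-optimization set is a subset of the pre-optimization set.

```python
def clocked_flop_delta(pre_clocked, post_clocked):
    """Return (eliminated_flops, reduction_pct) between two reports.

    pre_clocked / post_clocked: iterables of flop names that are clocked
    before and after the recommended changes (assumed inputs).
    """
    pre, post = set(pre_clocked), set(post_clocked)
    eliminated = sorted(pre - post)
    reduction_pct = 100 * (len(pre) - len(post)) / len(pre) if pre else 0.0
    return eliminated, reduction_pct


# Made-up example: 10 clocked flops before, 9 after.
pre = [f"u_core/ff{i}" for i in range(10)]
post = pre[:9]
eliminated, pct = clocked_flop_delta(pre, post)
print(eliminated, pct)
```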

The result was that the actual power, as shown in the monthly or bi-monthly PTPX reports, was lower because designers were making design adjustments based on the week-to-week reports from PowerPro.






yjchen

2/8/2013 2:21 AM EST

Hi Steve,

As you mentioned, the correlation between silicon and PTPX is about +/-10%. From your experience, what's the correlation between PowerPro and PTPX? And between PowerPro and silicon?

Besides, in your flow, the input to PowerPro is SAIF. Why don't you use real waveforms, like VCD or FSDB? Thanks.

yjchen




GMN

2/22/2013 3:54 PM EST

PowerPro does use VCD and FSDB for more accurate analysis. However, if you are primarily concerned about clock-gating efficiency, and not looking for peak power analysis, then SAIF is faster and more efficient.




SteveKo

3/1/2013 12:38 PM EST

GMN had a good reply for SAIF usage; we used the Calypto-recommended flow for that technical decision.

For PowerPro to PTPX, we were not using their newer version, which estimates actual power; we were looking at clock-gating efficiency. However, there was useful correlation there. As per table 2, we achieved about a 25% reduction in flop activity rate from one design to the next, and that correlated with about 25% lower dynamic power for typical applications. (As a short point of interest, we did check how much power tended to be used per active flop on one of our early runs, but it varied a fair bit from block to block. As one would expect, blocks with lots of combinational logic, like floating point, had more gate fanout capacitance per flop than other blocks.)




daleste

2/11/2013 10:29 PM EST

Good work improving the efficiency of the design. Whatever happened to the clockless logic that was supposed to make all of this unnecessary?




SteveKo

3/1/2013 12:42 PM EST

:-) About 10 years ago we looked seriously into a clockless X86 design for deep low power, but the toolsets for efficient timing closure weren't there. And providing sufficiently robust async timing for state machines eats into the perceived benefits. I think clock trees and meshes with optimized gating strategies will be with us for a while.




Frank Eory

2/13/2013 9:18 AM EST

I find it amazing that after all these years of using clock gating to reduce power, the tools & methodologies continue to improve to such a degree that these types of large power reductions are still possible.




SteveKo

3/1/2013 12:46 PM EST

Indeed, it's interesting that even our max power virus pattern only needs 15% of the flops clocked. There was a lot of designer work optimizing clock gating, but Calypto's SLEC methodology helped show what could be done too.
Tools evolve and designers gain experience, leading to ever lower active flop counts.



