The first test we ran was cpu-halt. We ran it first because it offered
one of the easiest ways to make significant clock-gating improvements.
Figure 2 shows a snapshot of the clock-gating improvement process as
tracked by PowerPro. It covers thirteen blocks that were leveraged
(reused) in Jaguar from a previous design, Bobcat. By tracking progress
frequently, even while functional and timing work was still under way,
the team was able to drive down active clock counts dramatically during
product development.
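As a sketch of the per-block tracking described above, the following Python compares two cpu-halt snapshots and reports each block's reduction in active clocks, listing the least-improved blocks first so they can be reviewed next. The snapshot format (a dict mapping a block name to average active clocks per cycle), the block names, and the function name are hypothetical; PowerPro's actual report format is not described in this article.

```python
from typing import Dict, List, Tuple

# Hypothetical snapshot format: block name -> average active (clocked)
# flops per cycle measured in a cpu-halt regression.
Snapshot = Dict[str, float]


def block_reductions(before: Snapshot, after: Snapshot) -> List[Tuple[str, float]]:
    """Return (block, percent reduction) for blocks present in both
    snapshots, least-improved blocks first."""
    results = []
    for block, old in before.items():
        if block not in after or old == 0:
            continue  # skip removed blocks and avoid division by zero
        new = after[block]
        results.append((block, 100.0 * (old - new) / old))
    # Ascending by reduction: blocks that improved least come first.
    results.sort(key=lambda item: item[1])
    return results
```

For example, `block_reductions({"fetch": 1000.0, "decode": 800.0}, {"fetch": 400.0, "decode": 760.0})` returns `[("decode", 5.0), ("fetch", 60.0)]`, pointing the team at the decode block first. Blocks that appear only in the newer snapshot are ignored, since they have no baseline.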
Figure 2. Clock-gating improvements based on cpu-halt regressions.
The cpu-halt test was also run after adding a new block to the design,
the shared L2 cache controller, which was not leveraged from the
previous processor core. The significant drop in activity from Month3
to Month4 marks the point at which the new block's functionality was
nearly complete and design work began to focus on power (Figure 3).
Figure 3. Average clocked flops after adding “newblock”: the shared L2 cache controller.
We then ran various applications through PowerPro (Table 1). The goal
was to minimize the average number of flops clocked each cycle, either
by optimizing flops away or by improving clock-gating efficiency.
Designers could review flop-efficiency details for the RTL as written,
as well as recommended improvements to gating efficiency. Each block was
tagged with its design owner's name to establish clear responsibility
for reviewing and improving clock-gating results.
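The metric targeted here, the average number of flops clocked each cycle, can be illustrated with a small Python sketch. It assumes a hypothetical per-cycle trace recording, for each flop, whether its (possibly gated) clock pulsed and the D and Q values at that edge. It also computes a simple efficiency figure: the fraction of clock pulses that actually changed a flop's value. That definition is a generic one chosen for illustration, not necessarily the formula PowerPro uses.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FlopCycle:
    """One cycle of one flop in a hypothetical simulation trace."""
    clocked: bool  # True if the (possibly gated) clock pulsed this cycle
    d: int         # value at the D input at the clock edge
    q: int         # value held at Q before the clock edge


def average_clocked_flops(trace: List[List[FlopCycle]]) -> float:
    """trace[c][f] is flop f in cycle c; return mean clocked flops per cycle."""
    if not trace:
        return 0.0
    total = sum(sum(1 for fc in cycle if fc.clocked) for cycle in trace)
    return total / len(trace)


def gating_efficiency(trace: List[List[FlopCycle]]) -> float:
    """Fraction of clock pulses that changed a flop's value. A pulse with
    d == q is redundant: tighter gating could have suppressed it."""
    clocked = [fc for cycle in trace for fc in cycle if fc.clocked]
    if not clocked:
        return 1.0  # no pulses at all, so nothing was wasted
    useful = sum(1 for fc in clocked if fc.d != fc.q)
    return useful / len(clocked)
```

Lowering the first number and raising the second correspond to the two levers named above: removing flops (or their clocks) entirely, and gating away pulses that do not change state.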
Table 1. Summary of PowerPro AppTyp results. (Note: “newblock” is not part of the CPU core total.)
At the same frequency, AMD Bobcat and AMD Jaguar have similar maximum
power levels for the power-virus (worst-case) workload. (Thanks to
timing work, Jaguar can run at lower voltage for the same frequency, but
it also has higher instructions per clock (IPC) architecturally.) Table 2
shows the results of our clock-gating efforts on the AMD Jaguar core.
For typical applications, even though IPC improved from one core to the
next, the percentage of active flops decreased by approximately 25%.
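The standard first-order model of CMOS dynamic power helps explain why both fewer active flops and lower voltage matter. In it, alpha is the activity factor (roughly, the fraction of capacitance switched each cycle, which clock gating reduces), C the switched capacitance, V the supply voltage, and f the clock frequency; primed symbols denote the new design. This is the textbook model, not a formula taken from the Jaguar analysis.

```latex
P_{\text{dyn}} = \alpha \, C \, V^{2} f
\qquad\Longrightarrow\qquad
\frac{P'_{\text{dyn}}}{P_{\text{dyn}}}
  = \frac{\alpha'}{\alpha}\cdot\frac{C'}{C}\cdot\left(\frac{V'}{V}\right)^{2}\cdot\frac{f'}{f}
```

Under this model, at fixed C, V, and f, an activity ratio of 0.75 alone gives a power ratio of 0.75, and because voltage enters squared, a supply ratio of 0.95 contributes a further factor of about 0.90. Higher IPC raises the switching activity of the data path, which offsets part of these gains.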
Table 2. Comparison of clock-gating improvements. (Note: % of Flops Active is approximate.)
In addition to running a snapshot of RTL code through the PowerPro flow,
the AMD gates team would run synthesis, placement, routing, and
gate-level simulation from the same tag. These PrimeTime PX (PTPX) runs
included accurate gate and wire capacitance for the actual tape-out
netlist. However, getting an accurate PTPX result can take several
weeks, because the design must first be synthesized and routed through
a back-end flow capable of achieving the high frequencies at which AMD
Jaguar cores run. Overall, the PTPX results showed that PowerPro was
useful as a quick estimate for power work. In addition, based on Bobcat
silicon results, reasonable correlation (+/-10%) between silicon and
PTPX results has been observed.
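Checking a correlation band such as the +/-10% quoted for silicon versus PTPX reduces to a relative-error test per benchmark. The Python below is a minimal sketch; the function name and the dictionary inputs (one measured and one estimated power value per test) are hypothetical.

```python
from typing import Dict, List


def outside_tolerance(measured: Dict[str, float],
                      estimated: Dict[str, float],
                      tolerance: float = 0.10) -> List[str]:
    """Return the tests whose estimate differs from the measured value by
    more than `tolerance`, relative to the measured value."""
    flagged = []
    for test, meas in measured.items():
        est = estimated.get(test)
        if est is None or meas == 0:
            continue  # no estimate, or no meaningful reference value
        if abs(est - meas) / abs(meas) > tolerance:
            flagged.append(test)
    return flagged
```

Measuring error relative to the silicon value treats silicon as ground truth, which matches how the +/-10% figure is framed above.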
In summary, AMD’s efficient RTL clock-gating analysis flow had these key advantages:
- RTL analysis could run over the weekend and analyze key power benchmark tests.
- Output format was easy to parse and summarize for designer use.
- Recommended improvements were valuable as suggestions and pointed to possible optimizations.
- Active clock count correlated well with total power.
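One simple way to quantify how well active clock count tracks total power is the Pearson correlation coefficient across a set of tests. The standard-library Python sketch below assumes the inputs are per-test active clock counts and per-test power estimates (for example, from PTPX); that pairing is illustrative, not a description of AMD's actual analysis.

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variances, all scaled by n (the factor cancels).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)
```

A value near 1.0 means the cheap RTL metric ranks and scales tests much as the expensive gate-level power numbers do, which is what makes it usable for weekend regressions.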
Even with IPC and frequency improvements, PowerPro helped achieve an
approximately 20% reduction in typical dynamic application power
compared with an already-tuned low-power x86 CPU.
About the author
Steve Kommrusch received his BS from the University of Illinois in 1987
and his master's degree from the Massachusetts Institute of Technology
in 1989. Steve has worked as a lead engineer on low-power processors for
over 15 years. At Hewlett-Packard, Steve worked on an ASIC with three
ARM cores for the CapShare 910 handheld scanner. At National
Semiconductor, Steve worked on the Geode LX, an SoC with 2D graphics, an
x86 processor, and integrated display control, which was used in the One
Laptop per Child (OLPC) laptop. Most recently, Steve architected the
clock, reset, and power-control signals for the AMD Jaguar processor.
All of these products made extensive use of clock gating to improve
battery life.
If you found this article to be of interest, visit EDA Designline
where you will find the latest and greatest design, technology,
product, and news articles with regard to all aspects of Electronic
Design Automation (EDA).