Evaluating a low power hardware realization flow
Using customer designs, we ran two types of tests to evaluate a new ESL flow
centered on the integration of an HLS tool and an RTL power analysis tool.
The tool set we used was Calypto® Catapult® LP, which embeds the Calypto
PowerPro® technology “under the hood” of the Catapult HLS product.
Catapult LP enables designers to explore low power architectures at the
ESL while leveraging automated RTL power optimization techniques.
Case study #1 Clock gating
The first test focused on leveraging a sequential analysis engine for clock
gating insertion. We benchmarked a low power HLS flow against a normal
“baseline” HLS flow using real customer designs. Some of the designs
were written in pure C++ and some were in SystemC. These designs were
already tied to certain performance requirements for a given area.
We compared the power consumption of the RTLs produced with the low power
HLS flow (LP) against the RTLs produced with a normal HLS flow (Base)
for a given architecture. Since most of the signal processing
applications had a minimum data path width of 8, we used a clock gating
width of 8 for all of these designs. The power of the RTL synthesized
with the normal HLS flow was estimated using the PowerPro power
estimation engine in standalone mode.
Data was collected for
different customer designs by running LP and Base HLS flows. We used a
variety of designs from different applications; for example, FFT, Video
Encoder I, and JPEG require high performance; whereas Automotive
requires extremely low power. The designs ranged in size from 11.4k to
58.1k gates. Table 1 summarizes the data.
Table 1: Power optimization data
From Table 1, we observe that a higher clock gating percentage does not
always indicate power savings. One reason is that if a gated flop has to
switch every clock cycle, gating it will not save power. CG (%)
indicates the percentage of total flops in the design that are gated.
CG Efficiency (%) indicates the percentage of cycles in which a gated
clock is inactive, based on a representative vector set (SAIF, FSDB).
A 30% clock gating efficiency for a flop means the flop is inactive for
3 out of 10 cycles of that vector set [Figure-5].
Figure 5: Clock gating efficiency
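The two metrics defined above can be illustrated with a small calculation. This is only a sketch: the function names, the per-cycle enable traces, and the flop list are hypothetical and are not taken from the Catapult LP or PowerPro tools, which derive these numbers from real simulation activity files.

```python
def clock_gating_percent(flops):
    """CG (%): share of all flops in the design whose clock is gated."""
    gated = sum(1 for f in flops if f["gated"])
    return 100.0 * gated / len(flops)


def clock_gating_efficiency(enable_trace):
    """CG efficiency (%): share of cycles in which the gated clock is inactive.

    enable_trace holds one sample per cycle: 1 = clock passes, 0 = clock gated off.
    """
    inactive = sum(1 for en in enable_trace if en == 0)
    return 100.0 * inactive / len(enable_trace)


# Hypothetical 10-cycle traces standing in for a representative vector set.
trace_30 = [1, 1, 0, 1, 1, 0, 1, 1, 0, 1]  # clock off in 3 of 10 cycles
trace_busy = [1] * 10                      # gated flop that is enabled every cycle

# Hypothetical design with four flops, three of them gated.
flops = [
    {"name": "r0", "gated": True},
    {"name": "r1", "gated": True},
    {"name": "r2", "gated": False},
    {"name": "r3", "gated": True},
]

print(clock_gating_percent(flops))          # 75.0
print(clock_gating_efficiency(trace_30))    # 30.0
print(clock_gating_efficiency(trace_busy))  # 0.0
```

The last trace shows why a high CG (%) alone does not guarantee savings: the flop counts as gated, but its clock is never actually shut off.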
Catapult LP ranks design registers by clock gating efficiency and accepts those
transformations that result in significant improvement in clock gating
efficiency. The sequential enable signals along with the corresponding
enable logic are automatically inserted into the resulting RTL. The
positive effect of this can be seen using the Video Encoder I design as
an example. The encoder already had 98.7% clock gating, but just by
strengthening the enable signal, Catapult LP substantially increased
clock gating efficiency and reduced the design’s power by almost 50%.
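The rank-and-accept step described above might be sketched as follows. The article does not describe the tool's actual ranking order or acceptance criterion, so this sketch makes two assumptions: registers are visited from lowest to highest current efficiency, and a hypothetical `min_gain` threshold stands in for "significant improvement". All register names and efficiency values are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Register:
    name: str
    base_eff: float       # CG efficiency (%) with the existing enable
    candidate_eff: float  # CG efficiency (%) with a candidate strengthened enable


def select_transformations(registers, min_gain=10.0):
    """Rank registers by current efficiency and keep the worthwhile candidates.

    min_gain is an assumed threshold, in percentage points, for what counts
    as a significant improvement.
    """
    ranked = sorted(registers, key=lambda r: r.base_eff)
    return [r.name for r in ranked if r.candidate_eff - r.base_eff >= min_gain]


# Hypothetical registers and efficiencies.
regs = [
    Register("coef_reg", base_eff=5.0, candidate_eff=60.0),
    Register("acc_reg", base_eff=40.0, candidate_eff=42.0),
    Register("out_reg", base_eff=20.0, candidate_eff=35.0),
]

accepted = select_transformations(regs)
print(accepted)  # ['coef_reg', 'out_reg']
```

Here `acc_reg` is rejected because a 2-point gain would not justify the extra enable logic under the assumed threshold.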
The data in Table 1 also show that, for these designs, better clock
gating efficiency always results in better power savings. The Catapult
LP flow improved clock gating efficiency in all cases [Figure-6].
Figure 6: Clock gating efficiency percentage improvement
Table 1 shows that the power consumption of the designs used in this case
study varied from 37 µW to 12.3 mW. The percentage improvement between
the Base flow and the LP flow is shown in Figure-7.
Figure 7: Average power savings
The graph shows a general trend in power savings improvement when Catapult
LP optimizations were turned on. For an extremely low power application
like Automotive, design improvement was approximately 12%; whereas, for a
high performance design like Image Scaler, there was a 50% improvement.
Absolute numbers for power can be inferred from Table 1.
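For readers who want to derive a percentage improvement from a pair of absolute numbers, a minimal sketch of the calculation follows. It assumes the improvement is expressed relative to the Base flow, which the article does not state explicitly. The function name and the power values are hypothetical and are not taken from Table 1.

```python
def power_savings_percent(base_power, lp_power):
    """Percentage power reduction of the LP flow relative to the Base flow.

    Both arguments must be in the same unit (for example, both in microwatts).
    """
    return 100.0 * (base_power - lp_power) / base_power


# Hypothetical values in microwatts, for illustration only.
base_uw = 200.0
lp_uw = 150.0

print(power_savings_percent(base_uw, lp_uw))  # 25.0
```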