Making ESL power optimization a reality
Shawn McCloud, Bryan Bowyer and Vikas Tyagi - Calypto
1/7/2013 8:00 AM EST
Evaluating a low power hardware realization flow
Using customer designs, we ran two types of tests to evaluate a new ESL flow centered on the integration of an HLS tool and an RTL power analysis tool. The tool set we used was Calypto® Catapult® LP, which embeds the Calypto PowerPro® technology “under the hood” of the Catapult HLS product. Catapult LP enables designers to explore low power architectures at the electronic system level (ESL) while leveraging automated RTL power optimization techniques.
Case study #1: Clock gating
This test focused on leveraging a sequential analysis engine for clock gating insertion. We benchmarked a low power HLS flow against a normal “baseline” HLS flow using real customer designs. Some of the designs were written in pure C++ and some were in SystemC. These designs were already tied to certain performance requirements for a given area.
We compared the power consumption of the RTL produced with the low power HLS flow (LP) against the RTL produced with a normal HLS flow (Base) for a given architecture. Since most of the signal processing applications had a minimum data path width of 8, we used a clock gating width of 8 for all of these designs. Power for the RTL synthesized with the normal HLS flow was estimated using the PowerPro power estimation engine in standalone mode.
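The role of the clock gating width can be illustrated with a minimal sketch. It assumes that "clock gating width" means the minimum register width (in bits) that is considered for gating, which the article does not state explicitly. The register names and widths are hypothetical and are not taken from the customer designs.

```python
# Assumption: the clock gating width is a minimum register width in bits.
CLOCK_GATING_WIDTH = 8


def gating_candidates(register_widths, min_width=CLOCK_GATING_WIDTH):
    """Return the names of registers wide enough to be considered for gating."""
    return sorted(
        name for name, width in register_widths.items() if width >= min_width
    )


# Hypothetical registers: name -> width in bits
registers = {"pixel_in": 8, "acc": 16, "valid": 1, "state": 3}
selected = gating_candidates(registers)
print(selected)
```

Under this reading, narrow control registers such as `valid` and `state` are skipped, while data path registers of 8 bits or more remain candidates.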
Data was collected for different customer designs by running the LP and Base HLS flows. We used a variety of designs from different applications. For example, FFT, Video Encoder I, and JPEG require high performance, whereas Automotive requires extremely low power. The designs ranged in size from 11.4k to 58.1k gates. Table 1 summarizes the data.

Table 1: Power optimization data
From Table 1, we observe that a higher clock gating percentage does not always indicate power savings. One reason is that a gated flop that has to switch every clock cycle saves no power. CG (%) indicates the percentage of total flops in the design that are gated. CG Efficiency (%) indicates the percentage of cycles for which a gated clock is inactive, measured over a representative vector set (SAIF, FSDB). For example, a 30% clock gating efficiency means the flop is inactive for 3 out of 10 cycles of that vector set (Figure 5).

Figure 5: Clock gating efficiency
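The definition of clock gating efficiency above can be sketched in a few lines. This is an illustrative calculation only, not how PowerPro computes the metric. The `enable_trace` list is a hypothetical stand-in for per-cycle activity that would normally come from a SAIF or FSDB file.

```python
def clock_gating_efficiency(enable_trace):
    """Return the percentage of cycles in which the gated clock is inactive.

    enable_trace holds one 0/1 value per clock cycle: 1 means the enable is
    asserted (the flop is clocked), 0 means the clock is gated off.
    """
    if not enable_trace:
        raise ValueError("enable_trace must contain at least one cycle")
    inactive_cycles = sum(1 for enable in enable_trace if not enable)
    return 100.0 * inactive_cycles / len(enable_trace)


# Hypothetical ten-cycle trace with the clock gated off in three cycles
trace = [1, 1, 0, 1, 1, 0, 1, 1, 1, 0]
efficiency = clock_gating_efficiency(trace)

# A gated flop whose enable is asserted every cycle
always_switching = clock_gating_efficiency([1] * 10)

print(efficiency, always_switching)
```

The second trace corresponds to the case described in the text: the flop counts toward CG (%) because it is gated, but its efficiency is zero, so gating it saves no power.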
PowerPro ranks design registers by clock gating efficiency and accepts those transformations that result in a significant improvement in clock gating efficiency. The sequential enable signals, along with the corresponding enable logic, are automatically inserted into the resulting RTL. The positive effect of this can be seen using the Video Encoder I design as an example. The encoder already had 98.7% clock gating, but just by strengthening the enable signal, Catapult LP was able to substantially increase clock gating efficiency and reduce the design’s power by almost 50%.
The data in Table 1 also shows that, for these designs, better clock gating efficiency consistently resulted in better power savings. The Catapult LP flow improved clock gating efficiency in all cases (Figure 6).
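The rank-and-accept step can be pictured with a small sketch. It is not PowerPro's actual algorithm: the `Candidate` type, the `min_gain` threshold of 10 percentage points, and all register names and efficiency values are assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    register: str
    base_eff: float  # CG efficiency (%) with the existing enable
    new_eff: float   # CG efficiency (%) with the strengthened enable

    @property
    def gain(self):
        """Improvement in clock gating efficiency, in percentage points."""
        return self.new_eff - self.base_eff


def select_transformations(candidates, min_gain=10.0):
    """Rank candidates by efficiency gain and keep the significant ones.

    min_gain is a hypothetical cutoff for what counts as significant.
    """
    ranked = sorted(candidates, key=lambda c: c.gain, reverse=True)
    return [c for c in ranked if c.gain >= min_gain]


# Hypothetical candidate registers
candidates = [
    Candidate("acc_reg", base_eff=5.0, new_eff=60.0),
    Candidate("coef_reg", base_eff=40.0, new_eff=45.0),
    Candidate("out_reg", base_eff=20.0, new_eff=50.0),
]
accepted = select_transformations(candidates)
names = [c.register for c in accepted]
print(names)
```

Here `coef_reg` is rejected because the stronger enable improves its efficiency by only 5 points, which would not justify the extra enable logic under the assumed cutoff.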


Figure 6: Clock gating efficiency percentage improvement
Table 1 shows that the power consumption of the designs used in this case study varied from 37 µW to 12.3 mW. The percentage improvement between the Base flow and the LP flow is shown in Figure 7.


Figure 7: Average power savings
The graph shows a general trend of improved power savings when Catapult LP optimizations were turned on. For an extremely low power application like Automotive, the improvement was approximately 12%, whereas for a high performance design like Image Scaler it was 50%. Absolute power numbers can be found in Table 1.
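The percentage improvement between the two flows is presumably the relative reduction in power from Base to LP. A minimal sketch of that arithmetic follows. The input values are placeholders chosen to reproduce savings of 50% and 12%. They are not the measurements from Table 1.

```python
def power_savings_percent(base_power, lp_power):
    """Return the relative power reduction of the LP flow versus Base, in %.

    Both arguments must be in the same unit (for example, both in mW).
    """
    if base_power <= 0:
        raise ValueError("base_power must be positive")
    return 100.0 * (base_power - lp_power) / base_power


# Placeholder values, not taken from Table 1
high_performance = power_savings_percent(base_power=10.0, lp_power=5.0)
low_power = power_savings_percent(base_power=50.0, lp_power=44.0)
print(high_performance, low_power)
```

Because the result is a ratio, a 12% saving on a microwatt-scale design and a 50% saving on a milliwatt-scale design differ greatly in absolute terms, which is why the absolute numbers in the table matter alongside the graph.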