The power of sequential design optimizations

Mitch Dale

8/1/2008 6:00 PM EDT

Energy (or, more specifically, energy consumption) is at the forefront of everyone's mind these days. Whether we are sensitive to the cost of fuel, to the electricity bill or to a dead cell phone battery, our awareness of the energy we use has increased.

Mobile devices are at the heart of consumer electronics. These products are feature-rich, computationally intensive and limited by battery storage. In the last five years, the performance of mobile devices has increased by an order of magnitude, but battery capacity has improved only in small increments. The solution to bridge this energy gap must come from energy-efficient electronics.

Energy consumption in electronics is a function of power dissipation of the device and the applications running on it. In previous generations of devices, performance was the primary concern for designers. With a shift toward energy efficiency, the challenge is how to minimize energy and complete the application in a reasonable amount of time.

The year 2006 marked the crossover point when semiconductor revenue from processors going into consumer devices surpassed the revenue from those integrated into PCs. This trend, along with continuing demand for energy-efficient electronics, has accelerated the need for low-power design. Power optimization can no longer be an afterthought: It is a mandatory design requirement for consumer markets.

Power optimization must be addressed at all levels of design, from system level to GDSII. Many power optimization tools operate during register transfer level (RTL) synthesis and at later stages. These automated tools make changes to the design, such as adding clock gating, substituting high-threshold-voltage cells and synthesizing power-efficient clock trees. Working in the existing design flow, they require little change in methodology. Also, the optimizations are combinational and can be verified by combinational equivalence checkers. Although these tools are necessary for designs at 90 nm and below, they are not sufficient to create the most energy-efficient designs possible.

By working at higher levels of abstraction and making sequential changes, designers can achieve additional power savings.

For example, understanding the interaction between application and architecture may make it possible to save power by running computations in parallel at half the frequency. Micro-architectural changes have a greater potential for saving power than do optimizations done at the gate level and below. However, micro-architectural optimizations are often difficult to implement, and because they change the sequential behavior of designs, they cannot be verified by combinational equivalence checkers.
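The arithmetic behind the parallel, half-frequency example can be sketched with the standard CMOS dynamic power estimate, P = a * C * V^2 * f. All numbers below are illustrative assumptions rather than figures from any real design. The sketch also assumes that the relaxed timing at half frequency lets the supply voltage be lowered, which is where the saving comes from. At the same voltage, doubling the hardware and halving the clock leaves dynamic power unchanged.

```python
def dynamic_power(activity, capacitance_f, voltage_v, frequency_hz):
    """Standard CMOS dynamic power estimate: P = a * C * V^2 * f (watts)."""
    return activity * capacitance_f * voltage_v ** 2 * frequency_hz


# Assumed baseline: one datapath, 1 nF switched capacitance, 1.2 V, 400 MHz.
baseline = dynamic_power(0.2, 1e-9, 1.2, 400e6)

# Two parallel datapaths (2x capacitance) at half the clock rate, same voltage.
parallel_same_v = dynamic_power(0.2, 2e-9, 1.2, 200e6)

# Assumption: the slower clock allows the supply to drop from 1.2 V to 0.9 V.
parallel_low_v = dynamic_power(0.2, 2e-9, 0.9, 200e6)

saving = 1 - parallel_low_v / baseline
print(f"baseline:          {baseline * 1e3:.1f} mW")
print(f"parallel, 1.2 V:   {parallel_same_v * 1e3:.1f} mW")
print(f"parallel, 0.9 V:   {parallel_low_v * 1e3:.1f} mW")
print(f"saving at 0.9 V:   {saving:.1%}")
```

The model ignores leakage and the extra logic needed to split and merge the parallel streams, so a real saving would be smaller.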

A successful low-power design strategy ensures a cumulative reduction in power while taking into account timing and area requirements. The relationship between power, timing and area is not always intuitive. Because power optimizations push more signals toward negative slack, they can cause timing violations. Consequently, some power optimizations will end up being re-implemented or backed out. This creates a catch-22. Higher levels of abstraction offer a greater potential for saving power but have less accurate design information. While design information at the gate level is very accurate, its potential for saving power is limited. The solution is optimizing power at the RTL.

RTL clock gating

Clock gating is the most common RTL power optimization. It reduces dynamic power consumption by adding combinational logic into the clock path of registers, stopping them from propagating values to downstream logic. Applying clock gating leaves the functionality of the design unchanged and reduces switching activity. The amount of power saved depends on the enable logic added and how long the registers are gated (turned off). The latter is a function of the switching activity for an application.

There are two types of RTL clock gating: combinational and sequential. Combinational clock gating translates a conditional statement in the RTL code into a clock-gating cell in the clock path of a register. This form of clock gating is automated by power-aware synthesis tools.
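As a rough behavioral illustration of combinational clock gating, the Python sketch below models a single register with an enable condition. The function name, the input traces and the modeling of the ungated register as a feedback mux are assumptions made for this example; it is not RTL and does not represent any particular synthesis tool.

```python
def run_register(data, enable, gated):
    """Simulate one register for len(data) cycles.

    Returns (q_trace, clock_edges), where clock_edges counts the cycles in
    which the register's clock pin actually receives an edge.
    """
    q = 0
    q_trace = []
    clock_edges = 0
    for d, en in zip(data, enable):
        if gated:
            # Gating cell: the clock reaches the register only when enabled.
            if en:
                clock_edges += 1
                q = d
        else:
            # Ungated: clocked every cycle; a feedback mux holds the old value.
            clock_edges += 1
            q = d if en else q
        q_trace.append(q)
    return q_trace, clock_edges


data = [5, 7, 7, 9, 1, 1, 3, 8]
enable = [1, 0, 0, 1, 0, 0, 0, 1]

ungated_q, ungated_edges = run_register(data, enable, gated=False)
gated_q, gated_edges = run_register(data, enable, gated=True)

print("same register values:", ungated_q == gated_q)
print("clock edges ungated/gated:", ungated_edges, gated_edges)
```

The register holds the same values in both versions on every cycle, which is why this kind of change can be checked combinationally.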

Sequential clock gating leverages inefficiencies in the RTL code, such as unused computations and data-dependent functions. Examples include observability-based optimizations that take advantage of data written to registers unused in subsequent clock cycles and stability-based optimizations that recognize when data is not valid from previous cycles. Both cases require multicycle functional analysis to identify sequential conditions as optimization candidates.

 See related chart

Sequential analysis does not depend on switching activity to determine where and when clock gating enable logic should be added. Sequential clock gating can reduce power by up to 60 percent in designs by cutting clock, register and combinational power dissipation.

Unlike combinational clock gating, which affects registers for a single cycle, sequential clock gating is a multicycle optimization that alters the state of the registers in the design. Consequently, sequential equivalence checking (SEC) must be used for verification. Unlike combinational equivalence checking, SEC does not require a one-to-one mapping of state elements to prove that two designs are the same. Alternatively, simulation methods can be used, although they require testbench development to exercise possible enable conditions.
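The observability-based case described above can be sketched with a small two-stage pipeline model. This is a simplified, assumed example: the pipeline, the signal names (r1, r2, valid_d) and the doubling function are hypothetical. The second-stage register captures the first stage only when a delayed valid flag is set, so the first stage needs a clock edge only in cycles whose result will be consumed one cycle later.

```python
def simulate(data, valid, sequential_gating):
    """Two-stage pipeline; returns (output trace, r1 trace, r1 clock edges)."""
    r1 = r2 = 0
    valid_d = 0  # valid delayed by one cycle; acts as the enable for r2
    out, r1_trace, r1_clocks = [], [], 0
    for d, v in zip(data, valid):
        # Next-state values are computed from the current state, so all
        # registers update together as they would on a clock edge.
        next_r2 = r1 if valid_d else r2
        if sequential_gating and not v:
            next_r1 = r1          # gated: r2 will not read this value
        else:
            next_r1 = d * 2       # clocked: capture the new computation
            r1_clocks += 1
        r1, r2, valid_d = next_r1, next_r2, v
        out.append(r2)
        r1_trace.append(r1)
    return out, r1_trace, r1_clocks


data = [3, 4, 5, 6, 7, 8]
valid = [1, 0, 0, 1, 1, 0]

ref_out, ref_r1, ref_clocks = simulate(data, valid, sequential_gating=False)
opt_out, opt_r1, opt_clocks = simulate(data, valid, sequential_gating=True)

print("outputs match:       ", ref_out == opt_out)
print("internal state match:", ref_r1 == opt_r1)
print("r1 clock edges:      ", ref_clocks, "->", opt_clocks)
```

The outputs agree on every cycle while the contents of r1 differ, which illustrates why a register-by-register combinational comparison is not enough. A simulation like this covers only the stimulus it is given; a formal SEC proof covers all input sequences.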

Adding clock gating

The key to effective RTL clock gating is having an accurate picture of design activity while understanding the cost/benefit relationship of each optimization.

The percentage of registers clock-gated is a common measure of how well a block is clock-gated. Because this measure does not take into account switching activity, it is a poor assessment of how effective the clock gating on a register actually is. Take, for example, a case in which a register is only clock-gated for a few cycles during startup. The register would be counted as having a clock gate, but the actual reduction in switching activity (and thus the power savings) would be negligible.

Clock-gating efficiency is a much better gauge. The clock-gating efficiency of a register is defined as the percentage of time (clocks) the register is gated (not clocked) for a given set of switching activity. The clock-gating efficiency of a design is computed by averaging the clock-gating efficiency of all registers. Registers with low efficiency point to design areas where additional clock gating can save power. The best candidates for clock gating are datapath registers with low efficiency.
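The two measures can be compared with a short calculation. The register names and per-cycle traces below are made up for illustration (1 means the register was clocked in that cycle, 0 means it was gated). As a simplification, a register is counted as "having a clock gate" if it is gated in at least one cycle.

```python
def gating_efficiency(clocked_trace):
    """Fraction of cycles in which the register is gated (not clocked)."""
    return clocked_trace.count(0) / len(clocked_trace)


# Hypothetical activity traces over 10 cycles: 1 = clocked, 0 = gated.
traces = {
    "startup_cfg": [0, 0, 1, 1, 1, 1, 1, 1, 1, 1],  # gated only at startup
    "datapath_a":  [1, 0, 0, 0, 1, 0, 0, 0, 0, 0],  # gated most of the time
    "datapath_b":  [1] * 10,                        # never gated
}

per_register = {name: gating_efficiency(t) for name, t in traces.items()}

# Design-level efficiency: the average over all registers.
design_efficiency = sum(per_register.values()) / len(per_register)

# Simplification: "has a clock gate" means gated in at least one cycle.
registers_gated = sum(1 for t in traces.values() if 0 in t) / len(traces)

for name, eff in per_register.items():
    print(f"{name}: {eff:.0%}")
print(f"registers clock-gated:   {registers_gated:.0%}")
print(f"clock-gating efficiency: {design_efficiency:.0%}")
```

In this made-up block, two of three registers count as clock-gated, yet the design-level efficiency is only about one third, and the low per-register numbers point to where more gating could help.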

Sequential clock gating enable logic can contain signals from multiple hierarchies and cross cycle boundaries. As a result, manual coding of sequential clock gating requires experienced engineers with considerable design knowledge.

In design blocks targeted for RTL synthesis, there can be hundreds or even thousands of sequential clock gating optimization opportunities. Manually calculating the cost/benefit for all these opportunities is not practical. Because manually modifying and verifying the code would take months, designers tend to make the minimum number of changes and leave additional power savings on the table.

Automated power optimization

Power optimization tools can simultaneously evaluate numerous optimizations against multiple constraints. Automation has been applied successfully by power-aware RTL synthesis tools for a variety of combinational power optimization techniques. Fundamental to automation is the ability to fit into existing design flows. This requires reading designs in standard languages and outputting an equivalent and optimized design in the same format.

PowerPro CG by Calypto, for example, is an RTL power optimization solution that identifies and inserts sequential clock gating enable logic in the user's design. Enable logic is created for registers not previously clock-gated and existing enable logic is strengthened, lengthening the duration that clocks are disabled. By automating the sequential clock gating processes, the tool is able to find more optimizations in less time than error-prone manual methods.

Sequential analysis computes the temporal relationships between design states across multiple clock cycles to reduce power. Clock gating enable logic is identified for each optimization and ranked by a cost/benefit function. For optimizations that pass the criteria, enable logic is inserted into the original RTL code while maintaining the comments, style and synthesis pragmas. This ensures that the output fits into RTL synthesis design flows and allows designers to easily identify the modified RTL code.
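The ranking step can be pictured with a toy cost/benefit function. The article does not specify the function the tool uses, so everything below is an assumption for illustration: the `Candidate` fields, the net-benefit formula (estimated power saved minus the estimated power of the added enable logic) and the numbers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """One hypothetical sequential clock gating opportunity."""
    register: str
    power_saved_uw: float   # assumed estimate of power saved while gated
    enable_cost_uw: float   # assumed estimate of power in added enable logic

    @property
    def net_benefit_uw(self) -> float:
        return self.power_saved_uw - self.enable_cost_uw


def select_optimizations(candidates, min_net_benefit_uw=0.0):
    """Keep candidates whose net benefit exceeds the threshold, best first."""
    accepted = [c for c in candidates if c.net_benefit_uw > min_net_benefit_uw]
    return sorted(accepted, key=lambda c: c.net_benefit_uw, reverse=True)


candidates = [
    Candidate("fifo_data", power_saved_uw=120.0, enable_cost_uw=15.0),
    Candidate("status_flag", power_saved_uw=2.0, enable_cost_uw=5.0),
    Candidate("mac_accum", power_saved_uw=300.0, enable_cost_uw=40.0),
]

selected = select_optimizations(candidates)
for c in selected:
    print(f"{c.register}: net benefit {c.net_benefit_uw:.1f} uW")
```

A production cost function would presumably also weigh area and timing impact, as the earlier discussion of power, timing and area suggests. This sketch considers power alone.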

The optimized RTL can be verified using SEC. This formally verifies that no functional changes were introduced by the additional enable logic. As part of the PowerPro CG run, SEC verification scripts are generated that automate the setup and turn verification into a single command.

Mitch Dale (mdale@calypto.com) is director of product marketing at Calypto Design Systems. He holds a BS in applied mathematics and computer science from the University of California, Berkeley.
