Behavioral Design Drives Low-Power Silicon

Brett Cline, Mike Meredith, Forte Design Systems

2/16/2009 11:07 AM EST

Hardware designers adopt high-level synthesis (HLS) for its productivity benefits, and they need the quality of results (QoR) to match or exceed what they can achieve with hand-constructed register transfer level (RTL) code. Historically, the most interesting QoR metrics have been circuit performance and chip area. As power consumption has risen to become a dominant design criterion, it has also become a QoR metric of interest to HLS users.

Users of the new generation of high-level synthesis tools find that HLS can be used effectively to improve power consumption along with the other measures of circuit quality. Some of these improvements come from optimizations made by the HLS tool itself. Additional power reduction is achieved as a result of the inherent improvement HLS brings to the design flow, giving the designer the flexibility to easily experiment and identify the solution that consumes the least power.

High-level designers use a broad range of techniques to improve the overall power profile of their designs, including: power reduction by optimizing system architecture; micro-architecture exploration in the power dimension; high-level coding styles to reduce power; RTL coding styles for power optimization; and power optimizations made by HLS.

Power Reduction by Optimizing System Architecture
The most significant way to influence overall system power is to decide what parts of the system to code in software and what to implement as hardware. Implementing system functionality in software offers flexibility, but processors incur a significant power overhead by fetching instructions from memory. Memory accesses consume a substantial amount of power and each instruction fetch represents an energy expenditure that does not contribute directly to the computational goal. Customized hardware implementations don't pay this overhead.

In addition, functions implemented in software take more cycles to execute than they do when implemented in hardware, often by an order of magnitude or more. Given the relatively high number of processor gates active in every cycle, this can add up to a substantial amount of consumed power.

In the high-level synthesis flow, functionality is described at a level of abstraction close to that used in software. The result is an efficient, customized circuit built with dedicated functional units such as adders and registers. This dedicated datapath is controlled by a custom finite state machine (FSM) that operates without the instruction fetches a processor requires. Every memory access reads or writes a data value needed by the computation, and the function can be accomplished in the minimum number of cycles possible with the given technology.

Using high-level synthesis, a design team can implement and verify more functionality in hardware in less time than RTL design requires. This makes it feasible to move system functionality from software into dedicated hardware, reducing the number of instruction fetches and lowering required CPU clock speeds for significant system-level power savings.

Some functional elements in an electronic system may need to change during the life of the product, and others can be expected to stay fixed. For example, some parts of the system are rigidly controlled by industry standards while other parts implement proprietary, product-differentiating features. Obviously, system functionality that is expected to change after the product ships should be implemented in software, while functionality that will not change is a candidate for hardware implementation.

Over the life cycle of a product line, functional requirements often stabilize, allowing additional functions to be implemented in later generations of the hardware to improve system performance and reduce power consumption. Protocol implementations are prime examples. Although HLS has been used primarily for computationally intensive tasks, modern SystemC-based HLS tools are well-suited to protocol implementation. Using these tools, migrating a function from software to hardware implementation is both straightforward and efficient.

In the example shown in Figures 1 and 2, a first-generation system has three functions implemented in software and two in hardware (Figure 1). In the product's second generation, two of the software functions have reached a level of stability such that they can be implemented in hardware (Figure 2). This reduces the power that those functions consume and frees up the processor to implement added features. Alternatively, it allows a more power-efficient processor to be used.


Figure 1. Functions IP-1, IP-2, and IP-3 are implemented in software in the first-generation product.


Figure 2. Functions IP-1 and IP-3 are hardened in the second-generation product.

Micro-Architecture Exploration in the Power Dimension
High-level synthesis makes it possible for designers to try many different micro-architectures for each hardware block by varying synthesis constraints without changing the high-level source code. For example, the designer can compare the power consumption of a micro-architecture that uses memories for intermediate storage with one that uses registers. Or, the designer might compare a micro-architecture that uses a fast clock and fewer functional units with one that has a more parallel architecture and more functional units running at a slower clock rate.

Another dimension that has a dramatic impact on power consumption is supply voltage. Using a lower voltage process reduces power consumption, but it also reduces the switching speed of each gate. In order to achieve the needed performance at a lower voltage, a much more parallel circuit structure is often required. Determining whether the required performance can be met at a particular voltage in an RTL design flow requires a complete rewrite of the design. With HLS, the designer can select a low-voltage .lib file, set the clock speed, re-synthesize and have new RTL for simulation and power estimation in a matter of hours.
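
The voltage trade-off can be illustrated with the standard first-order model of dynamic CMOS power, P = a * C * V^2 * f. The sketch below is plain C++ for back-of-the-envelope comparison, not something an HLS tool produces. The struct, the function names, and every numeric value used with them are illustrative assumptions, not measurements.

```cpp
#include <cassert>

// First-order dynamic power model for CMOS logic: P = a * C * V^2 * f.
// All names and units here are illustrative assumptions.
struct DesignPoint {
    double activity;     // average switching activity factor (0..1)
    double capacitance;  // total switched capacitance, arbitrary units
    double voltage;      // supply voltage in volts
    double frequency;    // clock frequency, arbitrary units
};

double dynamic_power(const DesignPoint& d) {
    return d.activity * d.capacitance * d.voltage * d.voltage * d.frequency;
}

// Ratio of the candidate design's dynamic power to the baseline's.
double power_ratio(const DesignPoint& baseline, const DesignPoint& candidate) {
    assert(dynamic_power(baseline) > 0.0);  // avoid dividing by zero
    return dynamic_power(candidate) / dynamic_power(baseline);
}
```

As a hypothetical comparison, take a baseline of {0.2, 1.0, 1.0, 2.0} and a parallel candidate of {0.2, 2.0, 0.8, 1.0}: twice the hardware, half the clock rate, and a supply lowered from 1.0 V to 0.8 V. The model gives a ratio of about 0.64, all of it from the V^2 term. Leakage, which grows with the added gates, is not modeled here.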

An HLS designer can implement and measure the power of dozens of scenarios in the time it would take to try a single micro-architecture writing RTL code by hand.

High-Level Coding Styles to Reduce Power
The way a designer describes functionality at the high level has an impact on the power consumed by the circuit synthesized by the HLS tool. As a result, designers can make changes in their high-level source code to improve power consumption.

Coding to prevent computations from being performed on invalid input data is one example of this principle. Many designs operate in a fully pipelined mode where data is read every cycle, a multi-cycle computation is made, and data is written every cycle. An input protocol is used with a signal that shows whether the input data is valid. If not, the circuit needs to drain the pipeline by continuing any computations underway on previous input data and putting out the results.

This can be coded as follows:

while(1)
{
    // Read the valid flag and data from the input ports.
    { CYN_PROTOCOL("read_inputs");
        wait();
        valid = valid_in_port.read();
        data = data_in_port.read();
    }
    // Computed on every iteration, even when valid is false.
    result = compute( data );
    { CYN_PROTOCOL("write_outputs");
        data_out_port.write( result );
        valid_out_port.write( valid );
    }
}

Without knowing the semantics of the protocol, the HLS tool must compute the result every time through the while loop because it is written to an output port. Depending on the complexity of the function compute(), this may require a large energy expenditure.

Alternatively, the designer can write the high-level code so that no computation is done when the input data is not valid:

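while(1)
{
    // Read the valid flag and data from the input ports.
    { CYN_PROTOCOL("read_inputs");
        wait();
        valid = valid_in_port.read();
        data = data_in_port.read();
    }
    // Skip the computation and the output write when the input is not valid.
    if ( !valid )
        continue;
    result = compute( data );
    { CYN_PROTOCOL("write_outputs");
        data_out_port.write( result );
        valid_out_port.write( valid );
    }
}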

With this coding style, the HLS tool can construct a state machine that will drain properly without spending power to compute unneeded values.
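
The difference between the two coding styles can be seen in a plain C++ model that simply counts how often the computation runs. This is a software sketch, not SystemC and not synthesizable code: the Sample and Counter types, the run_ungated and run_gated functions, and the body of compute() are all hypothetical stand-ins, and the call count is only a rough proxy for switching activity.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One input sample as seen at the input ports in a given cycle.
struct Sample {
    bool valid;
    int  data;
};

// Tracks how many times the (hypothetical) datapath function is evaluated.
struct Counter {
    std::size_t calls = 0;
};

// Stand-in for the article's compute(); the arithmetic is arbitrary.
int compute(int data, Counter& c) {
    ++c.calls;
    return data * data + 1;
}

// First style: the result is computed on every loop iteration.
std::size_t run_ungated(const std::vector<Sample>& in) {
    Counter c;
    for (const Sample& s : in) {
        int result = compute(s.data, c);
        (void)result;  // the output write is not modeled
    }
    assert(c.calls == in.size());
    return c.calls;
}

// Second style: iterations with invalid input skip the computation.
std::size_t run_gated(const std::vector<Sample>& in) {
    Counter c;
    for (const Sample& s : in) {
        if (!s.valid)
            continue;
        int result = compute(s.data, c);
        (void)result;  // the output write is not modeled
    }
    assert(c.calls <= in.size());
    return c.calls;
}
```

For an input stream in which two of five samples are valid, run_ungated evaluates compute() five times and run_gated evaluates it twice.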

RTL Coding Styles for Power Optimization
The RTL coding style used can influence downstream RTL power optimizations. Power optimizations such as clock gating can be done as part of the logic synthesis process, but these tools are sensitive to the coding style of the RTL code presented to them. Certain ways of describing multiplexors and registers using assign statements and always blocks in Verilog are detected by RTL tools as candidates for insertion of clock gating logic. Other coding styles, sometimes only slightly different, will cause these RTL tools to miss a clock gating opportunity.

Skilled RTL designers can, and do, learn to use the preferred coding styles, but it is easy for a mistake to occur, especially during the debug process or when modifying a design for reuse. Achieving the best results by hand requires both knowledge of the optimum RTL style for the downstream tools and the discipline to apply it consistently. HLS tools can systematically emit RTL code structured to maximize the opportunities for downstream clock gating.

Power Optimizations Made by HLS
Of course, the HLS tool can also optimize for power in its own decisions about how to structure the FSM and datapath of the design.

One technique used by Forte's Cynthesizer, for example, is to reduce power consumption by performing operand isolation. This ensures that inputs to large combinatorial functions such as adders and multipliers are changed only in states where the output values will be needed. This technique, and others like it, is used by talented designers creating hand-written RTL code, but designers may not want to manually handle all the added complexity in the FSM needed to apply it consistently.
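
The effect of operand isolation can be approximated in software by counting bit toggles at the inputs of a functional unit. The following is a minimal C++ sketch under that assumption. It does not show how Cynthesizer implements the technique; the Cycle type, the function names, and the 16-bit operand width are hypothetical.

```cpp
#include <bitset>
#include <cassert>
#include <cstdint>
#include <vector>

// Operand values offered to a multiplier in one clock cycle.
struct Cycle {
    std::uint16_t a;
    std::uint16_t b;
    bool result_needed;  // true in FSM states that consume the product
};

// Number of bits that differ between two operand values.
int toggles(std::uint16_t before, std::uint16_t after) {
    return static_cast<int>(std::bitset<16>(before ^ after).count());
}

// Counts bit toggles at the multiplier inputs over a trace.
// With isolate == true, the inputs hold their previous values in
// cycles where the product is not needed.
int multiplier_input_toggles(const std::vector<Cycle>& trace, bool isolate) {
    std::uint16_t a_in = 0;  // value currently driving input a
    std::uint16_t b_in = 0;  // value currently driving input b
    int total = 0;
    for (const Cycle& c : trace) {
        if (isolate && !c.result_needed)
            continue;  // inputs held: no switching this cycle
        total += toggles(a_in, c.a) + toggles(b_in, c.b);
        a_in = c.a;
        b_in = c.b;
    }
    return total;
}

// Toggles avoided by isolation. Never negative in this model, because
// Hamming distance obeys the triangle inequality.
int toggles_saved(const std::vector<Cycle>& trace) {
    int free_running = multiplier_input_toggles(trace, false);
    int isolated = multiplier_input_toggles(trace, true);
    assert(isolated <= free_running);
    return free_running - isolated;
}
```

In a trace where an unrelated value passes through the operand path during a state that does not use the product, the isolated version avoids both the toggles into that value and the toggles back out of it.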

Another power-reducing difference between the circuits typically produced by RTL designers and those created using HLS is the extent to which large, power-hungry functional units are shared. An HLS tool can often create a schedule that allows a single adder to implement several different operations in different parts of an algorithm. This reduction in the number of gates required pays off by reducing the power lost through leakage, an increasingly important consideration in designs implemented using small-geometry processes.
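
Resource sharing can be pictured as a trade between the number of functional units and the number of cycles. The sketch below is a plain C++ illustration of two hypothetical schedules for summing four values. The Schedule type and function names are assumptions made for this example, and the adder and cycle counts are bookkeeping written into the code rather than output from a synthesis tool.

```cpp
#include <array>
#include <cassert>

// Summary of one way to schedule the same computation.
struct Schedule {
    int adders;  // adder instances in the datapath
    int cycles;  // clock cycles to produce the result
    int result;  // computed sum
};

// Parallel schedule: three adder instances, result in one cycle.
Schedule sum4_parallel(const std::array<int, 4>& x) {
    int s0 = x[0] + x[1];  // adder 1
    int s1 = x[2] + x[3];  // adder 2
    int sum = s0 + s1;     // adder 3
    return {3, 1, sum};
}

// Shared schedule: one adder reused in three successive FSM states.
Schedule sum4_shared(const std::array<int, 4>& x) {
    int acc = x[0];
    int cycles = 0;
    for (int state = 1; state <= 3; ++state) {
        acc = acc + x[state];  // the single adder; operands selected per state
        ++cycles;
    }
    return {1, cycles, acc};
}

// Adder instances removed by sharing; both schedules must agree on the sum.
int adders_saved(const std::array<int, 4>& x) {
    Schedule p = sum4_parallel(x);
    Schedule s = sum4_shared(x);
    assert(p.result == s.result);
    return p.adders - s.adders;
}
```

The shared schedule uses one adder instead of three at the cost of two extra cycles. In real hardware, sharing also adds multiplexers on the adder inputs, so the net saving depends on the relative size of the shared unit.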

Conclusion
As this article has shown, high-level designers use a broad range of techniques to improve the overall power profile of their designs. High-level synthesis can provide design teams with greater flexibility to meet their power budgets as well as their schedules.

About the Authors:
Brett Cline is vice president of marketing and sales for Forte Design Systems. He holds a BSEE from Northeastern University in Boston. Brett can be reached at brett@ForteDS.com.

Mike Meredith is vice president of technical marketing for Forte Design Systems and serves as the president of the Open SystemC Initiative. He is a contributor to two books on ESL methodology and high-level synthesis and holds three US patents in the areas of timing diagrams and timing analysis of electronic circuits. Mike can be reached at mmeredith@ForteDS.com.

