Designers today continue to be challenged with the need to manage power, timing and signal integrity concurrently throughout the design flow. Traditional power optimization techniques and today's power-aware design flows are proving insufficient in the design of systems-on-a-chip (SoCs) for next-generation applications, and must evolve to enable design for energy efficiency. This tutorial focuses on the need for design flows that enable energy efficient design through existing and emerging techniques, such as dynamic and adaptive voltage and frequency scaling.
The market for portable electronic devices has grown at a phenomenal rate over the last few years. From mobile phones to handheld games, PDAs and digital cameras, the portable market continues to build on the promise of smaller, lighter devices with more high-performance features and longer battery life. That leaves today's system-on-chip (SoC) designers with the daunting task of delivering on this promise.
Recent improvements in battery and process technology have been aimed at meeting the increased energy demands of up-and-coming portable systems. However, it may be years before next-generation battery technologies, such as fuel cells, become commercially viable. The latest nanometer CMOS process technologies, which are crucial to the design of ever-smaller, feature-rich portables, bring an increased presence of phenomena such as leakage current. Leakage is a static power effect that acts as a constant drain on the battery, shortening battery life, and it can contribute upwards of 8 mA per 1M transistors in a 0.13 micron technology. Energy-efficient design flows of the future must address these new process challenges if the portable market is to continue to grow.
In this article, we explore ways to increase energy efficiency in SoC designs, including methods and techniques for controlling supply voltages and reducing the sub-micron leakage effect. We also look at ways to optimize the standard EDA flow for energy efficient design in order to maximize the benefits of these new design techniques.
Designing for Energy Efficiency
Forms of energy use
Energy usage in synchronous CMOS designs can be broken into two parts. The first part is denoted as the dynamic (or switching) power, which can be expressed as:
$$P_{dyn} = aF \cdot C \cdot V^{2}$$
where aF represents the switching activity of the circuit, C represents the total capacitance, and V represents the switching voltage range. The switching activity is a de-rated value of the clock frequency, as the majority of the nodes in a design do not switch at the same rate as the clock. When a design with multiple frequencies (or voltages) is analyzed, this equation must be separately computed for each unique value, and the results totaled to gain a value for the entire design. It is important to note that voltage contributes at the square of its value, resulting in a large contribution to the dynamic power.
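As a rough illustration of the per-domain summation described above, the following Python sketch evaluates the dynamic power of each frequency/voltage domain separately, with voltage entering as its square, and totals the results. The `Domain` type, the domain names and all numeric values are assumptions chosen for illustration; they do not come from a real design.

```python
from dataclasses import dataclass


@dataclass
class Domain:
    name: str
    activity_hz: float    # aF: switching activity (de-rated clock frequency), in Hz
    capacitance_f: float  # C: total switched capacitance, in farads
    voltage_v: float      # V: switching voltage range, in volts


def dynamic_power(d: Domain) -> float:
    """Dynamic power of one domain: aF * C * V^2, in watts."""
    return d.activity_hz * d.capacitance_f * d.voltage_v ** 2


def total_dynamic_power(domains) -> float:
    """Compute each unique frequency/voltage domain separately, then total."""
    return sum(dynamic_power(d) for d in domains)


# Hypothetical two-domain design (values are illustrative only).
domains = [
    Domain("cpu", activity_hz=0.2 * 200e6, capacitance_f=1e-9, voltage_v=1.2),
    Domain("periph", activity_hz=0.1 * 50e6, capacitance_f=0.5e-9, voltage_v=0.9),
]

for d in domains:
    print(f"{d.name}: {dynamic_power(d) * 1e3:.3f} mW")
print(f"total: {total_dynamic_power(domains) * 1e3:.3f} mW")
```

Because the voltage term is squared, lowering the supply of a domain reduces its contribution faster than lowering its activity by the same fraction.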
The second part of the power usage is the leakage power. Leakage power has multiple contributors, but a major component is the sub-threshold current, which can be calculated as:
$$I_{sub} = I_0 \, e^{-V_{th}/S} \left(1 - e^{-qV_{ds}/kT}\right) \quad (\text{at } V_{gs} = 0)$$
The key element in this equation is the exponential relationship of leakage current to the threshold voltage. The reduction in Vth has increased device performance, but at the expense of leakage current. Techniques already exist that identify the individual instances in a design that need the increased device performance and use a device with a higher Vth value for the remaining instances. However, this is a static determination, and does not account for possible variations in performance requirements.
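To make the sensitivity to Vth concrete, the sketch below evaluates the sub-threshold expression above for two threshold voltages. The values of I0 and S are hypothetical placeholders rather than characterized process data, and S is assumed to be expressed in volts so that it matches the exponential form of the equation.

```python
import math

Q = 1.602176634e-19  # electron charge, in coulombs
K = 1.380649e-23     # Boltzmann constant, in joules per kelvin


def subthreshold_current(i0, vth, s, vds, temp_k=300.0):
    """Sub-threshold current at Vgs = 0, following the equation in the text.

    i0 and s are process-dependent fitting parameters; the values used
    below are assumptions for illustration only.
    """
    return i0 * math.exp(-vth / s) * (1.0 - math.exp(-Q * vds / (K * temp_k)))


# Hypothetical devices: same I0, S and Vds, two different threshold voltages.
low_vth = subthreshold_current(i0=1e-6, vth=0.3, s=0.04, vds=1.2)
high_vth = subthreshold_current(i0=1e-6, vth=0.4, s=0.04, vds=1.2)

print(f"leakage ratio (low Vth / high Vth): {low_vth / high_vth:.1f}")
```

With these assumed parameters, a 100 mV increase in Vth reduces the sub-threshold current by roughly an order of magnitude, which is why Vth selection has such a strong effect on static power.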
Most often, today's designs utilize a single supply voltage for a region of logic, which establishes a level of performance possible within that region. This performance is guaranteed across variations in silicon processing, temperature, and delivery of supply voltage, all of which contribute to modifying performance.
However, a novel approach to energy efficient designs is to realize that performance levels can be variable. Whereas some tasks may require the maximum performance of the device, there are generally many other tasks that can be performed at a lower performance level and still meet the requirements of the system. The identification of these lower-performance tasks is generally done through system software, and does not impact the design flow for traditional SoC designs.
Reducing the performance of the system (frequency) can provide immediate gains in energy efficiency. However, with the reduced performance come additional opportunities to reduce power. The supply voltage, which has traditionally been considered a static value, can be modified when reduced performance requirements are identified. There are two basic methods to accomplish this, which are discussed below.
Dynamic Voltage Scaling (DVS) has been used in a few products currently available on the market. The premise of DVS is to identify the minimal supply voltage that can be delivered for each of the performance modes. This identification is generally done first through simulation, then through extensive characterization of the SoC. The characterization is needed to identify a minimal voltage that can still guarantee performance across the possible process and temperature variations.
Special supply regulation circuits are required to deliver multiple voltages to the SoC. Communication must also be maintained between the regulation circuits and the SoC to ensure that the proper voltage is selected for each performance mode. Some SoCs integrate this regulation function, but generally this limits the range and efficiency of the solution.
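In software terms, DVS reduces to a lookup from performance mode to a pre-characterized supply voltage. The sketch below is a minimal illustration under stated assumptions: the table entries, the function names and the command tuples are hypothetical, and the ordering rule (raise voltage before raising frequency, lower frequency before lowering voltage) is a common practice rather than something the text specifies.

```python
# Hypothetical characterized operating points:
# (maximum clock frequency in MHz, minimum guaranteed supply in volts).
DVS_TABLE = [(100, 0.9), (200, 1.2)]


def select_voltage(freq_mhz):
    """Return the lowest characterized supply that supports freq_mhz."""
    for max_freq, vdd in sorted(DVS_TABLE):
        if freq_mhz <= max_freq:
            return vdd
    raise ValueError(f"no characterized operating point for {freq_mhz} MHz")


def transition_steps(cur_freq_mhz, new_freq_mhz):
    """Order the regulator and clock commands so timing is never violated."""
    new_vdd = select_voltage(new_freq_mhz)
    if new_freq_mhz > cur_freq_mhz:
        # Speeding up: the supply must be raised before the clock.
        return [("set_voltage", new_vdd), ("set_frequency", new_freq_mhz)]
    # Slowing down: the clock must be lowered before the supply.
    return [("set_frequency", new_freq_mhz), ("set_voltage", new_vdd)]


print(transition_steps(100, 200))
print(transition_steps(200, 100))
```

Each table entry carries the margin for worst-case process and temperature, since the voltage is fixed ahead of time for all parts.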
Adaptive Voltage Scaling (AVS) is a technique developed at National Semiconductor. Available under the trade name PowerWise, this technology provides additional energy efficiency by eliminating the need to pre-identify voltage levels for the various performance modes.
The PowerWise technology includes a monitor that is embedded into the SoC. This monitor is capable of determining the proper voltage required for a specific performance mode at a specific time, taking process and temperature into account because the monitor is on-silicon. Data from these monitors is analyzed and information is sent out to a compatible power-regulation circuit via a dedicated interface (PWI). This closed-loop approach allows for the lowest power supply voltage possible, providing the best in energy efficiency.
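The closed-loop idea can be sketched as a simple search: lower the supply in small steps for as long as the on-chip monitor reports that timing is still met. This is only a conceptual model and not the PowerWise implementation; `monitor_passes`, the 25 mV step and the `required_mv` stand-in for the actual silicon behavior are all assumptions made for illustration.

```python
def monitor_passes(vdd_mv, required_mv):
    """Stand-in for the embedded monitor: True if timing is met at vdd_mv.

    On real silicon the required voltage is not known in advance; it depends
    on process and temperature and is only observed through the monitor.
    """
    return vdd_mv >= required_mv


def avs_settle(required_mv, v_max_mv=1200, v_min_mv=700, step_mv=25):
    """Step the supply down from v_max_mv until the next step would fail."""
    vdd = v_max_mv
    while vdd - step_mv >= v_min_mv and monitor_passes(vdd - step_mv, required_mv):
        vdd -= step_mv
    return vdd


# Hypothetical slow and fast parts running the same performance mode.
print(avs_settle(required_mv=1010))  # slow silicon settles at a higher supply
print(avs_settle(required_mv=830))   # fast silicon settles at a lower supply
```

Unlike a fixed lookup table, the settled voltage differs from part to part, which is how the approach removes the margin otherwise reserved for process and temperature variation.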
The chart below shows the results when using the three approaches listed above -- performance modulation (frequency scaling), performance modulation with Dynamic Voltage Scaling, and performance modulation with Adaptive Voltage Scaling.
Figure 1 -- Fixed vs. dynamic vs. adaptive voltage scaling
The Fixed Voltage (FV) used was 1.2V. The Dynamic Voltage (DV) results shown used two voltages, 1.2V and 0.9V. The Adaptive Voltage (AV) was modulated from 1.2V down to 0.7V, shown for three different process/temperature conditions.
This data shows the benefits of DVS when the application can step down the voltage due to reduced performance needs. Using AVS provides benefits at all performance modes, not just the lower-performance modes, because the AVS technique eliminates the design margin for process variation and temperature.
Given the proper process technology, the PowerWise technology can also reduce the leakage power. When a triple-well CMOS technology is used, separate voltage lines can be used to connect to the NMOS and PMOS bulk regions. Then, these lines can be modulated to provide a back-bias voltage for regions that are not operating, or operating at a very low rate, where leakage power becomes a significant problem compared to dynamic power. This back-bias voltage causes an increase in device Vth, reducing the sub-threshold leakage.
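The effect of back-bias on Vth can be approximated with the standard body-effect expression, and the resulting leakage reduction follows from the exponential dependence of sub-threshold current on Vth. The sketch below is a first-order estimate only: the body-effect coefficient `gamma`, the surface potential term `two_phi_f` and the slope parameter `s` are hypothetical values, not data for any particular triple-well process.

```python
import math


def vth_with_back_bias(vth0, vsb, gamma=0.2, two_phi_f=0.7):
    """Body-effect model: Vth = Vth0 + gamma * (sqrt(2phiF + Vsb) - sqrt(2phiF)).

    vsb is the magnitude of the reverse source-to-bulk bias, in volts.
    gamma (V^0.5) and two_phi_f (V) are assumed, illustrative values.
    """
    return vth0 + gamma * (math.sqrt(two_phi_f + vsb) - math.sqrt(two_phi_f))


def leakage_reduction_factor(vth0, vsb, s=0.04):
    """Factor by which sub-threshold leakage drops, using exp(delta_Vth / S)."""
    delta_vth = vth_with_back_bias(vth0, vsb) - vth0
    return math.exp(delta_vth / s)


for vsb in (0.0, 0.25, 0.5):
    print(f"Vsb={vsb:.2f} V  Vth={vth_with_back_bias(0.3, vsb):.3f} V  "
          f"leakage reduced by {leakage_reduction_factor(0.3, vsb):.2f}x")
```

The model shows the trend the text describes: a larger reverse bias raises Vth and lowers sub-threshold leakage, with diminishing returns because of the square-root dependence.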
Real world application example
An example of a system that uses multiple variable voltage regions can be seen in the following figure:
Figure 2 -- AVS/TS imaging system
This system is destined for tape out to a 0.13 micron triple-well CMOS process. In this system, there are two separate processor regions. Each processor region is capable of independent tasking and performance requirements. The interface between the two domains is handled by an Inter-Core Communications Unit (ICCU). In addition, each processor region has a dedicated peripheral cluster. An intelligent clock management unit connects to all regions.
There are two AVS domains and four Threshold Scaling (TS) domains in this system. Because each processor region has separate performance requirements, the processor regions are partitioned as separate AVS and TS regions (AVS1/TS1, AVS2/TS2). In addition, the peripheral clusters are partitioned as TS domains (TS3, TS4) so that static power dissipation can be reduced during periods when they are not in use. As a result, this design has a total of three core Vdd supplies (two variable, one static) and eight back-bias supplies (four regions, two per region). While this may seem excessive, it is necessary to achieve an energy-efficient design.
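The supply count above can be checked with a small data model of the partitioning. The region names and the `Region` type are hypothetical, and the sketch assumes that both peripheral clusters share the single static Vdd supply, which the text implies but does not state outright.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    name: str
    vdd_supply: str   # name of the core supply feeding this region
    ts_domain: str    # threshold-scaling domain (one NMOS and one PMOS bias line)


# Assumed mapping of the four regions described in the text.
REGIONS = [
    Region("processor_1", vdd_supply="AVS1", ts_domain="TS1"),
    Region("processor_2", vdd_supply="AVS2", ts_domain="TS2"),
    Region("peripherals_1", vdd_supply="VDD_STATIC", ts_domain="TS3"),
    Region("peripherals_2", vdd_supply="VDD_STATIC", ts_domain="TS4"),
]

BIAS_LINES_PER_TS_DOMAIN = 2  # separate NMOS and PMOS bulk connections

core_supplies = {r.vdd_supply for r in REGIONS}
variable_supplies = {s for s in core_supplies if s.startswith("AVS")}
bias_supplies = len({r.ts_domain for r in REGIONS}) * BIAS_LINES_PER_TS_DOMAIN

print(f"core Vdd supplies: {len(core_supplies)} "
      f"({len(variable_supplies)} variable, "
      f"{len(core_supplies) - len(variable_supplies)} static)")
print(f"back-bias supplies: {bias_supplies}")
```

Keeping the partitioning in one table of this kind makes it easier to see how each added domain multiplies the number of rails the implementation tools must route and verify.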
A design flow must be made available that can support the introduction of multiple voltage regions, variable voltages per region, and back-bias control of regions, and do so while providing the capability to optimize for power, timing, area, and reliability.
EDA Support for Energy Efficient Design
For new energy efficient design flows to be viable, the implementation and analysis tools require accurate power modeling and library support. The effects of changes in supply and bias voltages on timing and power must be characterized and available in an efficient representation. To adequately capture this information, new formats are needed.
It was once possible to create tables based on a few characterization points and use k-factors to extrapolate to uncharacterized points in between. That methodology is now failing to provide the accuracy required. More sophisticated techniques are now needed to enable energy efficient design.
Figure 3 -- A flow for energy-efficient design
Scaleable Polynomial Models (SPMs) offer a way to capture library characterization information in an accurate equation-based format. Each of the voltages used for supply and back-biasing becomes a variable in an equation, which is stored with the cell's information in the library. This equation-based format allows the tools to obtain precise data on the timing and power characteristics of a cell across a broad range of operating conditions.
The cells are characterized for timing using Scaleable Polynomial Delay Models (SPDMs), for power using Scaleable Polynomial Power Models (SPPMs), and for leakage using Scaleable Polynomial Leakage Models (SPLMs). The primary advantages of using the equation-based format are that it is concise, and it provides the required accuracy for performing analysis and optimization.
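Conceptually, an equation-based model stores a set of polynomial coefficients per cell and evaluates them at the operating point of interest. The sketch below shows that idea for two variables, supply voltage and back-bias voltage. It is not the actual library format; the coefficient layout, the function name and the numeric coefficients are assumptions made purely for illustration.

```python
def eval_polynomial_model(coeffs, vdd, vbias):
    """Evaluate sum(c * vdd**i * vbias**j) over the stored coefficients.

    coeffs maps an exponent pair (i, j) to its coefficient c.
    """
    return sum(c * vdd ** i * vbias ** j for (i, j), c in coeffs.items())


# Hypothetical delay model for one cell arc, in nanoseconds.
# Delay falls as the supply rises and grows as reverse bias is applied.
DELAY_COEFFS = {
    (0, 0): 2.0,   # constant term
    (1, 0): -1.0,  # linear dependence on supply voltage
    (0, 1): 0.5,   # linear dependence on back-bias voltage
}

for vdd, vbias in [(1.2, 0.0), (0.9, 0.0), (1.2, 0.4)]:
    delay = eval_polynomial_model(DELAY_COEFFS, vdd, vbias)
    print(f"Vdd={vdd} V, Vbias={vbias} V -> delay={delay:.2f} ns")
```

Because the model is an equation rather than a table, a tool can evaluate it at any voltage within the characterized range without interpolating between a few fixed points.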
Equipped with the new libraries, it is now possible for the design automation tools to perform optimizations based on power, timing and area. The designer can also analyze the conditions on the chip and back-annotate this information into the implementation tools to close on the design constraints. Since it is important to ensure the chip functions correctly across the expected operating ranges of process, voltage, and temperature, the tools must also be able to use this information on an instance basis.
A major impact on the design flow is the need to treat the supply line as another variable. For most previous mainstream designs, logical netlists have only had to specify the input and output connections between gates. Vdd and Vss were constants and the Vdd and Vss pins for the cells all attached to the same net. Designers are now beginning to use more voltages for the logic and memory portions of the chip as well as being able to selectively turn blocks ON and OFF and dynamically vary the voltages supplied to those blocks.
Figure 4 -- Handling multiple voltage supplies from multiple cells
The tools have to manage cells that have more than one supply rail and circuitry that can vary or completely shut down the supply voltage to a block. Communication between blocks operating at different voltages requires the insertion of voltage level shifters to transform signals to the appropriate levels. Clock tree generators need to account for buffers that operate at different voltages to provide clock signals to each block, and the router needs to account for buffer placement in the context of different voltage regions on the chip. Routing a pass-through signal through a region may now require the insertion of level shifters in order to adequately drive the signal.
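The level-shifter requirement can be expressed as a simple check over the nets that cross voltage domains. The sketch below uses a deliberately conservative rule, which is an assumption rather than a description of any tool: a crossing is flagged unless both domains run at the same fixed voltage. The domain names, voltage ranges and netlist are hypothetical.

```python
# Hypothetical voltage domains: (minimum supply, maximum supply) in volts.
DOMAINS = {
    "AVS1": (0.7, 1.2),
    "AVS2": (0.7, 1.2),
    "STATIC_A": (1.2, 1.2),
    "STATIC_B": (1.2, 1.2),
}

# Hypothetical nets: (net name, driver domain, receiver domain).
NETS = [
    ("n1", "AVS1", "AVS2"),
    ("n2", "AVS1", "AVS1"),
    ("n3", "STATIC_A", "AVS2"),
    ("n4", "STATIC_A", "STATIC_B"),
]


def crossings_needing_shifters(nets, domains):
    """Flag nets whose driver and receiver voltages can differ at run time."""
    flagged = []
    for net, src, dst in nets:
        if src == dst:
            continue  # same domain: driver and receiver share one supply
        src_range, dst_range = domains[src], domains[dst]
        src_is_variable = src_range[0] != src_range[1]
        if src_range != dst_range or src_is_variable:
            # Ranges differ, or both vary independently: levels may mismatch.
            flagged.append(net)
    return flagged


print(crossings_needing_shifters(NETS, DOMAINS))
```

Note that the two AVS domains are flagged even though they share the same range, because each is regulated independently and they may sit at different voltages at any given moment.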
For designs that are using variable back-biasing, there are two new terminals for each cell that need to be routed. A common ASIC design practice is to create cells that tie the N-well regions to Vdd and the P-well regions to ground. In the physical implementation, these are simply predefined contacts designed into the cell that are connected as part of the power and ground routes. To enable back-biasing, new voltage lines are routed to control the bias. These can be to individual cells, or more likely, to regions that contain multiple cells sharing the same well and a common tie-cell to control the well bias.
Analysis tools need to understand these different situations, track all of these new voltage-based modes, and provide useful feedback to the designer.
These design techniques create interesting dynamics on the chip. Turning the voltage ON or OFF to a block can cause large transients on the power grid, affecting many other blocks on the chip.
Figure 5 -- Signal integrity impact with multiple voltages
A design implication that design and analysis tools must account for is the impact of driving some lines at higher voltages than others. The higher voltage lines can cause larger spikes in neighboring low-voltage lines than other lower-voltage aggressors, which impacts timing analysis, power and the routing of lines on the chip.
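A first-order way to see why a higher-voltage aggressor matters is the capacitive-divider estimate of coupled noise, in which the glitch on a victim line scales with the aggressor's voltage swing. The sketch below uses that simplified model, which ignores driver strength and edge rates; the capacitance values are assumptions, while the 1.2 V and 0.7 V levels are the supply extremes mentioned earlier in the text.

```python
def coupled_noise(v_aggressor, c_couple, c_ground):
    """Peak noise on a weakly held victim: Vagg * Cc / (Cc + Cg)."""
    return v_aggressor * c_couple / (c_couple + c_ground)


VICTIM_VDD = 0.7      # victim line in a low-voltage region, in volts
C_COUPLE = 1.0        # assumed coupling capacitance, in fF
C_GROUND = 3.0        # assumed victim capacitance to ground, in fF

for v_agg in (0.7, 1.2):
    noise = coupled_noise(v_agg, C_COUPLE, C_GROUND)
    print(f"aggressor at {v_agg} V -> {noise:.3f} V glitch "
          f"({noise / VICTIM_VDD:.0%} of victim supply)")
```

Under these assumptions the same coupling produces a noticeably larger fraction of the victim's supply when the aggressor switches at the higher voltage, so noise margins that hold within one region may not hold across a region boundary.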
Implementation tools need to understand the effects of the different operating conditions. Instead of designing for only the typical, minimum and maximum conditions, the circuitry now has to deal with a much broader range of conditions. Complicating matters, as supply voltages have decreased, on-chip currents have increased, raising the sensitivity to IR drop and L di/dt effects.
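The growing sensitivity to IR drop follows from simple arithmetic: for a fixed power budget, the current rises as the supply falls, and the resulting drop is a larger share of a smaller supply. The sketch below works through that relationship; the power budget and the effective grid resistance are hypothetical values.

```python
def ir_drop(power_w, vdd, grid_resistance_ohm):
    """Return (current in A, IR drop in V, drop as a fraction of vdd)."""
    current = power_w / vdd                 # I = P / V
    drop = current * grid_resistance_ohm    # V_drop = I * R
    return current, drop, drop / vdd


POWER_W = 1.0          # assumed block power budget
GRID_R_OHM = 0.010     # assumed effective power-grid resistance

for vdd in (1.2, 0.9, 0.7):
    current, drop, fraction = ir_drop(POWER_W, vdd, GRID_R_OHM)
    print(f"Vdd={vdd} V: I={current:.2f} A, drop={drop * 1e3:.1f} mV "
          f"({fraction:.2%} of supply)")
```

Since the fractional drop scales with the inverse square of the supply voltage at constant power, a grid that is adequate at the highest operating voltage can become marginal at the lowest one.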
As we enter the era of 90 nanometer design and below, meeting power objectives is becoming as important as meeting performance targets. Designers today must manage power, timing and signal integrity concurrently throughout the design flow to deliver energy-efficient products right to market.
The energy-efficient design techniques and companion EDA flows described in this paper are being deployed on portable applications today. These techniques and flows will become mainstream in the near future, enabling SoC designers to continue to deliver on their promise of building great products.
David Tamura is the Design Technology Manager for National Semiconductor's Technology and Infrastructure Group, and has previously worked within National Semiconductor in Advanced Design Methodology and DFT Development Management.
Barry Pangrle is the Senior R&D Manager for ASIC Power Products at Synopsys, and has previously held a number of R&D management roles at Synopsys in the areas of High-Level Synthesis, Test and Power. Prior to Synopsys, Mr. Pangrle was the Director of Methodology & Automation at Clearwater Networks and was an Assistant Professor in the Computer Science and Engineering Department at the Pennsylvania State University.
Rajiv Maheshwary is the director of marketing for Static Timing and Power products at Synopsys, and previously held the position of group marketing manager for Synopsys' static timing analysis tool, PrimeTime.