[Editor's note: See the end of this article for a helpful list of related articles.]
The ever-growing demand for rich multimedia signal processing in mobile devices poses a chronic technology challenge: squeezing more functionality and performance into ever-tighter power and space constraints. As a result, power-performance metrics are now a central concern in DSP design. New methods let designers address the main areas of power consumption (leakage power, clock trees, logic transitions, and power grids) and significantly improve performance compared with conventional techniques.
In today's CMOS technology, power is consumed in two basic ways: statically and dynamically. Static power is consumed continuously—even during standby operation—through various leakage mechanisms. Dynamic power is consumed only during activity, such as logic and interface operations.
Ideally, static power would be driven to zero. However, techniques that reduce static power consumption tend to increase dynamic power consumption, so chip designers must strike a compromise between the two. The ideal static-to-dynamic power ratio depends on the application. To achieve that ratio, a combination of design techniques can be applied to limit the leakage power to a given value. Such techniques include:
- Using more conservative CMOS processes, e.g., opting for a 130nm process over a 90nm process
- Using lower-leakage transistors
- Using circuit techniques that remove power from entire circuit sections, either on a duty cycle basis or when these sections are not in use
Counterintuitive as it may seem, it is often best to select a higher-leakage process and then limit the overall leakage through circuit design. For example, a lower-leakage process may use high-threshold transistors (e.g., VT = 0.4 V for HVT), while a higher-leakage process may use lower-threshold transistors (e.g., VT = 0.3 V for SVT). The higher-leakage process could draw up to 10 times more leakage current than the lower-leakage process, yet deliver the same performance at a lower supply voltage.
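As a rough plausibility check on the tenfold leakage figure, the sketch below uses the textbook subthreshold model, in which leakage rises by a factor of ten for every S volts of threshold reduction. The swing S = 100 mV/decade is an assumed, illustrative value rather than a parameter of the processes discussed here, and the function name is hypothetical.

```python
def leakage_ratio(vt_high, vt_low, swing_v_per_decade=0.1):
    """Leakage of the low-VT device relative to the high-VT device.

    Assumes subthreshold leakage scales as 10 ** (-VT / S), where S is the
    subthreshold swing in volts per decade. The default of 0.1 V/decade is
    an assumption made for illustration only.
    """
    return 10 ** ((vt_high - vt_low) / swing_v_per_decade)


# Threshold voltages quoted in the text: HVT = 0.4 V, SVT = 0.3 V
ratio = leakage_ratio(0.4, 0.3)
print(f"SVT leakage is roughly {ratio:.1f}x the HVT leakage")
```

With a steeper (smaller) swing the ratio would be larger, so the result is only as good as the assumed value of S.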
Consider the example in Fig. 1. A design in an HVT process operates from a 0.8 V supply. Running at maximum capacity (100% duty cycle), it consumes 5 mW of leakage power and 1 W of dynamic power, for a total of 1,005 mW. A similar design in an SVT process, operating at 0.7 V, delivers the same performance and draws 10 times more leakage power (50 mW). Because dynamic power falls roughly with the square of the supply voltage, however, the SVT design's dynamic power drops to about 760 mW. Its total consumption is therefore only 810 mW, a saving of roughly 20%.
Fig. 1. Power-consumption comparison of HVT and SVT processes.
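The arithmetic behind this comparison can be reproduced in a few lines. The leakage, dynamic, and total figures below come from the example itself. The final cross-check assumes the first-order CMOS relation that dynamic power scales with the square of the supply voltage at fixed frequency and switched capacitance, which is a simplification and not something the example states.

```python
# Figures from the example (all in milliwatts)
hvt_leakage, hvt_dynamic = 5.0, 1000.0   # HVT design at 0.8 V
svt_leakage, svt_total = 50.0, 810.0     # SVT design at 0.7 V

hvt_total = hvt_leakage + hvt_dynamic    # total HVT power
svt_dynamic = svt_total - svt_leakage    # dynamic power implied by the 810 mW total
saving = (hvt_total - svt_total) / hvt_total

# Cross-check (assumption): P_dynamic scales with V^2, everything else unchanged
v2_estimate = hvt_dynamic * (0.7 / 0.8) ** 2

print(f"HVT total:        {hvt_total:.0f} mW")
print(f"SVT dynamic:      {svt_dynamic:.0f} mW (V^2 estimate: {v2_estimate:.0f} mW)")
print(f"Saving with SVT:  {saving:.1%}")
```

The V-squared estimate lands within a few milliwatts of the dynamic power implied by the example, which suggests that the supply-voltage reduction accounts for essentially all of the saving.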
Power can be reduced even further by not running the SVT circuit at full capacity. The circuit can be powered down during the inactive portions of its duty cycle, eliminating leakage current during those periods.
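The effect of powering down can be estimated with a simple average-power model. This is an idealized sketch: it ignores wake-up energy, state retention, and leakage through the power switches, and the 25% duty cycle is a hypothetical value. The 50 mW leakage figure is from the SVT example, and the 760 mW dynamic figure is what the 810 mW total implies.

```python
def average_power_mw(leakage_mw, dynamic_mw, duty_cycle, power_gated):
    """Average power of a block that is active for `duty_cycle` of the time.

    If `power_gated` is True, the supply is removed while the block is idle,
    so leakage is paid only during the active window. Otherwise the block
    merely stops switching and keeps leaking while idle.
    """
    if not 0.0 <= duty_cycle <= 1.0:
        raise ValueError("duty_cycle must be between 0 and 1")
    active = duty_cycle * (leakage_mw + dynamic_mw)
    idle = 0.0 if power_gated else (1.0 - duty_cycle) * leakage_mw
    return active + idle


# SVT design: 50 mW leakage, 760 mW dynamic; 25% duty cycle is assumed
idle_leaking = average_power_mw(50.0, 760.0, 0.25, power_gated=False)
gated = average_power_mw(50.0, 760.0, 0.25, power_gated=True)
print(f"Idle but powered: {idle_leaking:.1f} mW, power-gated: {gated:.1f} mW")
```

The lower the duty cycle, the larger the share of average power that idle leakage represents, and the more powering down is worth.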
Dynamic power-performance metrics vary with many factors, including the types of processes and algorithms run, the DSP architecture and instruction set, and the way memory is partitioned. Inside a chip, however, dynamic power is generally consumed in three main ways: clock trees, logic transitions, and power grid losses. Power-performance metrics can therefore be improved substantially by aggressively optimizing each of these three power consumers.
Power Grid Losses — Power grid (IR) losses can readily be reduced to an insignificant level by using a substantial power distribution mesh. By analogy, circuit board designs have long used dedicated power-plane layers for impedance control, shielding, and reduction of both emissions and susceptibility to them (crosstalk), as well as for power distribution.
In smaller-geometry chips (90nm and below), crosstalk has become such a prevalent and difficult issue that some advanced designs shield signal lines by interleaving power and signal traces on each metal layer. When this method is used, the sheer size of the power mesh makes power grid losses insignificant compared with clock tree and logic transition losses.
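To see why a denser mesh matters, the sketch below applies the basic resistive-loss relation P = I²R to a lumped model of the grid. Both effective resistances are hypothetical values chosen only to show the scaling. The 0.7 V supply and 810 mW load are borrowed from the SVT example, and the load current is approximated as load power divided by the nominal supply voltage.

```python
def grid_loss_mw(supply_v, load_power_mw, grid_resistance_ohm):
    """Resistive (I^2 * R) loss in a power grid modeled as one lumped resistor.

    Simplification: load current is taken as load power / nominal supply,
    ignoring the voltage drop across the grid itself.
    """
    current_a = (load_power_mw / 1000.0) / supply_v
    return current_a ** 2 * grid_resistance_ohm * 1000.0


# Hypothetical effective resistances for a sparse grid and a dense mesh
sparse_loss = grid_loss_mw(0.7, 810.0, grid_resistance_ohm=0.05)
dense_loss = grid_loss_mw(0.7, 810.0, grid_resistance_ohm=0.005)
print(f"Sparse grid: {sparse_loss:.1f} mW, dense mesh: {dense_loss:.1f} mW")
```

Because the loss is linear in resistance, every reduction in effective grid resistance cuts the loss in the same proportion.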
Clock Tree Losses — Every time a flip-flop is clocked, some energy is spent in the flip-flop itself and in charging and discharging the (often massive) clock trees that span modern chips. The power consumption in clock trees can be minimized through a combination of increasingly sophisticated techniques, including the use of:
- Individually clock-enabled flip-flops to restrict flip-flop operation to the times when clocking is absolutely necessary.
- Gated clock trees to dynamically prevent clocking entire circuit sections when not in use.
- Multi-cycle path design to reduce the number of flip-flops in circuits as well as the frequency at which they are triggered.
- Asynchronous computational circuitry whenever architecturally feasible. For example, a typical power-hungry DSP sum-of-products operation can be implemented in a cascaded asynchronous circuit (without interspersed flip-flops), rather than in a synchronous feedback circuit. (Traditional synchronous circuits typically have many flip-flops that are clocked very frequently.) Like the multi-cycle path, this approach substantially reduces the number of flip-flops used and the frequency at which they are triggered.
- Minimizing the size of flip-flops and of circuits so that clock trees are physically smaller and require smaller drive buffers.
- Reducing the voltage level of a clock tree. (This is coupled with a reduction in the logic voltage level.)
The voltage-reduction and size-minimization techniques are discussed further in the next section on logic transitions.
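The clock-tree techniques listed above can be put in rough numbers with the first-order switching-power relation P = C·V²·f, scaled by the fraction of the tree that is actually being clocked. All parameter values below (200 pF of clock capacitance, a 200 MHz clock, 30% of the tree enabled) are hypothetical and chosen only for illustration; the function name is likewise an assumption.

```python
def clock_tree_power_mw(cap_farads, supply_v, freq_hz, enabled_fraction=1.0):
    """First-order clock-tree switching power in milliwatts.

    P = C * V^2 * f, multiplied by the fraction of the tree (or of the time)
    that is clocked. Ignores short-circuit and leakage currents.
    """
    return cap_farads * supply_v ** 2 * freq_hz * enabled_fraction * 1000.0


CAP = 200e-12   # hypothetical total clock-tree capacitance (200 pF)
FREQ = 200e6    # hypothetical clock frequency (200 MHz)

ungated = clock_tree_power_mw(CAP, 0.8, FREQ)
gated = clock_tree_power_mw(CAP, 0.8, FREQ, enabled_fraction=0.3)
gated_low_v = clock_tree_power_mw(CAP, 0.7, FREQ, enabled_fraction=0.3)

print(f"Ungated at 0.8 V:        {ungated:.2f} mW")
print(f"30% enabled at 0.8 V:    {gated:.2f} mW")
print(f"30% enabled at 0.7 V:    {gated_low_v:.2f} mW")
```

Gating reduces power linearly with the enabled fraction, while lowering the clock voltage reduces it quadratically. Shrinking the tree lowers the capacitance term directly, so the techniques compound.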