90nm processes offer designers the capability to integrate more complex functionality at higher performance on a single chip. However, the smaller geometry gates come at a cost, and an increasingly dominant one is power dissipation. This is of great concern to designers of Systems on Chip (SoCs) for handheld or portable devices, where battery life is important, so minimizing power dissipation while achieving satisfactory performance is an increasingly important goal.
Such portable applications are battling against major problems with higher leakage currents at 90 nm process geometries. This places an increasing burden on a performance versus power trade-off. The known design wisdom that prevailed, or more accurately survived, at 0.13 microns no longer holds in the "nano" era. Transistor lengths have become so small that current continues to flow when they are in standby, draining batteries and affecting performance.
The problem is due to subthreshold conduction. When the gate-source voltage, Vgs, of a MOS transistor is less than its threshold voltage, Vt, it is in the subthreshold region. This is characterized by an exponential change in drain current with Vgs, which appears as a straight line when the current is plotted on a logarithmic scale. Traditionally threshold voltages have been high enough that with Vgs=0 the drain current is very small. However, with smaller geometry processes like 90nm, reduced power supply voltages have meant that reduced Vts are required, and thus the drain current at Vgs=0 becomes significant.
Library developers and specialist vendors have moved to address the issue by offering gates with a higher voltage threshold, but these are slower than traditional gates and have a corresponding effect on speed and performance. It should be possible to design a circuit using faster, low threshold gates where timing is critical and higher threshold gates elsewhere.
The pressure on the design community to bridge the design productivity gap (the gap between available silicon real estate in manufacturing terms versus the capacity to fill it in design terms) has never been more pronounced. For digital circuitry, designers are demanding efficient means to optimize for speed and power, characterized by concurrent design with high and low Vt cell libraries. The current RTL-to-GDSII design flows were not constructed for this purpose, and by implication are not optimized to address it. This review will outline the requirements and challenges of the design community and describe an emerging solution which will help them address this problem.
Defining the problem
Power dissipation in a circuit comes in two forms: dynamic and static. Dynamic power is primarily caused by current flow from the charging and discharging of parasitic capacitances. Dynamic power is proportional to these capacitances, the clocking frequency and the square of the supply voltage. Design techniques can be used to reduce dynamic power by reducing the overall average activity, for example by clock gating, while process techniques such as low-k dielectrics and diagonal routing can help reduce parasitic capacitance.
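To make the dynamic power relationship concrete, here is a minimal Python sketch of the usual first-order switching power expression, activity x C x Vdd^2 x f. The capacitance, frequency and activity values are assumptions chosen only for illustration, not figures from any real design.

```python
def dynamic_power(activity, capacitance_f, vdd_v, freq_hz):
    """First-order switching power in watts: activity * C * Vdd^2 * f."""
    return activity * capacitance_f * vdd_v ** 2 * freq_hz

# Assumed values: 1 nF of total switched capacitance, 1.0 V supply, 360 MHz clock.
base_w = dynamic_power(0.10, 1e-9, 1.0, 360e6)   # 10% average activity
gated_w = dynamic_power(0.05, 1e-9, 1.0, 360e6)  # clock gating halves the activity

print(f"ungated: {base_w * 1e3:.1f} mW, gated: {gated_w * 1e3:.1f} mW")
```

In this model, halving the average activity halves the dynamic power, while a reduction in supply voltage pays off quadratically.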
Static power, on the other hand, is caused by leakage currents while gates are idle. In the early 1980s CMOS took over from depletion mode NMOS as the preferred technology for MOS ICs, since the CMOS gate had negligible static power consumption compared to the NMOS depletion mode gate, which drew current when the output was a logic 0 due to conduction in the depletion load. The CMOS gate consumed no static power because in both states one of the transistors was "off," thus preventing static power dissipation.
However, with decreasing device sizes this maxim no longer holds true. To see why, we need to understand the device physics. When the gate-source voltage Vgs is below the threshold voltage Vt, the transistor is in what is known as subthreshold mode. This is characterized by an exponential variation of drain current Ids with Vgs, so that the logarithm of Ids varies linearly with Vgs. The change in Vgs needed to reduce the drain current Ids by one decade, known as the subthreshold swing S, is given by:
$$S \propto n \, \frac{kT}{q} \, \ln(10)$$
Typically the subthreshold swing S is in the region of 80mV per decade. So in order to fully turn devices off, it is desirable to have Vt as high as possible, such that at Vgs = 0 the subthreshold leakage current is many orders of magnitude down.
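The swing expression can be evaluated numerically. The sketch below treats the relation as an equality, with k as Boltzmann's constant, T as absolute temperature, q as the electron charge and n as a process-dependent factor. The value n = 1.35 and the 300 K temperature are assumptions picked to land near the 80mV per decade quoted above.

```python
import math

K_BOLTZMANN = 1.380649e-23    # J/K
Q_ELECTRON = 1.602176634e-19  # C

def subthreshold_swing(n, temp_k=300.0):
    """Swing in volts per decade of drain current: n * (kT/q) * ln(10)."""
    return n * K_BOLTZMANN * temp_k / Q_ELECTRON * math.log(10)

def leakage_reduction(delta_vt_v, swing_v):
    """Factor by which the Vgs = 0 leakage falls when Vt is raised by delta_vt_v."""
    return 10 ** (delta_vt_v / swing_v)

ideal = subthreshold_swing(1.0)     # ideal device at 300 K, close to 60 mV/decade
typical = subthreshold_swing(1.35)  # assumed n, close to 80 mV/decade

print(f"ideal: {ideal * 1e3:.1f} mV/dec, typical: {typical * 1e3:.1f} mV/dec")
print(f"raising Vt by 160 mV at 80 mV/dec cuts leakage {leakage_reduction(0.16, 0.08):.0f}x")
```

Each additional swing's worth of threshold voltage buys one decade of leakage reduction, which is why even a modest Vt increase has such a large effect on static power.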
However, the delay of a gate is dependent on its Vt. Typically gate delay can be represented by:
$$t_d \propto \frac{V_{dd}}{(V_{dd} - V_t)^{a}}$$
where "a" is dependent on the process. So for any given supply voltage higher Vts give rise to slower gate delay. Traditionally device Vts were set to give acceptably low leakage without compromising performance.
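A short Python sketch shows how this delay relationship penalizes a higher threshold. The exponent a = 1.3 and the two Vt values are assumptions for illustration only; real values depend on the process.

```python
def relative_delay(vdd_v, vt_v, a=1.3):
    """Gate delay up to a constant factor: Vdd / (Vdd - Vt)^a."""
    return vdd_v / (vdd_v - vt_v) ** a

VDD = 1.0  # volts
low_vt_delay = relative_delay(VDD, 0.25)   # assumed low Vt of 0.25 V
high_vt_delay = relative_delay(VDD, 0.35)  # assumed high Vt of 0.35 V

slowdown = high_vt_delay / low_vt_delay
print(f"high Vt gate is {100 * (slowdown - 1):.0f}% slower than the low Vt gate")
```

With these assumed numbers a 100mV increase in Vt costs roughly a fifth in gate speed, and the penalty grows as Vdd falls closer to Vt.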
With short channel devices, the laws of scaling dictate that gate oxide thickness is reduced and supply voltages drop; for example a 90nm process might have 16... of gate oxide and a 1.0V supply voltage. At this geometry it is no longer possible to choose a single Vt value that will give an optimum leakage current and gate delay. Instead foundries offer both low Vt transistors that are fast, but have high leakage, and high Vt transistors that are slower but have reduced leakage. The choice is made by a masking step in the process, so library cells for high and low Vt gates are physically the same size, an important point for subsequent design optimization.
In a typical 90nm process, subthreshold leakage currents are of the order of 10nA/um for standard Vt devices and 1nA/um for high Vt devices. Clearly there is a huge power saving to be gained from using high Vt devices. Foundries and library vendors have developed gates with two or more values of Vt to address the leakage problem and reduce power consumption. However, in a real design there are timing critical paths and non-critical paths. What is needed is a means of using these libraries effectively without compromising performance or placing excessive new burdens on designers as they balance circuit performance against power consumption.
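As a rough illustration of the potential saving, the sketch below scales the per-micron leakage figures quoted above by a total transistor width. The total width of 1e6 um and the 80% high Vt share are assumptions chosen only to make the arithmetic concrete. The model also treats every micron of width as leaking, so it is a simple upper bound rather than a real power estimate.

```python
VDD_V = 1.0                  # supply voltage in volts
LEAK_STD_A_PER_UM = 10e-9    # standard Vt leakage per micron of width (from the text)
LEAK_HIGH_A_PER_UM = 1e-9    # high Vt leakage per micron of width (from the text)

def static_power(total_width_um, high_vt_fraction):
    """Leakage power in watts for a given mix of standard and high Vt width."""
    std_a = total_width_um * (1.0 - high_vt_fraction) * LEAK_STD_A_PER_UM
    high_a = total_width_um * high_vt_fraction * LEAK_HIGH_A_PER_UM
    return VDD_V * (std_a + high_a)

TOTAL_WIDTH_UM = 1e6  # assumed total transistor width
all_std_w = static_power(TOTAL_WIDTH_UM, 0.0)
mixed_w = static_power(TOTAL_WIDTH_UM, 0.8)  # assumed 80% of width moved to high Vt

print(f"all standard Vt: {all_std_w * 1e3:.1f} mW, mixed: {mixed_w * 1e3:.1f} mW")
```

Under these assumptions, moving most of the width to high Vt devices removes most of the leakage, even though the timing critical fifth of the design stays on the leakier devices.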
Power management in nanometer designs
RTL synthesis tools use a variety of complex calculations to generate a logical gate level circuit based on a high level description of the design. These use libraries that contain logical descriptions of the gates available, as well as timing and delay information from inputs to output. As manufacturers have made smaller processes available, these tools have been modified to take physical information such as routing parasitics into account. Then physical placement of gates was introduced to the synthesis flow to give a reasonable estimate of delays throughout the circuit.
Addressing gate leakage has presented new challenges to design tools, and various options from mixed threshold gates to multiple supply rails and dynamic circuit power-down have been put forward as possible solutions. However, these present extra levels of complication in the synthesis and layout process and require fundamental changes in design style.
Traditional synthesis, place and route flows do not offer the means to optimize for multiple Vt libraries simultaneously; synthesis expects to choose gates from a single target library. Therefore, the ideal solution to the problem must complement and enhance current design methodologies without causing disruption in the general EDA scheme.
In many circuits, it is possible to identify timing critical paths and have these maintained with fast, low Vt cells while others can use high Vt cells to reduce leakage current. However, having an engineer identify all paths where low power cells can be used is impossible given the complexity of even a modest design. In addition, some paths through a circuit can use a mixture of high and low Vt gates and still maintain timing performance so a more detailed examination of the circuit's components must be carried out.
Achieving low power in nanometer designs must go beyond engineers' critical path definition to an automated means of identifying and resolving leakage issues. In most cases, cell Vt selection will depend on physical placement and routing information as well as the work rate of a particular net. In addition, power optimization should not compromise circuit performance by over-constraining the synthesis process.
Implementing low power gates at the physical level
Figure 1 shows how a dedicated tool, such as In2Fab's ISIS tool, works with an efficient methodology that complements a standard RTL to GDSII flow. It is applied once the design has been placed and routed and refined through to timing closure.
Figure 1 -- Overall flow from RTL to high and low Vt optimized design
A design may be synthesized using a low Vt library to ensure the fulfillment of timing and performance goals. This allows the designer to take full advantage of the speed offered by the nanometer process. This circuit is processed through place and route to generate a layout. When this is complete, power consumption in all elements of the design can be analyzed and the circuit updated to use gates of a higher voltage threshold wherever possible.
The swapping process takes information from the libraries of each threshold, and incorporates delay information from routing in the physical layout. This ensures an accurate analysis of the final circuit, and the design is updated without the need for iterations or re-synthesis steps to re-check timing once power reduction is complete. The objective is to replace high leakage gates with low leakage ones wherever possible, with the constraint that timing goals must be met.
First, the libraries are analyzed for logical matches of gates to determine which gates in each library have identical functionality (NAND to NAND, NOR to NOR), as shown in Figure 2. Once the equivalent functions have been determined, detailed analysis of each gate placement can be undertaken.
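The library matching step can be sketched in a few lines of Python. This is only an illustration of the idea, not the tool's actual implementation: the cell names are hypothetical, and matching on drive strength as well as logic function is an assumption made here so that a swap leaves the cell footprint unchanged.

```python
from typing import NamedTuple

class Cell(NamedTuple):
    name: str      # library cell name (hypothetical naming scheme)
    function: str  # logic function, e.g. "NAND2"
    drive: int     # drive strength, e.g. 1 for an X1 cell

def match_cells(low_vt_lib, high_vt_lib):
    """Map each low Vt cell name to the high Vt cell with the same function and drive."""
    high_by_key = {(c.function, c.drive): c.name for c in high_vt_lib}
    return {
        c.name: high_by_key[(c.function, c.drive)]
        for c in low_vt_lib
        if (c.function, c.drive) in high_by_key
    }

low_vt = [
    Cell("NAND2_X1_LVT", "NAND2", 1),
    Cell("NOR2_X2_LVT", "NOR2", 2),
    Cell("XOR2_X1_LVT", "XOR2", 1),  # no high Vt counterpart in this toy library
]
high_vt = [
    Cell("NAND2_X1_HVT", "NAND2", 1),
    Cell("NOR2_X2_HVT", "NOR2", 2),
]

equivalents = match_cells(low_vt, high_vt)
print(equivalents)
```

Cells without a counterpart in the other library are simply left out of the map, so they are never considered as swap candidates.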
Figure 2 -- Specific methodology for concurrent high and low Vt optimization
Static timing analysis is performed on each instance of a gate using the model contained in the library and the wire loading information from the layout. As the delay and power of the logically equivalent gates are known, each instance can be tested to see if an equivalent gate of lower power can be used without compromising the circuit's timing. Unlike physical synthesis tools, real wiring parasitics are used for delay calculation instead of estimates, ensuring the highest level of accuracy in the power reduction process.
As each replacement gate is identified, the circuit's netlist and physical layout are updated with the new low power cells. Because the high and low Vt cells have the same topology, no routing changes are required when replacing gates in the layout and the process is completed in a single pass, another substantial added benefit of adopting this methodology.
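The test-and-swap loop described above can be sketched as a simple greedy pass. This is a toy model under stated assumptions, not the algorithm used by any particular tool: the four-gate netlist, the delays in picoseconds and the 320 ps timing target are all invented, each gate delay is assumed to already include its extracted wire delay, and timing is checked with a plain longest-path calculation.

```python
def arrival_times(gates, fanin, delay):
    """Longest-path arrival time at each gate output; gates are in topological order."""
    arrival = {}
    for g in gates:
        start = max((arrival[p] for p in fanin[g]), default=0)
        arrival[g] = start + delay[g]
    return arrival

def swap_to_high_vt(gates, fanin, low_delay, high_delay, period):
    """Try each gate in turn; keep the high Vt version only if timing is still met."""
    delay = dict(low_delay)  # start from the all low Vt implementation
    swapped = []
    for g in gates:
        trial = dict(delay)
        trial[g] = high_delay[g]
        if max(arrival_times(gates, fanin, trial).values()) <= period:
            delay = trial
            swapped.append(g)
    return swapped, delay

# Toy netlist: a -> b -> d is the long path, c -> d is the short one.
gates = ["a", "b", "c", "d"]
fanin = {"a": [], "b": ["a"], "c": [], "d": ["b", "c"]}
low_delay = {"a": 100, "b": 100, "c": 50, "d": 100}   # ps, assumed
high_delay = {"a": 120, "b": 120, "c": 60, "d": 120}  # ps, assumed 20% slower

swapped, final_delay = swap_to_high_vt(gates, fanin, low_delay, high_delay, period=320)
print(swapped)  # ['a', 'c']
```

Gate "a" uses up the 20 ps of slack on the long path, so "b" and "d" must stay low Vt, while "c" sits on the short path and can be swapped freely. A visiting order based on leakage saving or slack would generally do better than the fixed order used here.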
For greater flexibility within the flow and tool, gates can also be resized (that is, increased or decreased in drive strength) to further optimize the design. Gate size changes may result in minor adjustments in the rows and routing, but the performance gains using these techniques can be recognized and captured without the need to re-synthesize.
Further timing analysis and gate optimization may also be applied to any net in the design, dynamically adjusting drive strengths against physical net loading to improve performance. This can greatly improve a circuit's efficiency with particular benefits for clock balance and skew, and is especially useful where multiple clocks have been employed and the synchronization of them is key.
The reduction in power of this approach can be substantial. A recent article described an ARM core implemented in a 90nm process. With a low Vt library, the device ran at 360MHz but dissipated 21.6mW. Switching to a high Vt library reduced power to 1.3mW, but the device only ran at 200MHz. Using a combination of gates from both the low and high Vt libraries the speed was maintained at 360MHz but with just 9.7mW power consumption.
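The figures quoted for the ARM core can be put in relative terms with a few lines of Python. Only the power and frequency numbers come from the text above; the helper function name is made up for this example.

```python
LOW_VT = {"freq_mhz": 360, "power_mw": 21.6}
HIGH_VT = {"freq_mhz": 200, "power_mw": 1.3}
MIXED_VT = {"freq_mhz": 360, "power_mw": 9.7}

def saving_percent(baseline_mw, value_mw):
    """Power saved relative to the baseline, as a percentage."""
    return 100.0 * (baseline_mw - value_mw) / baseline_mw

mixed_saving = saving_percent(LOW_VT["power_mw"], MIXED_VT["power_mw"])
high_saving = saving_percent(LOW_VT["power_mw"], HIGH_VT["power_mw"])
high_speed_ratio = HIGH_VT["freq_mhz"] / LOW_VT["freq_mhz"]

print(f"mixed Vt saves {mixed_saving:.0f}% of power at full speed")
print(f"high Vt only saves {high_saving:.0f}% but runs at {100 * high_speed_ratio:.0f}% of the speed")
```

The mixed implementation recovers a little over half of the power of the all low Vt design with no loss of clock frequency, while the all high Vt design saves far more but gives up nearly half the speed.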
In summary, static leakage current at the 90 nanometer process node is design enemy number one. It is an increasing concern as the developers of SoC applications seek to gain the performance and functionality benefits that this geometry can offer without paying the penalty in power consumption. The methodology described in this review is specifically dedicated to the power versus performance optimization of a design using both high and low Vt libraries concurrently. It enables significant power reduction whilst at the same time maximizing design performance. Adopting such a new methodology for nano-class process technology will be essential to achieving both design and, consequently, business success.
Keith Sabine is Vice President of Engineering at In2Fab Technology. Sabine's background is in bipolar integrated circuit (IC) design, and he has worked with such companies as Fairchild Semiconductor, Plessey Semiconductors and BP Research. For more than a decade Keith has worked in the EDA sector with Cadence Design Systems and Simplex Solutions.