Statistical static timing analysis (SSTA) offers a number of advantages over traditional corner-based static timing analysis. Most notably, it provides a more realistic estimate of timing relative to actual silicon performance. Armed with a better answer, designers can focus their optimization efforts on the timing paths that have the biggest impact on overall performance and yield, rather than on paths that may fail only at extreme corners.
Meeting timing at the worst- or best-case corner can be very challenging, lengthening design schedules and hurting power consumption. With a large range of potential delay values, often differing by 50 percent or more between the slow and fast process corners, it becomes harder to meet both setup times at the worst-case corner and hold times at the best-case corner.
Even if the performance goals are met, there is often an undesired impact on other design metrics, in particular power consumption, noise immunity and leakage. For example, to meet an aggressive performance target, optimization may deploy a higher ratio of low-threshold cells that are faster but leakier. With statistical analysis, a better tradeoff between timing and other design metrics such as power and noise immunity can be achieved.
SSTA provides not only a list of worst timing paths, but also the probability of those paths failing while accounting for the impact of process variation. To accurately predict variation, it needs to account for both systematic variations (one example would be due to lithography) and random variations (one example would be due to doping).
Even within the same chip there is a wide range of on-chip variation (OCV). Traditional static timing tools use guard-bands or OCV factors (often as much as +/- 15%) to safeguard against OCV, but this type of over-design is too wasteful to be effective at 65nm or below. This is because at the smaller geometries the process is more sensitive to variation and over-design has an even bigger impact on leakage power. Furthermore, many of the advantages of using a costlier process node may be eliminated if too many cells are needlessly over-sized to meet an unrealistic OCV target.
SSTA can help provide a much more realistic measurement of OCV, one that not only reduces the overall guard-band but also protects against corner cases where a single large OCV factor is not sufficient to catch potential errors. This is because each transistor in the design has a unique sensitivity to each source of process variation, depending on its size, location, orientation, interconnect loading and how it is driven.
Shown in figure 1 are the delay cumulative distribution functions (CDFs) for an inverter when subjected to three different input slews and three distinct output loads, nine cases total. The curves were calculated by Monte Carlo simulations varying the transistor width (XW), length (XL), threshold voltage (Vth) and oxide thickness (TOX).
From the slope of these curves, it is clear that a cell's delay sensitivity to variation changes with the environment in which each instance operates. In addition, OCV de-rating is often applied without regard to path depth, even though random variations tend to average out along long paths, so their relative impact is much smaller than on short paths. Consequently, a single one-size-fits-all OCV factor is both costly and not very effective.
Figure 1 Cumulative distribution functions of delay vs. slew and load
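The path-depth effect can be checked with a small Monte Carlo experiment. The sketch below (Python; the 100 ps mean and 10 ps sigma per stage are illustrative numbers, not library data) sums independent Gaussian stage delays and shows that the relative spread of a path delay falls roughly as 1/sqrt(N) with path depth N:

```python
import random
import statistics

def relative_path_sigma(n_stages, mean_ps=100.0, sigma_ps=10.0, trials=20000):
    """Monte Carlo: sum n_stages independent Gaussian stage delays and
    return stdev/mean of the total path delay. The 100 ps / 10 ps stage
    numbers are illustrative, not from a real library."""
    random.seed(0)  # deterministic for repeatability
    totals = [sum(random.gauss(mean_ps, sigma_ps) for _ in range(n_stages))
              for _ in range(trials)]
    return statistics.stdev(totals) / statistics.mean(totals)

# Relative spread falls roughly as 1/sqrt(N), so an OCV percentage sized
# for short paths is overly pessimistic for deep ones.
short_path = relative_path_sigma(2)    # roughly 7% of path delay
deep_path = relative_path_sigma(32)    # roughly 1.8% of path delay
```

Random stage variations partially cancel along a deep path, which is why a flat de-rating percentage that is safe for two-stage paths wastes margin on thirty-stage paths.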
While variation can impact both cells and interconnects, it has a larger overall impact on cell performance. Critical to effective SSTA are the statistical cell models that SSTA is based on. These models can be created by pre-characterizing each cell under different loading and slew conditions while accounting for a range of process parameter variations based on actual process data and measurements.
Statistical timing models
SSTA requires a cell model that accounts for process parameter variations. Typically the important process parameters for SSTA are transistor channel length (L) and threshold voltage (Vth), though in general any process parameter may be modeled for variation. Each process parameter may vary globally, systematically or randomly; these three classes have successively smaller characteristic distances, that is, smaller effective ranges over which the variation acts.
Global variation is usually associated with different process steps in the chip manufacturing process. Each of the global variation parameters, such as L, W, Vth and gate oxide (TOX), impacts a wide region, perhaps over an entire chip or even across multiple chips on a wafer, but its impact is rather uniform across the region.
However, different global parameters may not be correlated in their variation. For example, lower-than-nominal dopant concentration (which affects Vth) and thicker-than-nominal gate oxide (TOX) may occur simultaneously across a wafer, yet their effects on timing oppose each other. Each delay, transition, setup/hold, and pin capacitance table entry needs to be characterized for each global variation parameter.
The total timing sensitivity depends on the correlation between the global parameters, as well as between NMOS and PMOS transistors which may vary independently. For every library data value, multiple simulations are required to characterize global timing sensitivity, and non-linearity in the variation impact on timing must be considered as well.
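How correlation shapes the combined sensitivity can be sketched as follows (the sensitivity values and correlation coefficient below are hypothetical, not from a characterized library). When each sensitivity s_i is expressed in ps per 1-sigma of its parameter, the total delay sigma follows sigma^2 = sum_i sum_j s_i * s_j * rho_ij:

```python
import math

def total_delay_sigma(sens_ps, rho):
    """Combine per-parameter delay sensitivities (each in ps per 1-sigma
    of its global parameter) through the correlation matrix rho:
    sigma^2 = sum_i sum_j sens[i] * sens[j] * rho[i][j]."""
    n = len(sens_ps)
    var = sum(sens_ps[i] * sens_ps[j] * rho[i][j]
              for i in range(n) for j in range(n))
    return math.sqrt(var)

# Hypothetical sensitivities for NMOS Vth, PMOS Vth, L and TOX (ps).
s = [4.0, 3.0, 5.0, 2.0]
identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
uncorrelated = total_delay_sigma(s, identity)   # sqrt(4^2+3^2+5^2+2^2)

# Partial positive correlation between L and TOX raises the combined sigma.
corr = [row[:] for row in identity]
corr[2][3] = corr[3][2] = 0.5
correlated = total_delay_sigma(s, corr)
```

Negative correlation between two parameters would instead shrink the combined sigma, which is why ignoring correlation can err in either direction.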
In systematic variation, the process varies in the same direction by the same magnitude for every transistor within a small region, for example, within a logic cell. A classic example is the variation in channel length due to focus variation. The library modeling for systematic variation is similar to that of an 'intermediate' process corner.
Similar to global variation, each delay, transition, setup/hold, and pin capacitance table entry needs to be characterized for each systematic parameter, and represented in the library either as additional tables or sensitivities. The increase in characterization effort depends directly on the number of systematic parameters being modeled.
Finally, random variation models the process variations that apply to each transistor independently. Even for neighboring transistors inside the same cell, the variation may differ in direction and magnitude. In the analog world, this phenomenon is known as 'mismatch'. The magnitude of random variation is inversely proportional to the square root of the transistor area, a relationship known as Pelgrom's law.
Therefore, cell characterization must make the appropriate parameter adjustment, unique to each transistor, before performing Spice simulation. As device geometries shrink, small imperfections in chip manufacturing give these random variations an increasing impact. At 90nm, random variation accounts for up to 45% of the total process variation, and its share is projected to increase as scaling continues.
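Pelgrom's law can be written as sigma(Vth) = A_Vt / sqrt(W * L). A minimal sketch (the matching coefficient A_Vt and the device sizes below are illustrative, not taken from a real PDK):

```python
import math

def vth_mismatch_sigma_mv(w_um, l_um, a_vt_mv_um=3.5):
    """Pelgrom's law: sigma(Vth) = A_Vt / sqrt(W * L).
    A_Vt is a process matching coefficient in mV*um; 3.5 is an
    illustrative value, not taken from a real PDK."""
    return a_vt_mv_um / math.sqrt(w_um * l_um)

# Quadrupling the gate area halves the random Vth spread.
small_dev = vth_mismatch_sigma_mv(0.2, 0.1)   # 0.02 um^2 gate area
large_dev = vth_mismatch_sigma_mv(0.4, 0.2)   # 4x the area
```

This area dependence is why each transistor inside a cell needs its own mismatch adjustment before simulation: minimum-size devices see far more random spread than upsized ones.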
These process parameter variations pose new and significant challenges to cell characterization: the characterization inputs no longer consist of fixed values. Consequently, the characterization outputs are standard deviations of the total impact of these variations over each and every transistor inside a cell. Needless to say, a huge number of simulations is required to generate the standard deviation for each of the millions of table entries inside a modern cell library.
Circuit analysis drives statistical characterization
Figure 2 shows a simple buffer circuit with random threshold voltage variations to illustrate the characterization of delay and pin capacitance standard deviations. The variation in threshold voltage for each of the four transistors is represented by Vthn1, Vthn2, Vthp1 and Vthp2. Table 1 shows simulation results of the rising delay and pin capacitance variations due to each of these parameter variations.
In this example, the delay variation with a small output load is dominated by the first stage NMOS variation, whereas for a larger output load, the last stage PMOS variation has an increased delay impact. For pin capacitance, however, the first stage transistors have the most influence on the total variation. As one can see, the variation of each library value has its own unique composition of the aggregate impact from each transistor's variation on the data that is being characterized.
Figure 2 Simple buffer with random threshold voltage variations
Table 1 Random threshold voltage variation for a buffer and its effect on delay and pin capacitance
For each combination of input slew, output load, and switching state, the variation due to each transistor must be simulated, and the results combined to model the total variation impact due to all transistors inside the cell. This assumes the total impact of all transistor variations can be approximated by the sum of the impact of each individual transistor variation. The accuracy of this assumption is shown in Figure 3.
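Under this linear-sum assumption, independent per-transistor contributions combine in quadrature. A minimal sketch (the per-transistor sigma values are hypothetical, loosely patterned on the Figure 2 buffer):

```python
import math

def rss_total_sigma_ps(per_transistor_sigmas):
    """If the total delay shift is the sum of independent per-transistor
    shifts, the variances add, so the total sigma is the root-sum-square
    of the per-transistor sigmas."""
    return math.sqrt(sum(s * s for s in per_transistor_sigmas))

# Hypothetical 1-sigma delay impacts (ps) for the four buffer transistors
# (Vthn1, Vthn2, Vthp1, Vthp2) at one slew/load point.
sigmas = [3.0, 1.0, 1.5, 2.0]
total = rss_total_sigma_ps(sigmas)   # well below the worst-case sum of 7.5
```

The root-sum-square total is markedly smaller than the linear sum of the individual impacts, which is exactly the pessimism that statistical models recover compared with worst-case stacking.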
Even with this simplifying assumption, the amount of computation required for the characterization of random variation is increased by a multiple of (M*P) over a regular timing library, where M is the number of transistors inside a cell, and P is the number of random process parameters being modeled. For example, a simple 24-transistor D flip-flop with four random variation parameters (such as Vth and L for PMOS and NMOS) requires an almost 100-fold increase in characterization time (24 x 4 = 96 additional simulations per table entry).
Furthermore, certain parameter variations are becoming more non-linear, requiring even more simulations to model variation effects accurately. This is clearly impractical and represents a major hurdle in the adoption of statistical static timing analysis and statistical timing driven optimization. New characterization methods are needed to overcome this barrier.
Statistical characterization requirements
It is apparent from the above example that statistical characterization must be performed in a way that the transistor level details of each cell are exposed to the characterization tool, rather than treating each cell as a black box, which has been common practice in library characterization up to now. Also, due to the drastic increase in computation requirements, a significantly faster characterization methodology is needed.
For statistical timing analysis to realize its potential, the turnaround time for statistical characterization must not be too much longer than today's corner-based characterization. To meet this target, a new characterization approach is needed that is orders of magnitude faster than current approaches.
To achieve fast characterization times, a number of speedup techniques can be employed, most of which require detailed analysis of the transistor-level circuit inside a logic cell. For example, linearity in parameter variations can be determined by analyzing a circuit's response to variation of each parameter.
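One such linearity check can be sketched as a finite-difference probe: simulate the cell at minus one sigma, nominal, and plus one sigma of a parameter, then compare the curvature (second difference) against the slope (first difference). The toy response curves below stand in for real Spice sweeps and are purely hypothetical:

```python
def linearity_error(delay_fn, delta=1.0):
    """Probe a cell's delay at -delta, 0 and +delta (delta = one sigma of
    the parameter) and return |second difference| / |first difference|.
    A small ratio means a linear sensitivity model is adequate."""
    d_minus, d_zero, d_plus = delay_fn(-delta), delay_fn(0.0), delay_fn(delta)
    slope = (d_plus - d_minus) / 2.0
    curvature = d_plus - 2.0 * d_zero + d_minus
    return abs(curvature) / max(abs(slope), 1e-12)

# Toy response curves standing in for Spice sweeps (hypothetical):
linear_cell = lambda p: 100.0 + 5.0 * p
quadratic_cell = lambda p: 100.0 + 5.0 * p + 2.0 * p * p

err_linear = linearity_error(linear_cell)        # zero curvature
err_quadratic = linearity_error(quadratic_cell)  # curvature is visible
```

Parameters that pass such a probe can be characterized with a handful of sensitivity simulations; those that fail it are candidates for non-linear modeling or direct Monte Carlo.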
Accuracy in traditional timing libraries has always been verified by comparing against a golden Spice simulator using the appropriate accuracy settings. For statistical libraries, accuracy must be established by a Monte Carlo reference. In other words, the accuracy of a statistical characterization system is heavily determined by the accuracy of the transistor level modeling of process variations.
Figure 3 shows an accuracy plot of delay variation due to random channel length variations for a 90nm cell library when linearity is assumed. The overall accuracy is within about 3% of Monte Carlo simulations. However, process parameters that exhibit significant non-linearity must be modeled with non-linear techniques or else accuracy will suffer.
Figure 3 Accuracy of linear variation approximation to Monte Carlo reference
A robust statistical characterization system must include self-validation mechanisms to ensure the standard deviation values are within an acceptable accuracy threshold. For selected cells where accuracy is most critical, such as clock buffers, the system should also support direct characterization with Monte Carlo methods for highest accuracy.
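Such a self-validation step might be sketched as follows (the cell response, parameter sigma and tolerance below are hypothetical): re-estimate a characterized sigma by direct Monte Carlo sampling of a parameter and flag any table entry whose characterized value falls outside tolerance:

```python
import random
import statistics

def validate_sigma(model_sigma, delay_fn, param_sigma=1.0,
                   trials=5000, tol=0.10):
    """Self-check for one table entry: re-estimate the delay sigma by
    Monte Carlo over one parameter and pass only if the characterized
    value is within tol (relative) of the Monte Carlo estimate."""
    random.seed(1)  # deterministic for repeatability
    samples = [delay_fn(random.gauss(0.0, param_sigma))
               for _ in range(trials)]
    mc_sigma = statistics.stdev(samples)
    return abs(model_sigma - mc_sigma) / mc_sigma <= tol

# Toy cell response (hypothetical): 5 ps delay shift per unit of the
# parameter, so the exact delay sigma is 5 ps when param_sigma is 1.
cell = lambda p: 100.0 + 5.0 * p
good_entry = validate_sigma(5.0, cell)   # passes the self-check
bad_entry = validate_sigma(7.5, cell)    # 50% off: flagged
```

In a production flow the Monte Carlo reference would of course come from Spice rather than a closed-form response, but the pass/fail logic would look much the same.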
Last but not least, the consistency between the traditional library and the statistical library cannot be overlooked. If regular static timing analysis uses a library characterized with one set of assumptions, but statistical timing analysis uses another library characterized with a different set, the path delays between them cannot be meaningfully compared.
Characterization assumptions, such as state dependency, input waveform, Spice simulation settings, and setup/hold criteria have a significant impact on results. These characterization assumptions must be fully understood and aligned before any meaningful comparison can be made between static and statistical timing analysis results.
Statistical static timing analysis offers designers some respite from the difficult timing closure problem and is essential technology for cost-effective use of 65nm and 45nm processes. However, the benefits of SSTA cannot be fully realized without efficient, accurate and highly automated statistical library characterization. New characterization methods such as those described in this article are required to enable the creation of accurate statistical cell libraries with similar turnaround time requirements as today's corner-based library characterization.
These new methods will require much more intimate knowledge of the circuit being characterized in order to make the tradeoffs necessary to provide both accuracy and performance. In addition, new ways of verifying the libraries will also be needed to ensure the characterization assumptions are valid. When coupled with efficient implementation of statistical timing analysis that leverages these statistical cell models, the promise of statistical design implementation moves one big step closer to reality.
Ken Tseng is chief technical officer at Altos Design Automation, Inc. Kelvin Le is member of technical staff at Extreme DA.