United Business Media EE Times



cover story

Using Clock Skew as a Tool to Achieve Optimal Timing

High-performance designs may call for a nonzero clock skew for robust clocking schemes. Employing nonzero skew can help increase performance as well as extend a chip's safety margins.

by Joe G. Xi and David Staepelaere



Today, 0.25-µm design is leaving the domain of a few high-end chips and entering the ASIC and IC design mainstream. Though a large number of design starts will use very deep submicron (VDSM) processes, today's tools and methodologies still haven't answered the challenges posed by the recent process technologies. In particular, the issues of interconnect delay, signal integrity, IR drop, and electromigration are proving thornier than anticipated. To avoid those problems, designers have traditionally relied on conservative approaches, like overdesigning. But such approaches no longer work; designers face too many competing constraints and lack the room to maneuver. One of the critical design steps--the clock tree design--best illustrates those difficulties.

Few people will question that the clock is the single most important signal in a timing-critical chip. To meet timing specifications, the clock signal must operate at a particular frequency and arrive at various parts of the chip at a specified time. A clock distribution network--the clock tree--consists of interconnects and buffers and typically connects the clock source to latches, flip-flops, and other synchronizing circuits on the chip. The major task of clock design is to synthesize a clock tree that meets the clock skew, insertion delay, and frequency targets. That task alone is difficult enough, considering that the clock signals may operate at 200 MHz or more and that designs are becoming larger and larger. At 0.25 µm, the portion of the total path delay due to the interconnect has significantly increased, especially for such global signals as the clock.

Clock skew, the variation in a clock signal's arrival time at different clock pins, becomes a critical timing parameter. Excessive skew can cause either race conditions or cycle time violations. In the past, designers have minimized skew by way of a number of approaches, such as H-tree, balanced routing (zero-skew routing), and clock mesh. Some clock synthesis tools based on those approaches have worked reasonably well. But at 0.25 µm, a number of issues have made clock design--and the achievement of zero skew--much more difficult.

We've developed an entirely different approach to solving the problem of clock skew--in fact, we now see clock skew not as a problem, but as a tool that can help engineers achieve the optimal timing on their designs. Employing nonzero skew can help increase the performance of a chip as well as extend the safety margins that prevent race conditions or cycle time violations. In short, clock skew has become a variable that we can control to meet timing requirements or achieve a more robust design.

Getting the drop on IR drop

As the widths of wires shrink, resistance increases more rapidly than capacitance decreases. Meanwhile, the increasing routing density and number of interconnect layers, as well as a large height-to-width ratio (2:1 and 2.5:1), make coupling capacitance much larger than capacitance to ground. Both interconnect and gate delay calculations must take into account the crosstalk between signals. Furthermore, crosstalk is by nature a dynamic phenomenon. The delay caused by coupling capacitance depends on the switching directions and switching intervals of the coupled signals. As a result, the predictions of the delay models and analysis methods that current clock synthesis tools use can range far from the actual results. In addition to causing delay variations, crosstalk can also cause deviations of the clock signal waveform.

Figure 1 Permissible ranges for skew

For the sequentially adjacent registers Ri and Rj, the minimum path delay and the hold time at Rj constrain the negative skew (a). Similarly, the maximum path delay and the setup time for Rj limit the amount of positive skew (b).

Another serious issue arising at 0.25 µm is IR drop, which may cause the supply voltages to fluctuate. IR drop has always needed some attention from the designer, but until recently it has been a relatively easy problem to fix. Power bus planning and net widening can address the average IR drop, which depends on the average current drawn from the power bus. A more difficult issue is transient behavior, such as peak IR drop, which--like crosstalk--is dynamic. The peak current drawn by a particular device depends on the timing and the vectors exercised by the chip. If not designed properly, the supply voltage can fluctuate in an almost random fashion. As a result, the delay and integrity of the signals, including clocks, become difficult to predict. Of course, other process variations and environmental conditions complicate the matter even further. All those phenomena make the clock signal look less and less like the ideal. Thus if a signal integrity issue ever arises, the clock is the first place to look.

One obvious solution to the clock skew problem is to overdesign. In fact, conservative design methods have become common practices. Designers can always use very wide wires, mesh, a large number of buffers, and even an extra layer of metal to carry the clock signal or shield it. However, such approaches extract a price--one that, at smaller geometries, may be too high. The increased wiring means increased capacitance and subsequent power dissipation. Carrying the largest amount of load and toggling more frequently than other signals in a chip, the clock happens to be a major source of power dissipation. For example, the first DEC Alpha chip used a clock driver that carried a 3,250-pF capacitance load. When the chip operates at 200 MHz with a 3.3-V supply, the dynamic power dissipated by the clock alone is 7.08 W, which amounts to 30 percent of the chip's total power dissipation. System-on-a-chip design simply can't afford to allow one signal to consume so much power.
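The 7.08-W figure is consistent with the standard dynamic-power relation P = C x V^2 x f, assuming the whole clock load swings rail to rail once per cycle. The short Python check below reproduces the arithmetic from the values quoted above; the function name is illustrative only.

```python
def clock_dynamic_power(cap_farads, vdd_volts, freq_hz):
    """Dynamic power of a net that is charged and discharged once per cycle.

    Uses P = C * V^2 * f, which assumes a full rail-to-rail swing.
    """
    return cap_farads * vdd_volts ** 2 * freq_hz


# Values quoted in the text: 3,250 pF load, 3.3 V supply, 200 MHz clock.
power_w = clock_dynamic_power(3250e-12, 3.3, 200e6)
print(f"{power_w:.2f} W")  # 7.08 W
```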

Nor is power the only price. Even if power is no object, and assuming that the designers have done such a good job designing the clock that they have realized exact zero skew under all circumstances, they face yet other stumbling blocks: simultaneous switching noise and peak current. If everything switches at exactly the same time, the current drawn from the power supply at that instant is so large that even with a very small lead inductance, the dI/dt switching noise can cause the chip to fail. The large peak current can also harm the chip--especially when the chip contains sensitive circuits, such as analog functions. Of course, designers may add decoupling capacitors, but only by adding to silicon and manufacturing costs. Clearly, those challenges are beyond the capabilities of current clock design solutions.

Use that skew

Whenever we look for new design solutions, we also need to look at the bigger picture: What is the problem we're trying to solve in the first place? Is minimizing clock skew the goal of clock design? No, the goal of clock design is to meet the chip's timing requirements. Since logic path delays also determine the timing of a chip, minimizing skew is just one approach to meeting the timing specifications. If we consider skew in the context of chip timing, we would find a larger, multidimensional space in which to search for solutions. Designers can consider skew--much like logic functions, gate sizes, and supply voltages--as a design variable. In that sense, what we're really trying to achieve is not just a good clock design, but rather a closure of timing or a more robust timing of the chip.

Current clock design methodology depends on the assumption that a smaller skew means a better design: A chip with zero or minimal skew either performs better or offers more robust timing. Moreover, most current clock design tools are concerned with buffer insertion or clock routing to minimize skew. However, that assumption is purely artificial. It persists mainly because other design stages--synthesis, timing-driven placement and routing, and timing verification, to name a few--all assume an ideal clock, one with zero or negligible skew. This functional assumption allows designers to separate clock design from the logic and physical design of the gates.

Figure 2 Useful skew vs. zero skew

Some nonzero skews can become useful by creating safety margins to guard against delay variations. Consider a simple circuit with three registers and two local datapaths (a). The traditional zero-skew methodology leaves skews dangerously close to the edges of the permissible ranges (b). The useful skew methodology moves the skews toward the center of the permissible ranges, creating safety margins against potential timing violations (c).

A closer look at the relationship between skew and logic path delays, however, reveals that zero skew may be neither necessary nor the best bet for robust timing. Nonzero skew can be either positive or negative with respect to the direction of a logic path from one register to another. For two registers, i and j, that are adjacent through combinatorial logic, the skew between them is:

$$\mathrm{skew}_{i \to j} = \mathrm{arrival}_i - \mathrm{arrival}_j$$

Excessive skew in either case may produce race conditions or cycle time violations. The clock cycle time, the register's setup and hold times, and the longest and shortest combinatorial logic delays define a permissible range of skew in which the clock operates correctly (see Figure 1).

To avoid a race condition, the negative skew must satisfy:

$$\mathrm{skew}_{i \to j} \ge -\min(\mathrm{delay}) + \mathrm{hold}$$

To avoid a cycle time violation, the positive skew must satisfy:

$$\mathrm{skew}_{i \to j} \le \mathrm{period} - \max(\mathrm{delay}) - \mathrm{setup}$$
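Taken together, the race condition bounds the skew from below and the cycle time condition bounds it from above. The Python sketch below combines the two into a single permissible range, with skew defined as the arrival time at register i minus the arrival time at register j. The function names and the sample setup and hold values are assumptions made for illustration.

```python
def permissible_range(d_min, d_max, period, setup=0.0, hold=0.0):
    """Return (lower, upper) bounds on skew = arrival_i - arrival_j.

    lower comes from the race (hold) condition,
    upper comes from the cycle time (setup) condition.
    Raises ValueError when no skew can satisfy both at this period.
    """
    lower = -d_min + hold
    upper = period - d_max - setup
    if lower > upper:
        raise ValueError("empty permissible range at this clock period")
    return lower, upper


def skew_is_safe(skew, bounds):
    """True when the skew lies inside the permissible range."""
    lower, upper = bounds
    return lower <= skew <= upper


# Hypothetical path: delays from 1.0 to 8.5 ns, 9 ns clock,
# 0.2 ns setup and 0.1 ns hold (assumed values).
bounds = permissible_range(d_min=1.0, d_max=8.5, period=9.0, setup=0.2, hold=0.1)
print(bounds)
print(skew_is_safe(0.0, bounds))
```

Zero skew is safe for this hypothetical path, but it sits much closer to the upper bound than to the lower one, which is the situation the following paragraphs examine.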

Since combinatorial logic path delays vary from one to another, the permissible ranges may vary among pairs of registers. Such a condition suggests that the zero-skew methodology may have used a local constraint as a global one, which can be too restrictive. Furthermore, depending on where skew falls within the permissible range, the timing implications differ greatly (see Figure 2). In the case of zero skew (and ignoring setup and hold time), the circuit operates correctly with a clock period of 9 ns--but leaves a margin of only 0.5 ns from cycle time violations. Should some coupling or IR drop effects cause skew to vary by slightly more than 0.5 ns, the circuit no longer operates correctly. Similarly, only 1 ns lies between FF1 and FF2 to prevent race conditions.

Meanwhile, if we move the skew within the permissible range to nonzero, we may achieve either of the following benefits (see Figure 2): If we reduce the clock period to 7 ns, the circuit still operates correctly but runs at a faster clock frequency, improving the performance of the circuit. Alternatively, keeping the cycle time at 9 ns leaves a 2.5-ns safety margin to guard against cycle time violations. In effect, the original critical path (8.5 ns) has changed to a noncritical one (6.5 ns). Increasing the skew to a positive value can raise the safety margin against race conditions between FF1 and FF2 to 3 ns, rendering a more robust timing for the circuit.
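The margins quoted in this example can be checked with a few lines of Python. The sketch below ignores setup and hold time, as the text does, and defines skew as the launch clock arrival minus the capture clock arrival. The 2-ns skew magnitude is inferred from the 8.5-ns path behaving like a 6.5-ns one; the function names are illustrative.

```python
def setup_margin(skew, d_max, period, setup=0.0):
    """Slack before a cycle time violation (skew = launch - capture arrival)."""
    return (period - d_max - setup) - skew


def hold_margin(skew, d_min, hold=0.0):
    """Slack before a race condition (skew = launch - capture arrival)."""
    return skew - (-d_min + hold)


D_MAX_CRITICAL = 8.5  # ns, longest path delay quoted in the text
D_MIN_FF1_FF2 = 1.0   # ns, shortest FF1-to-FF2 delay implied by the 1-ns margin

# Zero skew with a 9-ns clock period.
print(setup_margin(0.0, D_MAX_CRITICAL, 9.0))   # 0.5
print(hold_margin(0.0, D_MIN_FF1_FF2))          # 1.0

# Useful skew: the capture clock on the critical path arrives 2 ns later.
print(setup_margin(-2.0, D_MAX_CRITICAL, 9.0))  # 2.5
print(setup_margin(-2.0, D_MAX_CRITICAL, 7.0))  # 0.5

# Useful skew: the FF1 clock arrives 2 ns later than the FF2 clock.
print(hold_margin(2.0, D_MIN_FF1_FF2))          # 3.0
```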

We thus turn appropriate nonzero skew into useful skew by using it to obtain better circuit speed, reduce the number of critical paths, or armor the circuit with more robust timing. VDSM designs sorely need the increased safety margin to guard against the timing uncertainties brought by coupling, IR drop, and other process variations. The increased circuit performance or safety margins can also serve to increase the timing budget in low-power design, which suffers an unfavorable trade-off between speed and power.

Nonzero skew offers other benefits, including its ability to greatly curtail the amount of switching at a single instant, reducing either the peak current or the simultaneous switching noise. When design complexities increase, this very cost-effective method can mitigate a source of chip failures. Additionally, by taking the permissible ranges as skew constraints, we don't have to minimize skew for every pair of registers. Relaxing that design objective can be important for low-cost or low-power designs in which designers can't afford to waste too many wiring resources or power on the clock tree.
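A deliberately crude model illustrates the effect on peak current. The Python sketch below assumes that every register draws an identical rectangular current pulse when its clock edge arrives; the pulse height, pulse width, register counts, and arrival times are all hypothetical numbers chosen for illustration, not data from a real chip.

```python
def peak_current(arrival_times_ns, pulse_ma=1.0, pulse_width_ns=0.2):
    """Peak of the summed current when each register draws one rectangular pulse.

    The summed current can only rise at the start of a pulse, so it is
    enough to count the overlapping pulses at each arrival time.
    """
    peak = 0.0
    for t in arrival_times_ns:
        active = sum(1 for a in arrival_times_ns if a <= t < a + pulse_width_ns)
        peak = max(peak, active * pulse_ma)
    return peak


# 1,000 registers clocked at the same instant (ideal zero skew).
zero_skew = [0.0] * 1000

# The same registers with arrival times spread over three scheduled groups.
scheduled = [0.0] * 400 + [1.0] * 300 + [2.0] * 300

print(peak_current(zero_skew))   # 1000.0
print(peak_current(scheduled))   # 400.0
```

In this toy model the total charge drawn per cycle is unchanged; only the instant at which it is drawn moves, which is what lowers the peak.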

Figure 3 Integrating a useful-skew methodology

The proposed new clock design methodology allows for a more efficient iterative design flow that takes clock design into account throughout the entire back-end process.

Because useful skew brings so many benefits to the design, it clearly requires careful management rather than simple minimization. The objective of clock design should be to arrange the skew to produce the best timing results. The relationship between skew and logic path delays also indicates that separating clock design from other design steps no longer makes sense. To achieve its goal of simply minimizing skew, the current design methodology considers clock design as a purely physical design step--a horizontal piece in a design flow. The clock skew management concept, in contrast, hinges on the interrelationship between clock skew and logic path timing. Clock design should take into account the system-level timing constraints, the partitioning and placement of the system, and the interconnect delays. Verification of the final result should occur in the context of overall chip timing. To achieve the most robust and efficient timing, designers must view clock design as a vertical slice in a design flow (see Figure 3).

Five-step methodology

The new clock design methodology consists of five steps: permissible range generation, clock skew scheduling, clock tree topology synthesis, clock net routing, and clock timing verification.

Permissible range generation uses a path delay calculator or static timing analysis tool to compute the data path timing information necessary to generate the permissible ranges of skew. That information consists primarily of the longest and shortest logic path delays between each pair of sequentially adjacent registers, that is, registers that are adjacent only through combinatorial logic. Combined with such system-level constraints as clock cycle time, the process generates a set of permissible ranges for clock skew.
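As a rough sketch of this step, the Python function below reduces a list of register-to-register path delays to one permissible range per register pair. The flat (launch, capture, delay) tuple format is a hypothetical stand-in for whatever a real timing tool reports, and the 5.0-ns and 3.0-ns delays are invented for the example.

```python
def generate_permissible_ranges(paths, period, setup=0.0, hold=0.0):
    """Build {(launch, capture): (lower, upper)} bounds on arrival_launch - arrival_capture.

    paths is an iterable of (launch_reg, capture_reg, delay_ns) tuples,
    one per combinatorial path between sequentially adjacent registers.
    """
    extremes = {}
    for launch, capture, delay in paths:
        d_min, d_max = extremes.get((launch, capture), (delay, delay))
        extremes[(launch, capture)] = (min(d_min, delay), max(d_max, delay))
    return {
        pair: (-d_min + hold, period - d_max - setup)
        for pair, (d_min, d_max) in extremes.items()
    }


# Hypothetical path report for a three-register circuit, 9 ns clock period.
paths = [
    ("FF1", "FF2", 1.0),
    ("FF1", "FF2", 5.0),
    ("FF2", "FF3", 3.0),
    ("FF2", "FF3", 8.5),
]
ranges = generate_permissible_ranges(paths, period=9.0)
print(ranges)
# {('FF1', 'FF2'): (-1.0, 4.0), ('FF2', 'FF3'): (-3.0, 0.5)}
```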

Clock skew scheduling uses the permissible ranges of skew between registers to determine if a feasible clocking schedule exists. If so, it computes a schedule consisting of required arrival times to each register. Most circuits allow a number of feasible clock schedules, but clock skew scheduling selects the one that produces the biggest benefits--either the best performance or the most robust timing (the fewest critical paths). Since that step chooses the best skew value from the permissible range, it's key to managing the skew.
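One textbook way to pose this step, which is not necessarily the algorithm the authors use, treats each permissible range as a pair of difference constraints on the arrival times. Bellman-Ford relaxation then either finds a schedule or detects that none exists, and a binary search finds the largest uniform safety margin that can be kept from every range edge. The Python sketch below follows that formulation; the register indices, ranges, and function names are hypothetical.

```python
def schedule(n, ranges, margin=0.0):
    """Find arrival times t[0..n-1] with lo + margin <= t[i] - t[j] <= hi - margin.

    ranges maps (i, j) to (lo, hi). Returns a list of times, or None if infeasible.
    """
    edges = []
    for (i, j), (lo, hi) in ranges.items():
        edges.append((j, i, hi - margin))      # t[i] - t[j] <= hi - margin
        edges.append((i, j, -(lo + margin)))   # t[j] - t[i] <= -(lo + margin)
    t = [0.0] * n  # equivalent to a virtual source connected to every register
    for _ in range(n):
        changed = False
        for u, v, w in edges:
            if t[u] + w < t[v] - 1e-12:
                t[v] = t[u] + w
                changed = True
        if not changed:
            shift = min(t)
            return [x - shift for x in t]  # earliest arrival normalized to 0
    return None  # still relaxing after n passes: negative cycle, no schedule


def best_schedule(n, ranges, tol=1e-6):
    """Binary-search the largest uniform margin that still admits a schedule."""
    if schedule(n, ranges) is None:
        return None, None
    low, high = 0.0, min((hi - lo) / 2 for lo, hi in ranges.values())
    while high - low > tol:
        mid = (low + high) / 2
        if schedule(n, ranges, mid) is not None:
            low = mid
        else:
            high = mid
    return low, schedule(n, ranges, low)


# Hypothetical ranges (ns) for registers 0, 1, 2 on skew = t[i] - t[j].
ranges = {(0, 1): (-1.0, 4.0), (1, 2): (-3.0, 0.5)}
margin, times = best_schedule(3, ranges)
print(margin, times)
```

Maximizing a single uniform margin is only one possible objective; a production scheduler would also have to scale to the very large register counts mentioned later in the article.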

Clock tree topology synthesis uses the clock schedule, the register placement locations, technology information, and buffer descriptions to determine a buffered clock tree topology that delivers the clock signal according to the schedule. The topology includes buffer locations and branch delay constraints.

Clock net routing takes the clock tree topology generated in the previous step and implements the layout of the clock net. This step includes routing each branch in the clock tree, connecting buffers and clock pins, and delivering the skew according to the schedule.

Clock timing verification assesses the quality of the clock tree implemented by the previous steps. It analyzes the clock tree by extracting the parasitic RCs and calculating the delays and skews, comparing those results with the target schedule to verify that the system is free of race conditions and can operate correctly at the required clock frequency. The verification feedback quantifies the tolerable variations in clock arrival time. For VDSM designs, clock tree verification must also analyze coupling and IR drop effects and report potential impact on the clock signal and the timing of the chip. Finally, verification must indicate where any violations occur and how to fix them.
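The core comparison in this step can be sketched in a few lines of Python. The function below takes extracted clock arrival times and the permissible ranges, reports any race or cycle time violations, and returns the remaining margin for each register pair that passes. The data layout and names are assumptions, and the sketch leaves out the coupling and IR drop analysis that a real verification tool would need.

```python
def verify_clock_timing(arrivals, ranges):
    """Compare extracted clock arrival times against permissible skew ranges.

    arrivals maps register name to arrival time (ns).
    ranges maps (i, j) to (lo, hi) bounds on arrivals[i] - arrivals[j].
    Returns (violations, margins): violations is a list of
    (pair, kind, amount_ns); margins maps each passing pair to its
    distance from the nearest range edge, i.e. the tolerable variation.
    """
    violations = []
    margins = {}
    for (i, j), (lo, hi) in ranges.items():
        skew = arrivals[i] - arrivals[j]
        if skew < lo:
            violations.append(((i, j), "race", lo - skew))
        elif skew > hi:
            violations.append(((i, j), "cycle time", skew - hi))
        else:
            margins[(i, j)] = min(skew - lo, hi - skew)
    return violations, margins


# Hypothetical ranges and extracted arrival times (ns).
ranges = {("FF1", "FF2"): (-1.0, 4.0), ("FF2", "FF3"): (-3.0, 0.5)}
good = {"FF1": 2.0, "FF2": 0.0, "FF3": 2.5}
bad = {"FF1": 2.0, "FF2": 0.0, "FF3": 3.5}

print(verify_clock_timing(good, ranges))
print(verify_clock_timing(bad, ranges))
```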

Ahead of the game

Of course, given that implementing zero skew is difficult, implementing a set of specified skews might seem nearly impossible. Remember, however, that the main purpose of useful skew is to create safety margins. We intentionally pick skew that's as far away as possible from causing timing violations. Therefore, even if we run into the same delay variations that we had in implementing zero skew, we would remain ahead by leaving more room for error.

Unquestionably, the ability to accurately predict the delays and account for coupling and IR drop effects is also critical to the proposed methodology. For example, generating the permissible range depends on the accurate calculation of logic path delays, including the interconnect delays between logic gates. This calculation in turn demands accurate estimation of routing and parasitic RCs. The clock tree topology synthesis and clock net routing stages both require highly accurate interconnect modeling and delay calculation. Because clock timing verification serves as the check point for clock and overall chip timing, it's absolutely critical that the method accounts for all interconnect effects. At present, that calls for 3D parasitic extraction, which can capture the coupling capacitances accurately, together with delay calculation capabilities that account for the coupling and IR drop effects. Although clock design is critical to good VDSM design, designers must still attempt to minimize the portion of the design cycle spent on the clock.

It takes a lot more than tight tool integration to implement the new methodology, which calls for several innovations. It requires highly accurate yet highly efficient interconnect modeling and analysis technologies, including high-speed 3D parasitic RC extraction, high-accuracy cell and interconnect delay calculation, and coupling and IR drop analysis. Efficient clock skew scheduling requires a much tighter algorithm than current linear program solvers employ, given the large number of registers a chip may contain. Clock tree topology synthesis and routing have challenged even zero-skew methodologies. An efficient and highly accurate clock tree synthesis solution must deliver clock signals that can extract better timing from the design.

Despite all the innovations, the clock solution must fit seamlessly into an existing design flow. In our methodology, execution of the clock design follows the placement of logic cells and registers. The clock design step then passes its result back to a place-and-route tool, which continues with the routing of the rest of the nets on the chip. Standard file formats facilitate the exchange of logical and physical design data as well as timing data between clock design and other design steps, such as synthesis, placement and routing, and timing verification.


Joe G. Xi, the vice president of products and marketing at Ultima Interconnect Technology, Inc., in Sunnyvale, Calif., received his Ph.D. in 1996 from the University of California, Santa Cruz, with a dissertation on useful-skew clock tree synthesis.

David Staepelaere is the project leader of clock tree synthesis tool development at Ultima. He is also finishing a Ph.D. degree in computer engineering at the University of California at Santa Cruz, where his research interests include layout synthesis, global routing, and topological wiring representations.

To voice an opinion on this or any Integrated System Design article, please email your message to miker@isdmag.com.


integrated system design  April 1999






Copyright © 2000 Integrated System Design Magazine

All materials on this site Copyright © 2009 TechInsights, a Division of United Business Media LLC All rights reserved.