# Optimal Skew

Designers generally try to minimize skew and build a perfectly balanced clock tree. Zero skew is not always good for a design, however: it can result in very high dynamic power consumption because all the flops and buffers toggle at the same time. As technology shrinks and frequency increases, the impact grows. Minimizing skew also comes at a large cost in power, congestion and thus die area. In this paper we propose criteria for selecting an optimal skew target that minimizes power and congestion without any compromise in timing across all corners. With an optimal skew target we are able to reduce clock power consumption by 15% and clock buffer count by over 30%, and to reduce congestion significantly, with a similar timing summary across all corners.

Introduction
Designing a robust clock tree is the biggest hurdle in high-frequency designs. With shrinking technology and increasing frequency, the clock tree consumes an increasing fraction of resources such as design time, power, and wiring [9]. It also determines the robustness of the design. When designing the clock tree, designers target a perfectly balanced tree with the minimum possible skew. While this ensures that all the flops capture data at the same time, it leads to very high peak dynamic power dissipation. The simultaneous switching of high-frequency clock signals also causes EMC-EMI failures.

Moreover, achieving such target skew numbers comes at a large cost in power, latency, congestion and thus die area. In this paper, we propose an optimal skew number that gives 10-15% lower overall clock power consumption, significantly lower congestion and 47% fewer clock buffers, without any impact on timing across all corners. We justify this mathematically and through experimental results.

Proposed Work
In a synchronous design, the flops capture data on the edges of the clock signal. For positive-edge-triggered flops, this happens at the positive edge of the clock. A typical SoC design has several hundred thousand flops and a few thousand clock buffers. Typically, the flop-to-buffer ratio is around 20:1. In a zero-skew design, all the flops toggle at the same time and thus draw a large current from the supply simultaneously. This leads to very high peak dynamic power consumption and voltage drop, and thus adversely affects functionality. Moreover, attaining zero skew requires more buffers and therefore more routing and die-area resources.
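The effect of simultaneous switching on peak current can be illustrated with a toy model. The sketch below is not taken from the paper: it assumes each flop draws a triangular current pulse, and the pulse width, pulse height, flop count and the 200 ps spread window are all hypothetical values chosen only for illustration.

```python
import random


def peak_current(arrival_times_ps, pulse_width_ps=100.0,
                 pulse_peak_ma=0.1, step_ps=5.0):
    """Peak of the summed supply current, in mA.

    Assumption (not from the paper): every flop draws one triangular
    current pulse that starts when its clock edge arrives.
    """
    if not arrival_times_ps:
        return 0.0
    half = pulse_width_ps / 2.0
    t = min(arrival_times_ps)
    t_end = max(arrival_times_ps) + pulse_width_ps
    peak = 0.0
    while t <= t_end:
        total = 0.0
        for start in arrival_times_ps:
            # Triangle centered at start + half, zero outside the pulse.
            dist = abs(t - (start + half))
            if dist < half:
                total += pulse_peak_ma * (1.0 - dist / half)
        peak = max(peak, total)
        t += step_ps
    return peak


random.seed(0)
n_flops = 2000  # hypothetical, far smaller than a real SoC

# Zero skew: every clock edge arrives at the same instant.
zero_skew = [0.0] * n_flops
# Non-zero skew: clock edges spread over a hypothetical 200 ps window.
spread_skew = [random.uniform(0.0, 200.0) for _ in range(n_flops)]

print(f"zero skew peak:   {peak_current(zero_skew):.1f} mA")
print(f"spread skew peak: {peak_current(spread_skew):.1f} mA")
```

In this model the total charge drawn is the same in both cases; only the peak changes, which is the quantity that drives voltage drop.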

As technology shrinks, timing closure across all corners has become very challenging. Skew has a direct impact on hold and setup timing. The biggest motivation for attaining zero skew is hold timing across all corners. On closer analysis, however, an optimal skew number can be derived that has no effect on timing, as explained below.

Impact on Timing
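In a design, the hold timing condition of a flip-flop is expressed as:

$$\text{FF cp-q delay} + \text{data delay} \ge \text{hold time} + \text{skew} \tag{I}$$

Here skew is the clock arrival time at the capturing flop minus the clock arrival time at the launching flop.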

Rewriting the above expression for skew:

$$\text{skew} \le \text{FF cp-q delay} + \text{data delay} - \text{hold time} \tag{II}$$
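As a minimal sketch of how this condition could be checked per path, the snippet below computes hold slack as cp-q delay + data delay - hold time - skew, assuming the usual form of the hold check with skew taken as capture-clock arrival minus launch-clock arrival. The `Path` class, the path names and all delay values are hypothetical and only serve to exercise the inequality.

```python
from dataclasses import dataclass


@dataclass
class Path:
    name: str
    launch_clk_ps: float    # clock arrival at the launching flop
    capture_clk_ps: float   # clock arrival at the capturing flop
    cp_q_ps: float          # FF cp-q delay of the launching flop
    data_delay_ps: float    # logic and wire delay between the flops
    hold_ps: float          # hold time of the capturing flop


def hold_slack_ps(p: Path) -> float:
    """Hold slack; a negative value means a hold violation."""
    skew = p.capture_clk_ps - p.launch_clk_ps
    return p.cp_q_ps + p.data_delay_ps - p.hold_ps - skew


# Hypothetical paths, values in ps.
paths = [
    Path("ff_a->ff_b", 1000.0, 1000.0, 250.0, 0.0, 50.0),   # zero skew
    Path("ff_c->ff_d", 1000.0, 1150.0, 250.0, 0.0, 50.0),   # 150 ps skew
    Path("ff_e->ff_f", 1000.0, 1300.0, 250.0, 20.0, 50.0),  # 300 ps skew
]

for p in paths:
    slack = hold_slack_ps(p)
    status = "OK" if slack >= 0 else "VIOLATION"
    print(f"{p.name}: hold slack = {slack:.0f} ps ({status})")
```

Only the third path fails here, because its skew exceeds what the cp-q delay and data delay can absorb after the hold time is subtracted.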

FF cp-q delay and hold time
On further analysis, more than 90% of the flops in the design are of one type, with a fixed range of transition times at the CK pin (as set by the clock specification file) and very small load values at the output. Their typical hold time and cp-q delay can therefore be approximated as follows:

#### Table 1. Flipflop CP to Q delay

#### Table 2. Flipflop hold time

The typical transition times at the CK pin of the flops are between 150 ps and 200 ps. These can be changed through the clock specification file. The load at the output of the flops is between 20 fF and 50 fF. The worst-case minimum value of the FF cp-q delay can be approximated as 250 ps from Table 1 (column 3, row 4).

Similarly, the worst-case minimum hold time of the flop can be approximated as 50 ps from Table 2, for a clock transition on the order of 150 ps and a practical worst-case load on the order of 50 fF plus.

Data Delay
The data delay depends on the logic between the two flops and on their physical placement. In the worst case there is no logic between the two flops, and the data delay could be zero. But for two flops to have zero data delay they must also be physically close, in which case they would be driven by the same clock driver in the clock tree. The skew and the uncommon clock path would then be zero for them. On the other hand, if the flops are considerably far apart, they will have non-zero skew and a proportional data delay. Thus in equation II the data delay has a mitigating effect on skew.

Substituting the worst-case values of the FF cp-q delay, data delay and hold time into equation II gives the skew as

$$\text{skew} \le 250\,\text{ps} + 0\,\text{ps} - 50\,\text{ps} = 200\,\text{ps}$$

Thus there is a margin of ~200 ps for skew without any degradation in hold timing in the worst case.
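The arithmetic can be reproduced in a few lines. This sketch uses the 250 ps cp-q delay and 50 ps hold time quoted in the text; the function name and the non-zero data delays in the sweep are hypothetical, added only to show how the budget grows when the flops are further apart.

```python
def skew_budget_ps(cp_q_ps: float, data_delay_ps: float, hold_ps: float) -> float:
    """Largest skew that still meets hold: cp-q delay + data delay - hold time."""
    return cp_q_ps + data_delay_ps - hold_ps


CP_Q_MIN_PS = 250.0  # worst-case minimum cp-q delay quoted in the text
HOLD_PS = 50.0       # worst-case hold time quoted in the text

# Worst case from the text: no logic between the flops, so zero data delay.
worst_case = skew_budget_ps(CP_Q_MIN_PS, 0.0, HOLD_PS)
print(f"worst-case skew budget: {worst_case:.0f} ps")

# Hypothetical non-zero data delays: the budget grows one-for-one.
for data_delay in (50.0, 100.0):
    budget = skew_budget_ps(CP_Q_MIN_PS, data_delay, HOLD_PS)
    print(f"data delay {data_delay:.0f} ps -> skew budget {budget:.0f} ps")
```

The zero-data-delay case gives the 200 ps figure above; any real data delay only adds to it.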