Clock tree synthesis (CTS) is at the heart of ASIC design, and clock tree network robustness is one of the most important quality metrics of SoC design. With the technology advances of the past decade and a half, clock tree robustness has become an even more critical factor affecting SoC performance. Conventionally, engineers focus on designing a symmetrical clock tree with minimum latency and skew. However, with today's complex design needs, this is not enough.
Today, SoCs are designed to support multiple features. They have multiple clock sources and user modes, which makes the clock tree architecture complex. Merging test clocking with functional clocking and moving to lower technology nodes adds to this complexity. Timing margins are also shrinking, because derate numbers have increased and more timing signoff corners must be covered.
To meet the current requirements, designs must be timing friendly and dissipate minimum power. This article describes the factors that a designer should consider while defining the clock tree architecture. It presents some real design examples that illustrate how current EDA tools or conventional clock tree design methodologies are not sufficient in all cases. A designer has to understand the nitty-gritty of clock tree architecture to be able to guide an EDA tool to build a more efficient clock tree. First, the basics of CTS and the requirements for a good clock tree are presented.
Clock tree quality parameters
The primary requirements for ideal synchronous clocks are:
- Minimum Latency – The latency of a clock is the total time that the clock signal takes to propagate from the clock source to a specific register clock pin inside the design. The advantages of building a clock with minimum latency are obvious: fewer clock tree buffers, reduced clock power dissipation, fewer routing resources and easier timing closure.
- Minimum Skew – Skew is the difference in the arrival time of a clock at different flip-flops. Minimum skew helps with timing closure, especially hold timing closure. A word of caution, however: targeting overly aggressive skew can be counterproductive. It may not help in meeting hold timing, and it can create other problems, such as increased overall clock latency and longer uncommon paths between registers.
- Duty Cycle – Maintaining a good duty cycle for the clock network is another important requirement. Many sequential devices, like flash, require a minimum pulse width on the input clock to ensure error-free operation. Moreover, many IO interfaces like DDR and QSPI can work on both edges of the clock. A clock tree must be designed with these considerations in mind, and it should be built from symmetrical cells that have similar rise and fall delays.
- Minimum Uncommon Path – Logically connected registers must have a minimum uncommon clock path. Timing derates are applied to the clock path to model process variations on the die. In a standard timing derate methodology, the derates effectively apply only to the uncommon portions of the launch and capture clock paths, because the common portion is unlikely to see different process variations in the launch and capture cycles. This concept is also called CRPR (clock reconvergence pessimism removal) adjustment. The key point is that the clock path between two connected registers should have as little uncommon path as possible.
Figure 1: Common and uncommon clock paths between two registers
- Signal Integrity – Clock signals are more prone to signal integrity problems because of their high switching activity. To limit the effect of noise and to avoid EM violations, clock trees should be constructed using a DWDS (double width, double spacing) rule. The increased spacing helps to minimize noise effects, and the increased width helps to avoid EM violations.
- Minimum Power Dissipation – This is one of the most important quality parameters of a clock tree. At the architecture level, clock gating is done at multiple levels to save power, and the clock tree itself is expected to contribute as well, for example by maintaining good clock transition and minimum latency.
EDA tool role in clock tree synthesis
A lot of R&D has gone into EDA tools for designing an ideal clock tree. The CTS engines of these tools support most of the SoC requirements for a robust clock tree. These tools can even generate clock spec definitions from SDC (timing constraint) files. A typical clock spec file includes:
- Information on all clock sources
- Synchronous/asynchronous relationships between the various clocks
- Through pins
- Exclude pins
- Clock pulling/pushing information
- Leaf pins
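To make the list above concrete, here is a minimal Python sketch of the kind of information a clock spec carries. It is not the file format of any EDA tool: the `ClockSpec` class, its field names, and every clock and pin name are hypothetical, and modelling pulling/pushing as a signed offset in picoseconds is an assumption of this sketch.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List, Set


@dataclass
class ClockSpec:
    """Hypothetical in-memory model of a clock spec (not a real tool format)."""
    sources: Dict[str, str]                        # clock name -> source pin
    sync_groups: List[FrozenSet[str]] = field(default_factory=list)
    through_pins: Set[str] = field(default_factory=set)   # tree continues past these
    exclude_pins: Set[str] = field(default_factory=set)   # not balanced by CTS
    leaf_pins: Set[str] = field(default_factory=set)      # tree stops here
    # Assumption: negative offset = pull the clock earlier, positive = push it later.
    offsets_ps: Dict[str, int] = field(default_factory=dict)

    def are_synchronous(self, clk_a: str, clk_b: str) -> bool:
        """Two clocks are synchronous if some group contains both of them."""
        return any(clk_a in group and clk_b in group for group in self.sync_groups)

    def role_of(self, pin: str) -> str:
        """Classify a pin; exclude takes priority, then through, then leaf."""
        if pin in self.exclude_pins:
            return "exclude"
        if pin in self.through_pins:
            return "through"
        if pin in self.leaf_pins:
            return "leaf"
        return "unspecified"


# Invented example data for illustration only.
SPEC = ClockSpec(
    sources={"sys_clk": "pll0/CLKOUT", "div2_clk": "clkdiv/Q", "irc_clk": "irc/CLK"},
    sync_groups=[frozenset({"sys_clk", "div2_clk"})],
    through_pins={"clkdiv/CK"},
    exclude_pins={"dbg_probe/CK"},
    leaf_pins={"sram0/CLK"},
    offsets_ps={"sram0/CLK": -150},
)
```

With this data, `SPEC.are_synchronous("sys_clk", "irc_clk")` is false because no group contains both clocks, while `SPEC.role_of("clkdiv/CK")` reports a through pin, meaning the tree is traced beyond the divider rather than stopping at it.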
Going one level down in SoC to design an ideal clock tree
For most SoCs, the CTS engines of existing EDA tools are sufficient to generate an ideal clock tree. However, this is not always the case. This tool-driven approach is suitable for SoCs or IPs that have few clock sources and a simple clock architecture with minimal muxing of multiple clocks.
Today’s microcontrollers generally don’t have such a simple clock architecture. Microcontrollers designed for the automotive world have multiple IPs integrated into a single SoC. For example, a single SoC may have multiple cores and IO peripherals like SPI, DSPI, LIN and DDR interfaces for multiple automotive control applications. Because human safety is at stake in automotive SoCs, testing requirements are also very stringent in terms of test coverage, such as at-speed and stuck-at coverage. This leads to a very complex clocking architecture, because it requires multiple clock sources (both on-SoC sources such as PLLs and IRC oscillators, and off-SoC sources like EXTAL) and clock dividers to supply the required clock frequency to multiple IPs within an SoC.
In such cases, CTS engines cannot be relied upon to build a clock tree. Because of the complicated muxing of various clock sources in multiple functional and test modes, EDA tools are sometimes unable to build the clock tree properly, which often results in increased latency, skew mismatch and very large uncommon clock paths.
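The three symptoms named above (latency, skew and uncommon path) can be quantified for any pair of connected registers. The following Python sketch does so on a toy clock tree, using the definitions given earlier in this article. The tree, the pin names, the delays and the helper functions are all hypothetical. The derate penalty is a deliberate simplification that assumes one symmetric derate factor applied late on one uncommon branch and early on the other.

```python
# Hypothetical clock tree: node -> (parent, delay in ps of the stage driving the node).
# All names and delays are invented for illustration.
TREE = {
    "pll/CLKOUT": (None, 0),
    "func_test_mux/Z": ("pll/CLKOUT", 100),
    "buf_a/Z": ("func_test_mux/Z", 200),
    "buf_b/Z": ("func_test_mux/Z", 250),
    "ff_launch/CK": ("buf_a/Z", 150),
    "ff_capture/CK": ("buf_b/Z", 125),
}


def clock_path(tree, pin):
    """Return the nodes from the clock source down to `pin`."""
    path = []
    node = pin
    while node is not None:
        path.append(node)
        node = tree[node][0]  # step up to the parent
    return path[::-1]


def latency(tree, pin):
    """Total delay from the clock source to `pin`."""
    return sum(tree[node][1] for node in clock_path(tree, pin))


def skew(tree, launch, capture):
    """Arrival-time difference between the capture and launch clock pins."""
    return latency(tree, capture) - latency(tree, launch)


def common_delay(tree, launch, capture):
    """Delay of the path prefix shared by both clock paths."""
    total = 0
    for a, b in zip(clock_path(tree, launch), clock_path(tree, capture)):
        if a != b:
            break
        total += tree[a][1]
    return total


def uncommon_delays(tree, launch, capture):
    """Delay of the non-shared part of the launch and capture paths."""
    shared = common_delay(tree, launch, capture)
    return latency(tree, launch) - shared, latency(tree, capture) - shared


def derate_penalty(tree, launch, capture, derate):
    """Margin lost when one uncommon branch is derated late and the other early."""
    return derate * sum(uncommon_delays(tree, launch, capture))


if __name__ == "__main__":
    pair = ("ff_launch/CK", "ff_capture/CK")
    print("latency (ps):", latency(TREE, pair[0]), latency(TREE, pair[1]))
    print("skew (ps):", skew(TREE, *pair))
    print("uncommon (ps):", uncommon_delays(TREE, *pair))
    print("derate penalty at 5% (ps):", derate_penalty(TREE, *pair, 0.05))
```

In this toy tree the two registers diverge right after the mux, so only 100 ps of the path is common and 725 ps is uncommon. A 5% derate then costs roughly 36 ps of margin. If the divergence point moved one buffer level closer to the registers, that penalty would shrink even though latency and skew stayed the same.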
The next section uses some real design case studies to illustrate how current EDA tools might fail to build the clock tree as the designer expects, and how a backend engineer can help design a robust clock tree by providing proactive feedback to architecture designers, by improving the clock structure at the RTL level itself, or by using better implementation techniques.