datasheets.com EBN.com EDN.com EETimes.com Embedded.com PlanetAnalog.com TechOnline.com  
Events
UBM Tech
UBM Tech

Design Article

Tell us What You Think

We want to know what you thought about this Design. Let us know by adding a comment.

ADD A COMMENT >

Designing a robust clock tree structure

Amol Agarwal and Priyanka Garg - Freescale Semiconductor

8/6/2012 10:39 AM EDT

Case study 1 - Clock logic cloning
Suppose a clock tree is required for the following logic.

In functional mode there is one master clock source func and one generated clock source gen_clk1. In test modes there is one test clocks tck1.  In functional mode register set 2 is clocked by gen_clk1 but in test mode, test clock tck1 is used instead.

The conventional way to define the clock tree spec for this design fragment would be to define the master clock sources (func and tck1) and generated clock (gen_clk1) and to define through pin for generated clock source so as to balance the latency of the master clock and the total latency of the generated clock(source latency to register clock pin plus latency from flop output to register group3). Defining a through pin for the generated clock source ensures that a CTS engine does not consider the generated clock flop as a sink pin and instead traces the clock path through CK-> Q arc of flop.

Assume that the latency of func clock while in functional mode is constrained by register set2 (highlighted in red in figure 2).  This will force a CTS engine to build the generated clock source flop 2 with minimum latency. This will only be possible if the minimum buffering is done from func clock source to mux1 input D0 as well as from mux1 output to generated clock source gen_clk1. In order to balance the latency of register set 1 and register set 2, the tool will be forced to insert buffering between mux1 output and register set 1 clock pins.  This implementation is correct in functional mode but will cause problems in test mode.


Figure 2: Original design

Test mode CTS:  The architecture of the design is such that test clock tck1 to all register sets 1-3 can be built with very low latency. However due to functional mode clocking constraints, as explained above, clock latency for the test clock will be high. The latency for test mode will be constrained by register set 1 as shown in above diagram by green line.  This cannot be avoided because of the need to balance register set1 and register set 2 latency in functional mode and buffering can only be done after mux1 output because it is not possible to increase the latency of the generated clock source. The consequence of this is that there is no option other than to increase the latency of register set 2 and register set 3 in test modes.  This is a serious problem because the latency of the test clock is increased because of functional mode clocking constraints. As discussed above in robust clock tree guidelines, an increase in clock latency can lead to multiple problems. This problem cannot be solved using advanced features like MMMC (Multi Mode Multi Corner)CTS of current EDA tools.

Solution: The solution to problem lies in cloning the clock logic as shown in figure 3.  EDA tools generally do not implement cloning of non-buffer logic in the clock path network.  The problem can be solved if there is a separate dedicated clock mux for the generated clock source flop. The limitation of placing clock buffers after the mux output has been removed and for register set 1, the clock buffering to balance latency in functional mode can be done between func clock source and cloned mux input D0.  Since there is now the bare minimum buffering done between cloned mux output D0 and register set 1, the latency numbers in test mode are not limited by register set 1 and it is possible to achieve minimum latency in test mode.


Figure 3: Modified design with cloned mux

Case study 2 - Clock muxing of two synchronous clocks

 
             Figure 4(a)                                       Figure 4(b)
Figure 4: Design  with two synchronous clocks

In this example clock 1 and clock2 are synchronous to each other. The assumption is that the minimum latency of both clock1 and clock2 is not limited by register group 1-3, but by some other clock group (not shown in diagram).

A typical behavior of most of CTS engines would be to insert clock buffers after the mux output to register group2 in order to save overall clock buffers. However this will introduce a problem of larger uncommon path between register group 1 and register group2 as well as between register group 2 and register group3. A CTS engine is not intelligent enough to understand that the architecture ensures that mux select will not toggle on the fly and there cannot be a case of launch on clock 1 and capture on clock 2 for register group2.  An alternate approach for CTS for these type of cases is shown in Figure 4(b). In figure 4(b), the clock buffers have been moved for both clock 1 and clock2 before the clock mux in order to have a greater common clock path between register set 1-2 and register set 2-3. Note that this is under the same assumption that latency of clocks Clk1 and Clk2 is limited by some other register group other than 1-3 and extra clock buffers were placed by the CTS engine after clock mux to balance skew requirements.  




Please sign in to post comment

Navigate to related information

Datasheets.com Parts Search

185 million searchable parts
(please enter a part number or hit search to begin)