United Business Media EE Times



Design Automation

Using Hierarchical Clock Tree Synthesis to Generate Balanced Clock Trees

High-performance ASICs require careful clock design to achieve full performance from ASIC processes.

by Pierre Ragon & Abhijeet Dugar


For complex ASICs, circuit designers ensure proper timing by carefully planning and implementing the distribution of clocks throughout the circuit. This part of the design process is critical because poor clock distribution can cause a circuit to malfunction, especially because of problems caused by skew and latency. To minimize skew and latency, circuit designers create clock trees that balance delays and loads in the clock buffers.

Skew is the difference in the arrival time of a clock edge at any two flip-flop clock inputs. Minimizing skew is necessary to prevent hold-time violations, which can cause flip-flops to operate in metastable states and provoke random circuit failures. Because hold-time violations are independent of the clock period, they cannot be avoided by increasing the period. Skew minimization is also important because skew reduces the available time budget and decreases operating frequency.
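The two timing effects described above can be illustrated with the usual textbook slack checks. The sketch below is illustrative only: it assumes a single launch/capture flip-flop pair, treats skew as a worst-case uncertainty that is subtracted from both budgets, and uses made-up delay values in picoseconds. The function and parameter names are not taken from the article or from any tool.

```python
def hold_slack(t_clk_to_q, t_logic_min, t_hold, skew):
    """Hold slack in ps; a negative value means a hold violation.

    The clock period does not appear here, which is why slowing the
    clock cannot repair a hold violation.
    """
    return t_clk_to_q + t_logic_min - t_hold - skew


def setup_slack(period, t_clk_to_q, t_logic_max, t_setup, skew):
    """Setup slack in ps; skew is deducted from the cycle-time budget."""
    return period - t_clk_to_q - t_logic_max - t_setup - skew


if __name__ == "__main__":
    # Hypothetical numbers for a short path between two flip-flops.
    print("hold slack (ps):", hold_slack(300, 100, 200, 250))
    # 50,000 ps corresponds to a 20 MHz clock.
    print("setup slack (ps):", setup_slack(50_000, 300, 30_000, 400, 250))
```

With these assumed values the hold slack is negative regardless of the period, while the setup slack shrinks by exactly the skew.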

Latency is the delay that occurs between the time a clock edge arrives at the input pin and the time it arrives at the input of the flip-flops it must clock. This delay occurs when the clock must pass through several logic stages. Minimizing latency is an important way to facilitate interchip communication.
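As a rough illustration of how latency and skew relate, the sketch below sums hypothetical stage delays along the path from the clock input pin to each flip-flop. It assumes each path is simply a list of buffer or gate delays in picoseconds; the flip-flop names and delay values are invented for the example.

```python
def arrival_times(paths):
    """Map each flip-flop to the clock arrival time at its clock input (ps)."""
    return {ff: sum(stage_delays) for ff, stage_delays in paths.items()}


def latency_and_skew(paths):
    """Return (worst-case latency, worst-case skew) for one clock net."""
    arrivals = arrival_times(paths).values()
    latest, earliest = max(arrivals), min(arrivals)
    # Skew between any two flip-flops is bounded by latest - earliest.
    return latest, latest - earliest


if __name__ == "__main__":
    # Hypothetical three-stage paths from the clock pin to three flip-flops.
    paths = {
        "ff_a": [400, 350, 300],
        "ff_b": [400, 350, 320],
        "ff_c": [400, 380, 300],
    }
    print(latency_and_skew(paths))
```

Adding a logic stage to every path raises latency without changing skew; adding it to only some paths changes both.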

The availability of clock buffers in most ASIC libraries has, to some extent, eased the clock distribution design effort. These buffers can drive a large capacitive load and are designed to drive clock networks. However, there are usage restrictions, such as the number of pad sites required.

The degree of difficulty in distributing clocks throughout a circuit depends on the circuit's complexity and the number of clocks it uses. Except for chips with only one or two clocks, clock buffers alone are not the solution to clock distribution. ASICs designed in the telecommunications field usually require many different clocks.

Our design group at Lucent Technologies in Paris had to overcome many challenges in developing a clock tree for the design of a complex telecommunications ASIC.

The circuit

Our design is a 50,000-gate synchronous digital hierarchy (SDH) ASIC. The circuit is a configurable SDH multiplexer/demultiplexer that maps four 2-Mbit/s data streams into a synchronous transport module (STM) frame and at the same time can extract four 2-Mbit/s data streams from an STM frame (see Figure 1).

There are 15 different clocks in the circuit, including six that are internally derived. The design uses the rising and falling edges of some clocks. Most of the clock network includes gating logic because the operating clock is user-selectable. Although the ASIC is not large, the number of clocks made the development of the clock tree nontrivial. To implement the complete clock network, it was necessary to combine the use of clock buffers and clock tree synthesis.

Clock tree synthesis requirements

Except for a few circuits that mix legacy schematic blocks with HDL-designed blocks, most ASICs are designed using an HDL and synthesis methodology. Similar tools are available for clock tree design. These tools are an asset to the circuit designer creating complex designs because of the near impossibility of editing and modifying the rat's-nest schematics generated automatically from synthesized blocks.

Often an add-on option to logic synthesis, floorplanning, or place-and-route tools, clock tree synthesizers resolve clock layout and buffering to minimize skew and loading effects. They distribute clock signals based on user-defined specifications, such as target clock delay (that is, the delay between the root-clock net and leaf-cell clock), clock buffer types, and maximum clock load. Effective clock tree synthesis results in minimal prelayout clock skew and latency. For the SDH ASIC design, we needed a clock distribution tool that would address the following requirements.


Figure 1. A block diagram of the synchronous digital hierarchy ASIC shows the many clock domains.

First, the tool had to be able to process the clock tree network while leaving the rest of the design unchanged. That would give us the capability to set parameters for the clock net that were different from the parameters for synthesizing the rest of the design. To get a well-balanced tree with low skew and steep clock edges, we wanted to limit the cell fanout to a lower value than for the synthesis of the rest of the design. We did so to ensure that the buffers used in the clock nets would drive only a limited number of flip-flops.

We wanted a tool that could maintain control of the buffer template that defines the tree, branches, and leaves of the clock nets. In this design, one clock domain often spans several hierarchical logic blocks, each of which is processed separately. If we could define a specific buffer template for each block, then we would be able to instruct the tool to insert that template in the block to be processed.

It was important that the flip-flops' clock inputs be evenly connected to all branches of the clock tree, which is necessary to have a well-balanced distribution. For example, to clock 51 flip-flops with a buffer having a maximum fanout limited to 10, having five buffers drive 10 flip-flops each and another buffer drive the remaining flip-flop is a poor solution (see Figure 2A). A better solution is to have three buffers drive nine flip-flops and three buffers drive eight flip-flops, which balances the load (see Figure 2B). Logic synthesis and optimization tools typically don't implement load balancing, which is one reason we needed a dedicated clock tree synthesis tool.
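The 51-flip-flop example can be worked through in a few lines. This is a minimal sketch of the arithmetic only, not of how any clock tree synthesizer actually assigns loads; the function names are illustrative.

```python
def greedy_loads(n_flops, max_fanout):
    """Fill each buffer to the fanout limit; the last one gets the remainder."""
    full, rest = divmod(n_flops, max_fanout)
    return [max_fanout] * full + ([rest] if rest else [])


def balanced_loads(n_flops, max_fanout):
    """Use the same number of buffers, but spread the flip-flops evenly."""
    if n_flops < 1 or max_fanout < 1:
        raise ValueError("n_flops and max_fanout must be positive")
    n_buffers = -(-n_flops // max_fanout)  # ceiling division
    base, extra = divmod(n_flops, n_buffers)
    # 'extra' buffers carry one more flip-flop than the others.
    return [base + 1] * extra + [base] * (n_buffers - extra)


if __name__ == "__main__":
    print(greedy_loads(51, 10))    # unbalanced case, as in Figure 2A
    print(balanced_loads(51, 10))  # balanced case, as in Figure 2B
```

Both approaches use six buffers; only the second keeps the per-buffer loads within one flip-flop of each other.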

For simple designs, the tool should be able to create the network automatically. Another feature we were looking for was the ability to constrain latency to avoid creating a buffer tree with too many buffer layers. Again, the resulting synthesis work had to apply only to the clock network, not to the rest of the logic.
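The tension between a low fanout limit and a shallow tree can be estimated with a simple count of buffer levels. The sketch below assumes an idealized tree in which every buffer may drive up to the same fanout limit and every level adds the same hypothetical delay; real trees and real tools will differ.

```python
def buffer_levels(n_flops, max_fanout):
    """Smallest number of buffer levels that can reach n_flops flip-flops."""
    if n_flops < 1 or max_fanout < 2:
        raise ValueError("need n_flops >= 1 and max_fanout >= 2")
    levels, reach = 0, 1
    while reach < n_flops:
        reach *= max_fanout  # each added level multiplies the reach
        levels += 1
    return levels


def meets_latency(n_flops, max_fanout, level_delay_ps, max_latency_ps):
    """Check an assumed uniform per-level delay against a latency limit."""
    return buffer_levels(n_flops, max_fanout) * level_delay_ps <= max_latency_ps


if __name__ == "__main__":
    for fanout in (10, 4):
        print(fanout, buffer_levels(51, fanout))
```

Under these assumptions, 51 flip-flops need two buffer levels at a fanout limit of 10 but three at a limit of 4, so tightening the fanout limit to sharpen clock edges costs latency.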

The tool would have to allow placing and routing the clock network first, to ensure that interconnection delays would not add skew, an increasingly critical issue for submicron design. The clock tree synthesizer would have to provide support for passing weights to the floorplanning tool so that the floorplanner could take critical paths into account. We thought it was also important for the tool to be able to prohibit network editing that would inadvertently change the clock's polarity. Finally, to verify that the network we built conformed to our intentions, we needed a clock tree display utility.

The entire design was synthesized using the ASIC Synthesizer tool from Compass Design Automation. Because the Compass Clock Tree Synthesizer option met our needs, we used it to generate the clock distribution network for our design.

Implementing clock tree synthesis

The SDH ASIC is a fully VHDL-synthesized design. Below are the clocking requirements for this circuit that show how the clocks should behave. The circuit is organized as a collection of modules such that:

  • Only leaf-level modules infer logic. There is no hierarchy below the leaf level.
  • No logic is inferred at the structural level other than the logic in the leaf modules it instantiates.
  • Each leaf module uses only one clock, with only one edge active in this module. For the gated clocks, the gating logic is not implemented in the modules where the clock is used. Within one module, the clock can drive only flip-flops. If the clock signal is used for other purposes, that takes place outside the modules where the clock is used to trigger flip-flops.
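The per-module clock rules above lend themselves to a mechanical check. The sketch below is hypothetical: it assumes each leaf module has already been reduced to a list of clock connections, and the `ClockSink` record and its field values are invented for illustration rather than taken from any netlist format or tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClockSink:
    clock: str  # clock net name at the module interface
    edge: str   # "rising" or "falling"
    kind: str   # what the clock drives: "flip_flop", "gate", "data", ...


def check_leaf_module(sinks):
    """Return a list of rule violations for one leaf module (empty if clean)."""
    problems = []
    clocks = sorted({s.clock for s in sinks})
    edges = sorted({s.edge for s in sinks})
    others = sorted({s.kind for s in sinks if s.kind != "flip_flop"})
    if len(clocks) > 1:
        problems.append("more than one clock: " + ", ".join(clocks))
    if len(edges) > 1:
        problems.append("more than one active edge: " + ", ".join(edges))
    if others:
        problems.append("clock drives non-flip-flop pins: " + ", ".join(others))
    return problems


if __name__ == "__main__":
    module = [
        ClockSink("clk_a", "rising", "flip_flop"),
        ClockSink("clk_b", "falling", "gate"),
    ]
    for problem in check_leaf_module(module):
        print(problem)
```

A module that passes all three checks has the uniform clock-net structure that lets similar command files be reused across modules.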

Although design considerations primarily determined how the circuit was partitioned, we made sure our clock domain rules were fulfilled. Following these design rules helped us to keep in mind a clear picture of the various clock domains and their interactions. In addition, it gave us better control of the clock-related design task. All modules have the same structure with regard to the clock net: Only one clock signal from the module interface is connected to a flip-flop's clock input. Keeping the structure of the clock nets the same in each module meant that the clock tree synthesis tool could process them with similar command files.


Figure 2. Clock trees have minimal skew when the loads are relatively balanced, as in Figure 2B, rather than unbalanced, as in Figure 2A.

After the rest of the design was synthesized, we began the procedure to synthesize the clock tree for the SDH ASIC. The first step was to analyze the circuit and define the clock distribution that would meet our requirements for each clock domain.

The next step was to write a command file for each module that would describe the clock tree template for this module and the constraints we wanted to apply to the network (see Listing 1).

Next, we synthesized the clock tree, which did not take much time compared with, say, placing and routing a design. At this point, we had a choice of doing simulation or static timing analysis. At 50,000 gates, the circuit was small enough to allow complete simulation. However, because simulation requires developing test vectors, we chose to perform static timing analysis. The ASIC synthesis tool we had chosen had an integrated timing analysis tool, so that simplified our decision. Using that tool made it easy to analyze the preroute clock skew and to maximize operating frequency for the different clock nets without changing the design environment.

The next step was to place and route the clock tree. We froze that part of the design so that it would not change while we placed and routed the rest of the design. Once that was done, we performed postlayout timing analysis to verify the results. If timing problems had shown up at this point, we had a couple of options. To begin with, we probably would have had to tweak the placement and redo some of the wires. For minor problems, that is, if the timing had not been too far off from what we wanted, we could have resized some buffers or done a selective re-placement. For more serious problems, for instance if the timing had been far different from what our design required, we might have had to go back and redefine and re-place the clock tree.

We synthesized the clock tree for our design successfully, even though the Clock Tree Synthesizer did not add extra clock connectors to the modules. We could not expand a single input connector to multiple connectors at the periphery of a module. The best distribution is possible only when all the flip-flops in a clock domain belong to the same module, so designers who choose not to merge modules that share a clock cannot expect this ideal clock tree to be implemented. This tradeoff wasn't much of a problem for us because the circuit we've been considering operates at only 20 MHz. Consequently, the skew budget was not tight.

Our results

The Compass Clock Tree Synthesizer was indispensable in preparing the preroute gate-level simulation netlist. It allowed us to adjust the clock distribution during the place-and-route phase to suit the chip's clock tree requirements. However, on future designs with a higher-frequency clock, the problem undoubtedly will be more difficult, especially considering the interconnection delays in deep-submicron technologies.

That is why we would like to have the full capability of clock synthesis at the floorplanning stage for future developments. The physical hierarchy is usually different from the logical hierarchy, and the design probably cannot always be structured to cleanly separate the different clock domains. It is possible that gating clock logic, rising-edge-triggered memories, falling-edge-triggered memories, and a portion of logic that uses the clock as data could be in the same physical block. The ability to use clock tree synthesis at the floorplanning stage would allow us to process selected subnetworks of a global net. Doing so, in turn, would enable us to tweak the clock distribution without touching either the gating logic or logic that uses the clock as data.

Pierre Ragon is a senior ASIC support engineer at Lucent Technologies Inc.'s Paris facility.

Abhijeet Dugar is the product manager for synthesis tools at Compass Design Automation Inc. (San Jose).

To voice an opinion on this or any Integrated System Design article, please e-mail your message to miker@isdmag.com.


integrated system design  August 1997



Copyright © 1997 Integrated System Design Magazine

All materials on this site Copyright © 2009 TechInsights, a Division of United Business Media LLC All rights reserved.