United Business Media EE Times




Divide and Conquer with Hierarchical Physical Design

CommsDesign



Fundamental limitations separate synthesis from place and route - linked by wire load models - in 180-nm technology and 300-MHz designs. In the past year, conventional synthesis, even with extensions such as timing-driven place and route, has given way to physical synthesis, which offers better performance and improved timing predictability. But physical synthesis alone does not present a complete answer to the challenge of advanced chip design; current physical synthesis tools cannot swallow a multimillion-gate design in one gulp. Instead, the design must be subdivided into manageable blocks.

This is of particular concern to Agilent, which competes in the high-end ASIC market, where designs often exceed 5 million gates and operate at frequencies greater than 250 MHz. Those ASICs are used in a variety of high-performance applications, such as advanced networking products and computer workstation chip sets, and our challenge is to maintain performance that can keep up with cutting-edge CPUs while keeping cost down and meeting aggressive schedules.

The answer for us is a blended technique that we call "structured custom." It has evolved from Agilent's legacy as part of Hewlett-Packard, when we designed instrument chips and portions of CPU cores. The chips were full custom and automation was used to help design them. Those early techniques formed the basis of our thinking today.

The design of a chip typically begins with its partition into macro functions, each created by an individual designer. A designer at the next level of hierarchy then places these blocks (macro functions) into a new design and the process continues until the chip is built. A typical chip today has approximately seven levels of hierarchy and is composed of a hundred individual designs, all of which must be managed by a limited number of designers.

In the early days, many, if not all, of our blocks were custom. Today, Agilent uses data path and custom analog design for selected blocks in many of our chips. However, in the interest of greater productivity for both customers and physical designers, we strive to use an RTL-based standard-cell approach for most of the blocks. In our divide-and-conquer approach, it is typical to split a design so that any macro function can be modified and rebuilt rapidly. Thus, when the engineering change orders arrive -- and they always do -- we can rebuild the chip quickly because each block is independent and only the affected blocks must be modified. Completing a change order is largely a matter of rerunning the top-level route with the modified blocks.

A key aspect of our desire to rapidly turn blocks is achieving one-pass timing closure across all levels of hierarchy. This requires understanding the sources of timing variance and compensating for them. An excellent way to reduce timing discrepancies is to move from statistical wire load models, known as WLMs, to the location-based RC estimates used in physical synthesis.

The physical synthesis edge

In the past few years, ASIC designers have seen traditional synthesis techniques break down. Starting at 0.35 micron, wire load delays, caused when wire capacitance slows the driver, become a significant portion of overall delay. At 0.25 micron, wire delays due to propagation delay in the wire itself also become significant. And at 0.18 micron, delays arising from wires often exceed gate delays on critical paths.

The WLM has been the traditional statistical method of coupling synthesis timing with post-artwork timing. For smaller process technologies, interconnect exerts a greater influence on overall delay and WLM-based timing correlates less with post-artwork timing.

Synthesis tool writers have looked into the wire parasitic problem and realized that it is quite complicated. A wire can be described by several factors, including length, width, neighboring wires, gate loading and fan-out. Gate loading and fan-out are the "knobs" that are directly controlled by synthesis. WLMs make the assumption that fan-out can predict a wire's parasitics. Therefore, in a single integer -- that is, fan-out -- synthesis tools have tried to wrap up an extremely complicated problem.
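As a rough illustration of how coarse a fan-out-only model is, the sketch below treats a WLM as a lookup table from fan-out to estimated wire capacitance. The table values, the extrapolation slope and the function name are hypothetical and are not taken from any real technology library or tool; the only point is that two nets with the same fan-out receive the same estimate, wherever their pins happen to sit.

```python
# Hypothetical wire load model: fan-out -> estimated wire capacitance (pF).
# The numbers are illustrative only, not from any real technology library.
WLM_CAP_PF = {1: 0.010, 2: 0.022, 3: 0.036, 4: 0.052}
WLM_SLOPE_PF = 0.018  # assumed extra capacitance per fan-out beyond the table


def wlm_wire_cap(fanout: int) -> float:
    """Estimate wire capacitance from fan-out alone, as a WLM does."""
    if fanout < 1:
        raise ValueError("fan-out must be at least 1")
    if fanout in WLM_CAP_PF:
        return WLM_CAP_PF[fanout]
    max_fanout = max(WLM_CAP_PF)
    # Extrapolate linearly past the last table entry.
    return WLM_CAP_PF[max_fanout] + WLM_SLOPE_PF * (fanout - max_fanout)


# Two nets with fan-out 2 get the same estimate, whether their pins are
# adjacent or on opposite sides of the block.
short_net_cap = wlm_wire_cap(2)
long_net_cap = wlm_wire_cap(2)
```

Nothing in the lookup depends on placement, which is why the estimate drifts from post-artwork timing as interconnect comes to dominate delay.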

What differentiates physical synthesis tools, like Synopsys' Physical Compiler, is their ability to use knowledge of placement to make more accurate estimates of wire delay. Unlike WLMs, which are based on statistical distribution of wires with a common fan-out, Physical Compiler estimates wire resistance and capacitance wire by wire.

To work quickly, the physical synthesis tool uses Steiner or half-perimeter estimates to calculate wire length for each net based on the locations of the pins to which it is attached. Currently, Physical Compiler uses a lumped RC model based on horizontal and vertical resistance and capacitance parameters for the wires. These location-based estimates provide a much more accurate prediction of post-route timing.
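The half-perimeter estimate and the lumped RC model can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Physical Compiler's actual algorithm: the function names, the per-micron parameters and their values are all hypothetical, and the net is assumed to span exactly the bounding box of its pins.

```python
from typing import Iterable, Tuple

Pin = Tuple[float, float]  # (x, y) pin location in microns


def half_perimeter(pins: Iterable[Pin]) -> Tuple[float, float]:
    """Return the (horizontal, vertical) spans of the pins' bounding box."""
    pin_list = list(pins)
    if len(pin_list) < 2:
        raise ValueError("a net needs at least two pins")
    xs = [x for x, _ in pin_list]
    ys = [y for _, y in pin_list]
    return max(xs) - min(xs), max(ys) - min(ys)


def lumped_rc(pins: Iterable[Pin], r_h: float, c_h: float,
              r_v: float, c_v: float) -> Tuple[float, float]:
    """Lumped wire (R, C) from per-micron horizontal/vertical parameters."""
    dx, dy = half_perimeter(pins)
    resistance = r_h * dx + r_v * dy
    capacitance = c_h * dx + c_v * dy
    return resistance, capacitance


# Hypothetical three-pin net and made-up per-micron parameters
# (ohms/micron for r_h and r_v, fF/micron for c_h and c_v).
net_pins = [(0.0, 0.0), (300.0, 0.0), (100.0, 400.0)]
wire_r, wire_c = lumped_rc(net_pins, r_h=0.1, c_h=0.2, r_v=0.08, c_v=0.15)
```

Separate horizontal and vertical parameters matter because the two routing directions typically use different metal layers; a Steiner-tree estimate would replace `half_perimeter` with a longer, more accurate length for high fan-out nets.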

The impact of block size

The longest possible wire in a block, barring a meandering route, runs from corner to corner, horizontally across the width of the block and vertically across its height. Thus, a block's half-perimeter bounds the worst-case, direct, point-to-point route. Similarly, it bounds direct multiple fan-out nets.

As blocks grow, so do their half-perimeters. So for any process, larger blocks tend to have longer wires. Although some wires in a large block are short, there are generally several quite long ones, which bring large wire capacitance and resistance and long wire delays. Consequently, larger blocks are increasingly susceptible to larger wire delays.
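To make the scaling concrete, the sketch below computes the half-perimeter of a block under two simplifying assumptions that are not from the article: the block is square, and every gate occupies the same area. The area-per-gate figure is a hypothetical placeholder. Under those assumptions the worst-case direct wire length grows with the square root of the gate count.

```python
import math


def block_half_perimeter(gates: int, area_per_gate_um2: float) -> float:
    """Half-perimeter (width + height) in microns of a square block."""
    if gates <= 0 or area_per_gate_um2 <= 0:
        raise ValueError("gate count and area per gate must be positive")
    side = math.sqrt(gates * area_per_gate_um2)
    return 2.0 * side


AREA_PER_GATE_UM2 = 10.0  # hypothetical value, for illustration only

small_block = block_half_perimeter(50_000, AREA_PER_GATE_UM2)
large_block = block_half_perimeter(200_000, AREA_PER_GATE_UM2)
# Quadrupling the gate count doubles the worst-case direct wire length.
growth = large_block / small_block
```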

Furthermore, not all routes are direct. As they meander to avoid localized congested hot spots, WLM and even physical synthesis estimates become less accurate. Just as growing block size increases variance from WLM estimates, larger blocks are more susceptible to meander-induced error. Although physical synthesis greatly enhances the accuracy of timing prediction, it is still sensitive to increasing block size.

Fig. 1 shows how timing variance relates to block size in a 0.18-micron process. Each curve shows typical error for a synthesis prediction vs. actual extracted timing. The first curve highlights the error of a WLM-based prediction. The second one illustrates the decreased error based on a physical synthesis timing estimate.

By introducing a little pessimism into our WLMs, Agilent can tolerate small timing errors and still achieve one-pass timing closure. As Fig. 1 shows, WLM-based estimates are valid for blocks up to approximately 75,000 gates. Similarly, by using interconnect estimates that are equally pessimistic, it is possible to achieve one-pass timing closure using physical synthesis for up to 200,000 gates.

To get the most out of hierarchical design we try to determine the optimal block size. Our main objective is choosing a block size that gives us a good chance of one-pass timing closure on each block. Another goal is keeping the block size large in order to hold the number of standard-cell blocks to a minimum (50 blocks is a typical target). As the number of gates continues to grow exponentially with each generation of chips, achieving one-pass timing closure on larger blocks is a key to productivity -- and physical synthesis is the latest tool to help us achieve aggressive productivity objectives.

The merits of hierarchy

Using an average block size of 150,000 gates, a 10 million gate design results in 67 blocks. Physical Compiler does a good job of achieving one-pass timing closure with blocks of this size. This is a big improvement over the 174 blocks that result from using conventional place and route techniques with the largest WLM-based blocks that still achieve one-pass timing closure.
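The 67-block figure follows from ceiling division of the gate count by the average block size. The short sketch below reproduces it and, for comparison, computes the counts at the 200,000-gate and 75,000-gate one-pass limits quoted earlier. The last line is only a back-calculation: the article does not state the average WLM block size, so the value implied by 174 blocks is an inference, not a reported number.

```python
def block_count(total_gates: int, gates_per_block: int) -> int:
    """Number of blocks needed, rounding up (integer ceiling division)."""
    if total_gates <= 0 or gates_per_block <= 0:
        raise ValueError("gate counts must be positive")
    return -(-total_gates // gates_per_block)


TOTAL_GATES = 10_000_000

physical_avg = block_count(TOTAL_GATES, 150_000)    # 67 blocks
physical_limit = block_count(TOTAL_GATES, 200_000)  # 50 blocks
wlm_limit = block_count(TOTAL_GATES, 75_000)        # 134 blocks

# Back-calculated average block size implied by the 174-block figure.
implied_wlm_avg = TOTAL_GATES / 174  # roughly 57,000 gates per block
```

At the 200,000-gate limit the count comes out to the 50-block target mentioned above, while the implied WLM average sits below the 75,000-gate ceiling, as expected when real blocks rarely reach the maximum size.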

Once we identify the standard-cell blocks, early floor planning may begin; it enables exploration of trade-offs in the physical architecture of the chip well before the blocks are completed. Our early floor planner requires only block size and shape estimates and a top-level netlist that connects them.

There are several advantages to such floor planning. The most important is that any major architectural obstacles that affect timing are identified early in the design cycle. Once we are aware of the timing concerns, we can choose whether to address them from the RTL level or the physical level. Another key advantage offered by a timing-aware floor planner is that it can generate major timing constraints, which are needed to enable budgeted block synthesis or physical synthesis or both.

When the floor plan begins to solidify, the design shifts from the top-down "divide" stage to the bottom-up "conquer" stage. At this point, there are clear specifications for block size, shape, timing and port locations. These specifications allow the chip design tasks to be subdivided as required. Generally, each designer is responsible for several blocks. Because the design is hierarchically split and the designs can be processed with a great deal of independence, many designers are typically put on the design for a short time to help accelerate the design. Theoretically, for a 10 million gate design there is nothing stopping us from using 67 independent designers to manage the 67 standard-cell blocks if a schedule required us to accelerate the design.

Hierarchical design enables this divide-and-conquer approach, making it possible to perform different parts of the design in parallel. Simple blocks in the chip are implemented with one level of hierarchy and consist simply of standard cells. More complex blocks are partitioned into multiple levels of hierarchy. The submodules of complex blocks can be individually implemented and reassembled bottom-up.

At the top level of the chip, simple and complex blocks are assembled. Each block, regardless of how many sublevels of hierarchy it contains, is treated as a hard macro at the top level. Similarly, hard IP macros are handled just like any other piece of the chip. Thus, our hierarchical design framework mirrors the SoC approach that is advocated by many in the IC design community.

Floor planning

Floor planning is an iterative process that can start before the RTL is finished and continues until final integration of the hierarchical pieces begins. Although it is a continuous process, it can be roughly divided into two phases: early and malleable (see Fig. 2).

The early phase involves rapid exploration of physical design alternatives; it also highlights possible timing obstacles in the logical design and involves many quick iterations. We want to be able to run several trials in a day with each iteration preferably taking less than an hour. This is possible with top-level netlists consisting of about 50,000 nets and using simplified models for delay and congestion. The estimates are typically within 15 percent of the actual congestion and timing results.

As the early floor plan changes, so do many of the block timing budgets. Block size tends to be a strong function of the block time budget. As blocks shrink and grow with changes in the budget, the next floor plan iteration accommodates the new block size estimates. After a few iterations this process converges and block size stabilizes.

As the early floor plan is refined, it quickly becomes apparent which paths are going to present a timing challenge. The 80-20 rule typically applies: 20 percent of the paths represent 80 percent of the difficulty, so identifying the troublesome 20 percent early gives us more time to deal with them. Among these paths there are usually a handful of particularly tough ones. Catching those early, while the RTL is being developed, allows some fine tuning of the RTL and avoids time-consuming custom artwork solutions late in the design cycle. The remaining paths are addressed with timing-aware floor planning.
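One simple way to pick out the troublesome 20 percent is to rank paths by estimated slack and keep the worst fifth. The sketch below is only an illustration of that idea; the path names, the slack values and the use of slack as the measure of difficulty are assumptions, not a description of the floor planner's actual reports.

```python
import math
from typing import Dict, List


def toughest_paths(slack_ns: Dict[str, float],
                   fraction: float = 0.2) -> List[str]:
    """Return the given fraction of paths with the least timing slack."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    ranked = sorted(slack_ns, key=lambda name: slack_ns[name])  # worst first
    keep = math.ceil(fraction * len(ranked))
    return ranked[:keep]


# Hypothetical top-level paths with estimated slack in nanoseconds;
# negative slack means the path currently misses timing.
estimated_slack = {
    "cpu_if->mem_ctl": -0.6,
    "mem_ctl->dma": 0.4,
    "dma->pci": 1.1,
    "pci->cpu_if": -0.1,
    "clkgen->cpu_if": 0.9,
}
worst = toughest_paths(estimated_slack)
```

With five paths and the default fraction, `worst` holds the single path with the least slack, which is the candidate for early RTL tuning.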

There is no clear line dividing the early floor planning phase from the malleable phase. (We use the term "malleable" because it implies a degree of solidity, but also a degree of adjustability.) The malleable floor plan is beginning to firm up, but it is by no means frozen -- the malleable phase's primary feature is incremental refinement of the floor plan. We move and adjust blocks, pushing them together or apart as needed, keeping their relative positions fairly constant. Similarly, block size remains reasonably consistent in the malleable phase, generally changing by no more than 10 percent.

Malleable floor planning involves incremental improvement of the general floor plan created in the early phase. With the block size estimates firming up, we begin getting meaningful congestion data from trial routes of the top-level netlist. We run trial routes periodically throughout the floor planning process (early and malleable) for two reasons. First, they ensure that the evolving floor plan can be routed and they give early warning of possible routing hot spots. Second, trial route data provides more accurate timing estimates because it includes detailed route information.

The malleable phase tends to include a significant number of placed-and-routed blocks, rather than simple rectangular estimates. These blocks function as hard macros that enable the trial routes to accurately reflect over-the-block routing. Similarly, the block timing models for placed and routed blocks reflect actual extracted timing rather than estimates based on synthesis or physical synthesis runs or both.

As the dummy block rectangles are replaced with the real Library Exchange Format models, the malleable floor plan models timing and congestion with increased accuracy. Fewer, subtler refinements are made until timing is met and the ability to route is assured. Then this final floor plan is frozen and becomes the blueprint for assembling the chip.

The designer's perspective

Physical designers begin with the design in the early floor planning state. It is important at this stage to estimate the size of the block from synthesis runs and to guess how much logic will be added to the block. Normally, customers give only block diagrams of what they believe the chip will look like; from these diagrams, guesses of major buses and critical timing signals are identified. Clocking and power needs are also identified at this stage. These estimates are then put into a specification that is used by the designer assigned to floor planning.

Early estimates are important because the best way to have a successful chip is to catch problems early. If the RTL has not been written or is still incomplete, there is a greater chance that the specifications and protocols of the chip can be influenced to produce a chip that is easier to design.

The floor planner will generate timing estimates, which will be used in first-pass synthesis. As the RTL is delivered, the blocks are built and rebuilt in an iterative fashion. Blocks shrink and grow (usually grow) from the initial estimates and the floor plan evolves accordingly. The floor plan results in updated interblock delay estimates that are budgeted back to block constraints.
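Budgeting an interblock delay back to block constraints can be pictured as subtracting the estimated top-level wire delay from the clock period and dividing what remains between the driving and receiving blocks. The sketch below is a deliberately simplified assumption: it ignores setup time, clock skew and multi-block paths, and the even split, function name and example numbers are hypothetical rather than the budgeting scheme the floor planner actually uses.

```python
from typing import Tuple


def budget_block_constraints(clock_period_ns: float,
                             interblock_delay_ns: float,
                             source_share: float = 0.5
                             ) -> Tuple[float, float]:
    """Split the time left after interblock wire delay between two blocks.

    Returns (output budget of the driving block,
             input budget of the receiving block), both in nanoseconds.
    """
    if not 0.0 <= source_share <= 1.0:
        raise ValueError("source_share must be between 0 and 1")
    remaining = clock_period_ns - interblock_delay_ns
    if remaining <= 0.0:
        raise ValueError("interblock delay consumes the whole clock period")
    output_budget = remaining * source_share
    input_budget = remaining - output_budget
    return output_budget, input_budget


# A 250-MHz clock has a 4-ns period; assume the floor plan estimates
# 1 ns of interblock wire delay for this path (hypothetical value).
out_budget, in_budget = budget_block_constraints(4.0, 1.0)
```

When a floor plan change lengthens the top-level route, the remaining time shrinks and both block constraints tighten, which is why the blocks are rebuilt as the floor plan evolves.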

Two factors

When the blocks are complete, they are routed together. This is where two factors are put to the test: Were the floor plan predictions of route congestion and timing accurate, and did the blocks meet their requirements? Fortunately, both concerns can be addressed early. First, the blocks are tested for consistency with their requirements as they are completed. Second, as the blocks evolve, several trial routes of the chip are performed (using conservatively dense Library Exchange Format models for the unfinished blocks). These trial routes provide an ongoing double check to verify that the floor plan is on target.

Though it is not possible for current physical synthesis tools to tackle a flat, high-speed 10 million gate design, with a hierarchical approach, early floor planning and a solid final assembly solution in place, physical synthesis greatly enhances productivity by making one-pass timing closure an option on larger blocks.







All materials on this site Copyright © 2009 TechInsights, a Division of United Business Media LLC All rights reserved.