To some, "deep-submicron timing closure" may sound like the title of a science-fiction novel about repairing molecular holes in the space-time continuum. Yet, for many designers of today's complex chips, this deep-submicron (DSM) problem seems just as overwhelming and far from reality as that otherworldly scenario. Just three short years ago, timing closure was a little-known phrase in the EDA industry. Today, it's the centerpiece of an industry-wide retooling effort.
The timing-closure problem can be defined as the unbounded design iterations that result from unpredicted timing violations. While it would be easy to simply state that poor correlation between predicted and actual timing lies at the root of the problem, in truth the problem results from myriad DSM effects. These effects are difficult to predict because they are highly dependent upon the detailed physical implementation.
Hope, however, is on the way. New methodologies, EDA technologies, and design flows that can enable timing closure for high-performance ASIC designs are emerging. In addition, various possible future directions may characterize the second wave of this industry phenomenon.
Out with the Old
For well over a decade, the traditional ASIC flow clearly separated the logical front-end flow from the physical back-end flow - often referred to sarcastically as the "over-the-wall" approach. With no physical view of the design, full-chip gate simulation was based on timing calculations from relatively basic estimates of interconnect. This technique worked exceedingly well for ASIC processes ranging from 2 µm down to 0.35 µm. However, at 0.25 µm and below, many designs required from 2 to 20 iterations in the front-end flow and at least 2 to 3 iterations through the back-end flow. As wafer mask costs continued to rise and time-to-market pressures accelerated, the back-end iterations demanded even greater attention.
Figure 1 - Vendors and their timing-closure territories. This (simplified) overview demonstrates both the wide range of design flow steps that can impact timing closure and the corresponding range of solutions.
Today's mainstream ASIC design flow includes gate-level floorplanning to help constrain interconnect variability prior to full-scale detailed routing. However, many ASIC designers still don't use any form of floorplanning. In either case, the vast majority of designers continue to run stand-alone register transfer level (RTL) synthesis (over and over and over again!). Even with custom wire-load models (WLMs), the typical design-iteration loop consists of synthesis, interactive floorplan adjustments, parasitic extraction, and static-timing analysis (or gate-timing simulation). Meanwhile, timing closure grows increasingly difficult for numerous reasons.
First, advanced manufacturing processes enable higher clock and edge rates, thus allowing less time for signals to travel between cells; virtually no time is left for settling, which can be verified by actual waveform traces that bear little resemblance to theoretical square waves. These processes also increase the density of interconnect, which can consist of up to seven layers of closely spaced metal. The result is greater timing variation due to capacitive and inductive coupling. Second, ASIC trends have continued the relentless push for larger die area, larger blocks of intellectual property (IP), and more I/O to accommodate rapidly increasing design complexity. These trends greatly compound the problem of keeping all critical paths short during routing. In addition, modern system-level integration (SLI) architectures frequently include multiple on-chip busses and clock domains, which complicate timing-driven placement-and-routing optimization. Add to the mix the fact that timing is no longer the sole concern in physical design; power, reliability, and signal integrity issues are forcing thorny tradeoff decisions in attempts to satisfy more requirements. Finally, the increasing run time of these large designs adds further pressure to reduce iterations.
Clearly though, the root of the DSM timing-closure problem is the highly sensitive relationship between critical-path timing and physical design. The statistical WLM is infamous for its poor predictive capabilities.
Not surprisingly, many have found that even custom WLMs - where the delays are design specific - can't resolve the timing-closure problem. Yet, this creates a designer's paradox: How does one achieve timing closure in the logic domain if it's dependent upon physical decisions that are themselves dependent on the logic domain?
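To make the WLM weakness concrete, here is a minimal Python sketch that compares a fanout-based wire-load lookup with a placement-aware estimate based on half-perimeter wirelength. The table values, the per-micron capacitance, and the pin coordinates are all made-up assumptions for illustration; they are not taken from any real ASIC library or tool.

```python
# Hypothetical wire-load table: fanout -> estimated net capacitance (pF).
# Values are invented for illustration, not taken from any real library.
WLM_CAP_PF = {1: 0.010, 2: 0.018, 3: 0.027, 4: 0.036}
CAP_PER_UM_PF = 0.0002  # assumed wire capacitance per micron of length


def wlm_cap(fanout):
    """Statistical estimate: depends only on fanout, never on placement."""
    clamped = max(1, min(fanout, max(WLM_CAP_PF)))
    return WLM_CAP_PF[clamped]


def hpwl_um(pins):
    """Half-perimeter wirelength of the bounding box around placed pins."""
    xs = [x for x, _ in pins]
    ys = [y for _, y in pins]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def placed_cap(pins):
    """Placement-aware estimate: capacitance scales with bounding-box size."""
    return hpwl_um(pins) * CAP_PER_UM_PF


# Two nets with identical fanout (one driver, two sinks), placed differently.
local_net = [(0, 0), (20, 10), (15, 30)]
global_net = [(0, 0), (900, 400), (300, 1200)]

for name, pins in (("local", local_net), ("global", global_net)):
    fanout = len(pins) - 1
    print(name, "WLM:", wlm_cap(fanout), "placed:", placed_cap(pins))
```

Both nets receive the same WLM capacitance because they share a fanout, while the placement-aware figure differs by more than an order of magnitude. That gap is the kind of miscorrelation the paragraph above describes.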
In with the New
The old approaches, then, simply don't fit today's tightly coupled design realities, in which physical-design issues now dominate logical-design objectives.
The conclusion (now accepted industry-wide) is that logical and physical design are co-dependent, and must be addressed concurrently (see "Hierarchy is Hard," ISD Magazine April 2000, p. 18). In response to this growing dilemma, a new breed of algorithms and tools has entered the market (see Figure 1). Fundamentally, all share in the concurrent logical/physical approach. However, as we will see, these tools differ greatly in scope, working abstraction level, and granularity of concurrency. The winners in this new arena will be determined based on technical merit as well as on the methodologies that are found acceptable to customers, ASIC partners, and EDA suppliers alike. Encouragingly, early consensus indicates that concurrent logical and physical design flows are making the difference between success and failure for complex, high-performance ASIC designs.
Creating order out of chaos
In order to make sense out of the confusing array of EDA tool and flow choices, let's begin with a conceptual overview (see Figure 2). The first approach is to embed pre-layout design planning and physical synthesis into the RTL design flow. A second approach is to literally embed placement algorithms inside of RTL logic synthesis. A third approach attempts to do a better job of placement by considering the design in greater detail at this stage, so that the results will be routable under all constraints and not require iteration. The fourth approach embeds logic-synthesis optimization inside of the layout flow itself - which, in general, is intended to offer a turnkey gates-to-GDSII subflow. These approaches aren't mutually exclusive and, in fact, can offer benefits to timing closure when properly leveraged together.
One particularly interesting twist on the turnkey layout approach is the "fixed-timing" design methodology from Magma Design Automation, Inc. (Cupertino, CA). The synthesis netlist is abstracted and restructured into super cells that determine "logical effort" in terms of capacitance ratios on cell inputs and outputs, and then are optimized to compute a fixed critical-path delay that will be achieved later once mapped into the target ASIC library. Following floorplanning, cell loading is fixed using placement and optimization techniques that manipulate "gain budgets." Next, buffer sizing is fixed, followed by fixed placement (after global route). Finally, detailed routing is performed, including remaining optimizations to ensure timing closure. This fixed-timing methodology is favorable to managing risks from crosstalk delay and enables an excellent crosstalk-noise solution. Electromigration is avoided by sizing wires based on RMS current calculations; antennas are avoided by jumpers and buffer insertions.
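The idea of holding delay fixed through capacitance ratios can be illustrated with the textbook logical-effort model, in which a stage's delay is d = g * h + p and h is the ratio of output to input capacitance (the gain). The Python sketch below is a generic illustration of that model, not Magma's implementation; the gate table uses the standard normalized values, while the example path, the gain budgets, and the load values are assumptions.

```python
# Logical effort g and parasitic delay p in normalized units
# (inverter: g = 1, p = 1), as used by the textbook logical-effort method.
GATES = {"inv": (1.0, 1.0), "nand2": (4.0 / 3.0, 2.0), "nor2": (5.0 / 3.0, 2.0)}


def stage_delay(gate, gain):
    """Delay of one stage: d = g * h + p, where h = C_out / C_in."""
    g, p = GATES[gate]
    return g * gain + p


def path_delay(path):
    """Sum of stage delays; path is a list of (gate, gain) pairs."""
    return sum(stage_delay(gate, gain) for gate, gain in path)


def size_for_load(path, load_cap):
    """Walk back from the load, picking each input capacitance so the
    stage keeps its budgeted gain: C_in = C_out / h."""
    sizes = []
    cap = load_cap
    for gate, gain in reversed(path):
        cap = cap / gain
        sizes.append((gate, cap))
    return list(reversed(sizes))


# Hypothetical three-stage path with an assumed gain budget per stage.
path = [("nand2", 3.0), ("inv", 4.0), ("nor2", 3.0)]

print("path delay:", path_delay(path))           # set by the gains alone
print("sizes at load 72:", size_for_load(path, 72.0))
print("sizes at load 144:", size_for_load(path, 144.0))
```

In this model the path delay depends only on the gain budget, so a heavier load found after placement is absorbed by resizing cells rather than by a change in timing. This is one way to read the "fixed-timing" claim, under the simplifying assumption that wire resistance is ignored.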
The physical-synthesis wars began in earnest with Ambit's physically knowledgeable synthesis (PKS) announcement in June 1998, prior to their acquisition by Cadence Design Systems, Inc. (San Jose, CA). Now known as Envisia Ambit synthesis, the tool automatically performs time budgeting across numerous RTL blocks and then concurrently places cells during the synthesis process.
It features a pin-based static-timing analyzer that minimizes recalculation overhead on changing paths, a quadratic placer, parameterized Steiner routing, and support for DCL (Delay Calculation Language) and OLA (Open Library API) ASIC libraries. Envisia works with a shared logical/physical database, and integrates with Silicon Ensemble using GCF (General Constraint Format) constraints. Enhanced versions of Envisia that integrate data path, power, test, and concurrent signal integrity analysis during synthesis are now being introduced, including support for today's 64-bit platforms.
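For readers less familiar with static-timing analysis, the core calculation can be sketched in a few lines of Python: propagate the latest arrival time through the timing graph in topological order, then compare it with the required time at each endpoint. This is a generic, non-incremental sketch and not the pin-based analyzer described above; the graph, delays, and required times are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical timing graph: (source, destination) -> delay in ns.
EDGES = {
    ("in", "u1"): 0.4,
    ("in", "u2"): 0.7,
    ("u1", "u3"): 0.9,
    ("u2", "u3"): 0.3,
    ("u3", "out"): 0.5,
}


def arrival_times(edges):
    """Latest arrival time at every node, propagated in topological order."""
    preds = {}
    for (src, dst), delay in edges.items():
        preds.setdefault(src, [])
        preds.setdefault(dst, []).append((src, delay))
    graph = {node: [src for src, _ in fanin] for node, fanin in preds.items()}
    arrival = {}
    for node in TopologicalSorter(graph).static_order():
        # Start points have no fan-in, so they arrive at time 0.
        arrival[node] = max(
            (arrival[src] + delay for src, delay in preds[node]), default=0.0
        )
    return arrival


def slack(edges, endpoint, required):
    """Positive slack meets timing; negative slack is a violation."""
    return required - arrival_times(edges)[endpoint]


print("slack at 2.0 ns:", slack(EDGES, "out", required=2.0))
print("slack at 1.5 ns:", slack(EDGES, "out", required=1.5))
```

An incremental analyzer would recompute only the nodes downstream of a changed edge instead of the whole graph, which is where the recalculation savings come from.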
If Ambit started the physical-synthesis wars, then Synopsys, Inc. (Mountain View, CA) must be credited with dominating the battlefield with some pretty heavy artillery. The Synopsys response has been dramatic: their new suite of tools incorporates physical design throughout the entire flow. Starting with Chip Architect at the early block levels and/or RTL, the tool uses RTL timing and area estimation to help users create a quality floorplan with defined global interconnect. Next, known top-level routing delays permit automatic time budgeting of each RTL block, ready for Physical Compiler synthesis. Physical Compiler embeds the Flex Place timing-driven placement algorithm with Design Compiler logic synthesis to accomplish block-level timing correlation. The Chip Architect floorplanner, along with Prime Time static-timing analysis, helps manage full-chip timing budgets in a hierarchical flow by incorporating clock-tree synthesis, blockages for power and ground, congestion analysis, and engineering change order (ECO) routing.
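The time-budgeting step can be pictured with a small Python sketch: once top-level routing delays are known, the remainder of the clock period is divided among the blocks on a path. The proportional split used here is an assumption chosen for simplicity, not a description of how Chip Architect or any other tool allocates budgets, and the function name and numbers are hypothetical.

```python
def budget_blocks(clock_period, wire_delays, block_estimates):
    """Split what is left of the clock period, after subtracting known
    top-level wire delays, across blocks in proportion to their
    estimated delays."""
    available = clock_period - sum(wire_delays)
    if available <= 0:
        raise ValueError("top-level routing alone exceeds the clock period")
    total = sum(block_estimates.values())
    return {
        name: available * estimate / total
        for name, estimate in block_estimates.items()
    }


# Hypothetical path: block A -> global wire -> block B -> global wire.
budgets = budget_blocks(
    clock_period=10.0,                    # ns
    wire_delays=[1.0, 1.0],               # ns, from the floorplan
    block_estimates={"A": 3.0, "B": 9.0}  # ns, early RTL estimates
)
print(budgets)
```

Each block's budget then serves as the timing constraint for its own synthesis run, which is why the accuracy of the top-level wire delays matters so much.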
Monterey Design Systems, Inc. (Sunnyvale, CA) has introduced Dolphin, a powerful physical-design environment that boasts fine-grained concurrent optimization of almost every conceivable physical-design concern.
Dolphin simultaneously addresses placement, routing, R(L)C extraction, timing calculation and static-timing analysis, clock-tree synthesis, power routing, crosstalk avoidance, electromigration, and antenna effects. To illustrate this point, clock trees are built and power connections made as each cell is placed and routed, and full 3D RC extraction occurs for selected critical paths. Although, like Magma, the full-chip design is always flattened, Dolphin differentiates itself by distributing extremely large jobs across 64-bit multiprocessing workstations - thus simultaneously satisfying the myriad constraints with each incremental decision. Monterey claims that Dolphin will produce smaller and faster DSM designs with reduced time to market, due to the concurrency of Dolphin's Global Design Technology and its Fluid Block Design function. Fluid Block Design eliminates artificial barriers between local intra-block wiring and global inter-block wiring, and can restructure blocks to further optimize global routing.
Figure 2 - Four perspectives of the timing-closure problem. EDA vendors have responded to the timing-closure problem with four approaches that have much in common, yet emphasize different steps within the design flow. While these approaches aren't mutually exclusive, it's too early to predict which combinations, if any, will find market acceptance.
Avanti Corp.'s (San Jose, CA) new Single Pass timing-convergence flow is built around the technology in their Jupiter tool, Planet/Apollo tools, Saturn tool, and the Milky Way database. Beginning with RTL analysis, these tools can detect timing architecture complications such as asynchronous loops, snake and multi-cycle paths, and multiple-clock domains. Jupiter's embedded synthesis and timing-driven placement technology estimates area and speed, while generating a chip-level floorplan and performing timing budgeting across blocks as it generates custom WLMs and synthesis scripts.
Automated global routing, interactive power and bus routing, and clock planning are included at the RTL level. Concurrent synthesis, placement, and optimization within Jupiter augment Apollo's timing-driven place-and-route algorithms. Physical optimization includes buffer insertion, gate sizing, logic restructuring, and scan-chain optimizations.
Sapphire Design Automation (Santa Clara, CA), a company focused on timing convergence at the placement stage, is offering Formit, Noiseit, Powerit, and Clockit. Formit is an advanced placement tool that integrates electrical analysis and optimization, while Noiseit provides complementary electrical-noise avoidance. Unlike most placement tools that only address timing slacks, Formit addresses reconvergent delay nodes and combinational logic-switching activity to minimize noise and power consumption. Formit's electrically driven algorithms are a multilevel hybrid of min-cut and min-sum partitioning and placement. Optimization is also unique, because Formit first optimizes the edges of blocks, so that internal circuit topology remains flexible for fine-tuning adjustments. Noiseit adds further circuit optimization by analyzing full-chip noise interactions in the electrical, temporal, and physical domains, and fixing glitch-sensitive and delay-sensitive nodes. Powerit assists power-management design goals, while Clockit enables clock-tree synthesis and placement optimization. Sapphire has recently introduced post-placement and post-routing options to Formit, which now performs incremental ECO-placement optimization to meet timing, noise, power, and reliability requirements.
Silicon Perspective Corp. (Santa Clara, CA) produces First Encounter, a physical design tool that integrates with external back-end flows for final detailed routing and manufacturing preparation. Boasting a flat-layout capability of up to 2 million cells on 32-bit platforms, Silicon Perspective credits its efficient Fast Track database and algorithms for fast iteration times. Placement and trial routing are performed from a gate netlist and timing constraints, followed by 2.5D RC extraction, clock-tree synthesis, static-timing analysis, and buffer insertion/sizing. The interactive gate-level floorplanner includes a relatively simple user interface that allows logic designers to work naturally in their familiar logical-hierarchy view, even when traversing the physical domain.
The Partition Optimizer breaks up large, flat designs (greater than 2 million cells) into separable pieces, automatically creating the necessary constraint shells for separable back-end ASIC foundry flows.
Aristo Technology, Inc. (Cupertino, CA) and Tera Systems, Inc. (Campbell, CA) both address front-end physical design planning, but the similarities end there. Aristo's IC Wizard performs automated floorplan synthesis, starting from RTL or block-level inputs, automatically estimating appropriate block sizing and shape and port assignments, and exploring the range of global-interconnect optimization choices. IC Wizard is capable of synthesizing hundreds of choices, allowing the designer to pick and choose the best combination before proceeding to synthesis. The global-route algorithm considers power and special nets, clock trees, electromigration, and voltage drops, and extracts wire parasitics for early timing prediction. Alternatively, Tera Systems' Tera Form emulates the structured-custom techniques of microprocessor designers by automatically recognizing data path, control, memory, and random logic for individual optimization at the RTL level. Tera Form automatically budgets full-chip timing across blocks, then restructures the logical hierarchy and logical clustering to minimize critical paths and area. Tera Form outputs restructured RTL code in a revised logical hierarchy, along with synthesis scripts and DEF placement information, so that the output of logic synthesis and layout will quickly converge on an optimal timing constraint.
Gazing into the crystal ball
One quick look at the dizzying array of EDA vendors, tool choices, and design flow alternatives makes it clear that the EDA industry is at an inflection point. The need for the move to concurrent logical and physical design has been confirmed, but the winning approach to achieve that end has yet to be determined. What are some likely directions for change in the coming years?
As SLI designs continue to grow in complexity and performance, the need for greater use of hierarchy at all points across the flow will demand more hierarchical and incremental features in the algorithms and databases. The introduction of these concepts will also enable greater leverage from automatic chip partitioning algorithms. Even with hierarchy, the move to 64-bit platform support and the use of distributed runs in multi-threaded and multi-CPU workstation clusters is proving necessary, particularly for synthesis and back-end physical-design operations. For those not able to muster such horsepower, web-based "eDesign" services (such as Monterey's eDolphin) may help distribute the high cost of infrastructure needed to achieve timing closure.
The convergence of logical and physical optimizations is already embodied in data-path design, so look for greater use of data-path synthesis and optimization across the flow. Because timing and power trade-offs are inseparable, expect more integration of power synthesis, analysis, and optimization into timing-based flows.
In keeping with this mass-integration trend, signal-integrity analysis will also need to crowd itself into synthesis algorithms to meet the challenges of sub-100-nm designs.
The word "synthesis" is rapidly becoming an overloaded operator. Over time, we'll likely see a splintering of synthesis as we know it today. The front-end portion will focus on designer interaction, including both synthesis control and tight integration with RTL floorplanning synthesis as a means of communicating design constraints and priorities. Back-end synthesis optimization - where specific library cells and interconnects for a process must be concurrently traded off with many other physical concerns - will be buried deep within the larger, concurrent physical-design environment.
Almost every EDA vendor listed above boasts a common logical/physical database - but common only within their own tool suites. As we build maturity into these flows, we will need greater plug-and-play interoperability across EDA vendors than just awkward ASCII file interfaces. A common industry API will be an essential ingredient for advanced flows. However, as the winners are decided in the marketplace, expect a large consolidation in which EDA start-ups are either acquired and integrated, or simply give way to better solutions. Thus, a common API architecture will also help EDA vendors integrate newly acquired technology.
Time's up
Timing calculation is in much the same state as the database situation. When different timing-calculation engines are employed, we merely inflict pain upon ourselves. The need, therefore, for common timing engines - provided in library form from semiconductor suppliers - is clear. Hopefully, advanced library standards such as OLA will be supported and adopted in the near future to help reach these critical goals.
With the timing-closure gauntlet thrown down by systems and semiconductor companies, many EDA vendors have aggressively responded with new options for design flows. While the jury is still out on the mix of approaches that will reign victorious, it's clear that concurrent logical and physical design is the only way forward. This will inevitably not only impact tool and vendor decisions, but also force a change in design methodology. Let the games begin.
Contributing editor Steve Schulz is a senior member of the technical staff in Texas Instruments Inc.'s worldwide ASIC division in Dallas. He serves on the board of directors of VHDL International and is the executive sponsor for the System-Level Design Language.
Copyright © 2000 Integrated System Design Magazine