Designers involved in EDA or ASIC design can testify to the changing landscape for synthesis-based timing closure. The changes have been driven by the increasing size and complexity of today's chips, which demand greater capacity in physical synthesis and a much stronger focus on interconnect.
A bit of history
Early on, synthesis tools had a very limited view of wire loading; that was not a problem, since most of the delay of a cell/wire combination came from the cell. Eventually software developers concluded that the delay associated with a wire's load was roughly proportional to its fanout, so no elaborate calculation was needed and timing closure could focus on cells alone.
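As an illustrative sketch only, a fanout-based wire load model amounts to a table lookup followed by a lumped RC estimate. All values below are invented for illustration and do not come from any real library:

```python
# Simplified WLM-style net-delay estimate: wire capacitance is looked up
# from fanout alone, and net delay is a lumped RC term added on top of
# the cell delay. Table values are illustrative, not from a real library.

FANOUT_TO_CAP_FF = {1: 5.0, 2: 9.0, 3: 13.0, 4: 17.0}  # estimated wire cap, fF

def estimated_net_delay_ps(fanout, drive_res_kohm=1.0):
    """Crude lumped-RC estimate: driver resistance times table wire cap."""
    # Beyond the table, WLMs typically extrapolate linearly on fanout.
    cap_ff = FANOUT_TO_CAP_FF.get(fanout, 17.0 + 4.0 * (fanout - 4))
    return drive_res_kohm * cap_ff  # kOhm * fF = ps

print(estimated_net_delay_ps(2))  # fanout-2 net: 9.0 ps with this table
```

Note what is missing: the model never looks at where the cells are placed, only at the fanout count, which is exactly why it broke down once wire topology began to dominate delay.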
Then came the emergence of 0.18 micron technology, where a significant portion of the delay came from the topology of the wire. Floorplans and placement of cells drastically affected path timing. (Some of these effects were already visible at 0.25 micron, especially for higher-performance designs.) At 0.18 micron, the traditional wire load models (WLMs) broke down. Solutions consisted of custom wire load models, enhanced floorplanning, and physical synthesis.
All of these techniques worked, some better than others, but no single solution adequately addressed all types of designs. Custom wire load models frequently failed because placement engines yielded unpredictable results, and on larger designs and complicated floorplans they produced inferior results compared to the other two techniques.
Physical synthesis, which restructures logic based on placement, worked very well over a broad range of designs and became the dominant block-level timing closure technology. However, its capacity was limited, and it often created unroutable designs because it lacked real wire-topology prediction.
To address these shortcomings, designers adopted silicon virtual prototyping, which worked well on large designs, identifying timing and routability issues, and often making physical synthesis work better. The combination of silicon virtual prototyping and physical synthesis remains the dominant SoC design strategy today.
Physical synthesis tools work from both RTL and gates, but most usage today is from gates. Some designs show promise from RTL, but most cannot match the performance of WLM-based logic synthesis followed by gate-to-placed-gate optimization in physical synthesis. Even RTL-driven flows today rely on WLMs for at least an internal first pass; no physical synthesis tool truly performs pure synthesis from RTL while concurrently placing the design. The question remains: What tools or techniques will work best on nanometer SoC designs?
Timing closure for nanometer designs
Timing closure is a problem from microprocessor design down to the smaller micro-controllers and ASICs. So which solution will you use on your next design? Most design groups start with techniques employed in previous designs and adapt portions of new technology from the EDA vendor best able to enhance design, design flow, or performance. Usually, that technology is adopted after significant quality of results improvement, hefty runtime or capacity improvement, or significant flow simplification.
In the current marketplace, timing closure is under tremendous pressure, basically from Moore's Law. Why? In 2004, over half of all SoC design starts will be at 130 nanometers or below, including substantial numbers at 90nm.
This brings two major challenges to timing closure. First, at 130nm and below the effects of wires on timing are so complex that floorplanning, physical synthesis and routing must work together in different ways than they did before. And second, designers are taking advantage of nanometer silicon to build chips much larger than the original physical synthesis developers ever envisioned.
As a result, designers using the first generation of physical synthesis tools are faced with several pressure points in their timing closure flows:
- Physical synthesis run times today are unbearable, driven by designers' need to handle larger blocks and by the growing complexity of reaching convergence.
- Real wires come into the picture too late. In the flow, real-wire topologies are not realized until after timing is "closed," so the advantage of using real wires to drive mapping goes unrealized.
- For nanometer designs, design closure is more than just timing closure. Today's physical synthesis solutions do not close congestion, power, area, or signal integrity, leaving the designer with multiple post-synthesis iterations to reach complete design closure.
These problems must be solved if designers are to reach their timing closure goals for the nanometer SoC world.
Physical synthesis: the second generation
Upcoming solutions have the right wire-topology modeling techniques, the right restructuring capabilities (from a high level perspective), and the right physical guidance (wires) to outperform traditional physical synthesis flows. A solution that can give real wire accuracy and optionally the flexibility of completing all or some of the routes to sign-off quality is fundamental.
Ideally, interaction between the logic synthesis and the physical implementation would be contained in one environment. This will allow synthesis-based optimizations, including restructuring of the logic and optimizations of the critical paths, to occur with real-wire topologies, and the associated effects can be considered simultaneously.
The first issues tackled in aggressive timing-closure flows are generally constraint quality and wire congestion (including macro placement and power planning), not timing optimization itself. If optimization is performed on a design which is not physically feasible, timing cannot truly be closed. Both these issues can be addressed through early wire creation.
The basis for wire-based convergence is the creation of a physical prototype of the design. By creating a full-chip physical prototype, the design team can immediately validate the wires. This routing information is sufficient to identify routing congestion and can serve as the basis for generating realistic timing data.
This enables the designers to very rapidly determine whether or not their targets and constraints are realistic, and if and how the design can be made to meet them. Designers evaluate many implementations of their chip and receive quick feedback on the best tradeoffs. The creation of the prototype also enables engineers to create realistic timing budgets for all sections of the chip.
The physical information that underlies the creation of timing budgets is what makes the timing budgets realistic: they result in a physical implementation that reaches closure without multiple iterations. The physical prototype can also provide critical information such as the chip size and aspect ratio.
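A hypothetical sketch of the budgeting idea (not a description of any particular tool): for a path crossing several blocks, split the available clock period in proportion to each block's prototype-estimated delay, rather than dividing it evenly. Block names and numbers below are invented:

```python
# Derive per-block timing budgets for a cross-block path from
# prototype-estimated delays. Purely illustrative; real budgeting also
# accounts for boundary wires, clock uncertainty, and margins.

def timing_budgets(est_delays_ns, period_ns):
    """est_delays_ns: {block: prototype-estimated delay in ns}.
    Returns a budget per block, proportional to its estimated delay."""
    total = sum(est_delays_ns.values())
    return {b: period_ns * d / total for b, d in est_delays_ns.items()}

budgets = timing_budgets({"cpu": 2.0, "bus": 1.0, "mem_ctl": 1.0},
                         period_ns=5.0)
# "cpu" receives half the period because it contributes half the
# estimated delay; the split reflects physical data, not guesswork.
```

The point of grounding the split in prototype data is exactly the one made above: budgets derived this way tend to be achievable in implementation, so they do not trigger repeated renegotiation between blocks.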
One other, sometimes-overlooked, aspect of timing closure is that physical implementation is about preserving design quality, not creating it. Wires may dominate path delay, but chip performance is fundamentally about architecture.
While a poor layout will compromise a good design, a brilliant layout will not speed up a slow one. Initial logic architecture is key. New synthesis algorithms, which better guide the structuring process, create a better starting point for physical optimization. This means less effort in optimization and better run times and overall performance. First and foremost: if you want better timing, get better synthesis.
Once good logic (or RTL plus good synthesis technology) and a high-quality design plan are achieved, physical synthesis can proceed. The new physical synthesis solutions will combine prototyping with physical synthesis in a single design environment. This will greatly reduce the current dependence on wire load models, since accurate wire information will be available from the very beginning of the timing closure cycle.
Now let's look at physical synthesis capacity. 90 nanometer silicon can easily support designs of over 30 million gates, yet the first generation of physical synthesis could handle only 1-2 million gates flat. In such a design flow, a 50 million gate chip must be partitioned into as many as 50 separate blocks; this makes timing, signal integrity, and power integrity nearly impossible to close at the top level (Figure 1).
Figure 1 Block-level capacity impacts SoC scalability
Most designers would like a half dozen or so blocks at the top level. In order to support modern nanometer SoC designs, physical synthesis really needs to handle blocks of 8-10M gates flat, in manageable run times (generally under a day, and ideally overnight), and without compromising QoS (Quality of Silicon).
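The block counts quoted above are simple arithmetic; as a quick illustrative check:

```python
# Back-of-the-envelope partitioning arithmetic (illustrative only).
import math

def blocks_needed(chip_mgates, block_capacity_mgates):
    """How many blocks a chip must be split into at a given flat capacity."""
    return math.ceil(chip_mgates / block_capacity_mgates)

first_gen = blocks_needed(50, 1)    # 1M-gate flat capacity: 50 blocks
second_gen = blocks_needed(50, 10)  # 10M-gate flat capacity: 5 blocks
```

At 8-10 million gates flat, a 50 million gate chip lands at five to seven top-level blocks, matching the "half dozen or so" that designers prefer.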
Figure 2 First-generation, locally focused timing closure
First generation physical synthesis also has a very limited scope, focusing on a small number of cells at once. Optimizations transform one path at a time: a slow, memory-intensive, and capacity-limited operation (Figure 2). These solutions squeeze the last bits of performance out of a locally optimal structure but do not adequately consider possible impacts on the rest of the design.
Refinement into local minima may actually reduce the likelihood of finding an overall optimal result. If this happens, then the optimization will take much longer to reach the design targets, compounding the run time and capacity issues, and may simply not converge at all.
An alternate approach is to focus on entire timing paths and groups of paths over the entire design. Ideally, this will give the user a solution that is "globally focused" on all paths, and timing closure over the entire design, instead of only portions of timing-critical paths. These solutions will better avoid the local minima potholes that plague traditional physical synthesis.
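A toy sketch of the difference in focus (not any vendor's algorithm): instead of reworking one critical path at a time, score each cell by the total negative slack of all failing paths it lies on, so cells shared by many critical paths rise to the top of the optimization order. All path, cell, and delay values below are invented:

```python
# Globally focused cell scoring: a cell on several failing paths earns
# a higher score than one on a single failing path, so fixing it helps
# many paths at once. Toy model; real tools work on full timing graphs.

def cell_scores(paths, delays, target_ns):
    """paths: {path_name: [cells]}; delays: {cell: ns}.
    Returns {cell: accumulated criticality} over failing paths only."""
    scores = {}
    for cells in paths.values():
        slack = target_ns - sum(delays[c] for c in cells)
        if slack < 0:  # failing path: weight every cell on it
            for c in cells:
                scores[c] = scores.get(c, 0.0) - slack
    return scores

paths = {"p1": ["u1", "u2", "u3"], "p2": ["u1", "u4"], "p3": ["u5", "u6"]}
delays = {"u1": 2.0, "u2": 1.5, "u3": 1.0, "u4": 2.0, "u5": 0.5, "u6": 0.5}
scores = cell_scores(paths, delays, target_ns=3.0)
# "u1" lies on both failing paths (p1 and p2), so a global view ranks it
# first; a path-at-a-time flow might rework p1 and p2 separately instead.
```

Cells on passing paths (here u5 and u6) receive no score at all, which is one way a globally focused engine avoids wasting effort, and capacity, on non-critical logic.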
Second generation physical synthesis solutions will optimize many paths concurrently as shown in Figure 3, converging quickly to a globally correct solution and accommodating the large designs enabled by availability of newer generation process technologies. The physical synthesis process will be capable of globally optimizing timing, signal integrity (SI) effects, power, area, congestion and wire length. Memory and runtime efficient architecture, integrated directly into the virtual prototyping system, will be a necessity.
Figure 3 Second-generation (globally focused) timing closure
At nanometer geometries, timing closure needs to consider crosstalk between wires. The most effective approaches include a mechanism for estimating these problems early in the flow, using real physical wire information as the basis. Detailed routing information is needed in order to perform concurrent SI and timing analysis and optimization.
The earlier in the design flow designers have access to this information, including potentially in prototyping, the more flexibility and liberty they have to modify or correct various design and connectivity parameters. The detailed router employed must therefore understand the impact of SI issues on the chip's functionality and timing, and take appropriate actions to prevent and correct issues as they arise.
An effective detailed router must be able to employ multiple techniques on-the-fly to address SI issues, such as wire shielding, wire spacing, buffer insertion, and driver resizing. Manufacturing objectives such as wire spreading, double-via insertion, and antenna fixing must also be considered. Some level of physical synthesis transformation needs to remain available even after routing, in order to correct any late violations.
Design closure priorities for nanometer designs
When the timing closure flow has completed, the user should be left with a design whose "real" wires have been extracted and timed with a high degree of confidence. As a result, the physical synthesis process can't really end until after detailed routing and accurate validation that the design meets its targets. Signoff-quality extraction, delay calculation, and static timing analysis will ultimately measure this, and therefore need to be part of the timing closure solution.
At this time, using signoff-accuracy analysis during much of the timing closure process is not practical due to runtime. Reduced-precision (ideally just more pessimistic) versions of the signoff engines can be used during the early timing closure and synthesis stages, but the signoff-quality analysis needs to be available on demand, for verification and even late-stage optimization.
This implies modeling full nanometer effects, at least at the signoff-quality level. Extraction techniques need to consider the latest in topology estimation, optical proximity correction, dishing, and erosion. Delay calculators need to consider noise and cross-coupling-induced glitches.
Power analysis, including electro-migration and IR drop, should also be completed, and the effects of voltage drop should be considered during the delay calculation since gate performance degrades significantly with voltage drop. This issue is amplified by the fact that newer processes are operating at even lower voltages and the transistors are more sensitive to voltage changes.
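One common way to see the sensitivity is the widely used alpha-power-law delay model, under which gate delay grows as Vdd / (Vdd − Vth)^alpha. The parameter values below are illustrative assumptions, not figures from the article:

```python
# Alpha-power-law sketch of IR-drop impact on gate delay.
# vth and alpha are illustrative process parameters, not real data.

def relative_delay(vdd, vth=0.35, alpha=1.3):
    """Gate delay (arbitrary units) under the alpha-power-law model."""
    return vdd / (vdd - vth) ** alpha

# A 10% IR drop at a nominal 1.2 V supply:
slowdown_12 = relative_delay(1.2 * 0.9) / relative_delay(1.2) - 1.0
# The same 10% drop at a nominal 1.0 V supply costs even more delay,
# because the supply sits closer to the threshold voltage.
slowdown_10 = relative_delay(1.0 * 0.9) / relative_delay(1.0) - 1.0
```

With these assumed parameters the 1.2 V case slows by roughly 10%, and the 1.0 V case by more, which is exactly the point above: lower-voltage processes are more sensitive to the same relative voltage drop.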
Figure 4 Design closure priorities for nanometer designs
Overall, the notion of priority is paramount. Designers today are faced with multiple competing priorities before tapeout (figure 4). At some level, all issues must be considered concurrently throughout the design process; however, deciding which issues receive more or less relative emphasis at different phases of the design is central to good convergence strategy.
For example, nanometer convergence strategy will usually prioritize congestion and power planning before prioritizing timing, and basic timing before crosstalk. This evolution of emphasis will also help guide when to use the signoff quality analysis and when not to.
The way success is measured is also changing. In order to assess whether or not a design meets its targets, it will not be enough simply to quantify the performance of a netlist or a set of placed gates. At nanometer geometries, designers require routing and detailed extraction results before considering the QoR (Quality of Results) or timing of the design. This wire-inclusive QoS (Quality of Silicon) metric must be used as the measuring stick.
At 0.13 micron, most of the process layers are related to interconnect, and therefore most defects occur in the interconnect wires. Real wires must be measured before any determination of design closure can be made. DFT (Design For Test) structures and patterns will differ from traditional DFT insertion, because tests must consider patterns traversing the interconnect wires rather than only propagating through the sequential devices. This calls for a different fault model to detect the presence of such defects.
In a world of ever-increasing chip complexity and shrinking process geometries, timing closure will inevitably become more complex. No solution will prevail unless it provides maximum flexibility without compromising on technology, and is simultaneously nimble enough to fit a broad range of design flows. The EDA vendor supplying the most versatile solution with the best technology across a variety of designs will be best poised to help customers bring successful and timely designs to market.
Timing closure is not a tool. It is a complex process that will be addressed in numerous ways in the future. Vendors with an eye on the future are actively studying the usage model for technologies, allowing maximum flexibility to help customers succeed.
Ashutosh Mauskar has over 12 years of experience in the EDA industry and has been with Cadence for the past four and a half years. His focus is digital IC implementation; currently, he leads a marketing team for the Cadence Encounter Digital IC Design Platform product line. Prior to joining Cadence, Mauskar held several key marketing and technical leadership roles at Synopsys.