While most of the ASIC industry is focused on solving timing and congestion problems at the netlist level, LSI Logic has developed and deployed an innovative methodology to resolve these physical problems at their source -- in the RTL code. This methodology can immediately analyze RTL code using proprietary design rules to identify architecture and coding problems as the RTL is coded.
LSI Logic has named this unique methodology Physical RTL Optimization (PRO). Using Physical RTL Optimization, LSI is able to quickly identify RTL architecture and coding constructs that cause layout problems, even before the RTL is synthesized. As a result, schedule risks are minimized and physical design predictability and turnaround time are improved.
DESIGN OPTIMIZATION STRATEGY
The greatest ability to optimize a design is at its highest level of abstraction. Proceeding through the design flow offers decreasing ability for design optimization, and a decreasing ability to resolve major congestion and timing problems.
An effective methodology for timing closure starts with a chip-level architecture that is friendly to physical design. A chip-wide bus, for example, causes extreme difficulty in achieving timing closure in higher performance designs. Incorporating switches between bus segments is a commonly used technique to keep individual bus segments short and performance within manageable limits.
Decisions on chip-level architecture have the greatest impact on the physical implementation of the design, and the ability to meet density, performance, and power goals. Improvements of greater than 40 percent can only be made in the chip-level architecture. Unfortunately there is a lack of automated tools and flows in this space today. It is therefore mostly a manual effort, dependent on the combined expertise of the customer and the ASIC vendor to create a chip-level architecture friendly to physical design.
Figure 1 - Decreasing capacity for optimization
RTL coding and architecture
Once the chip-level architecture has been defined, the RTL architecture and the RTL coding are the primary factors that determine the ability to meet performance and density targets in the physical implementation of the design. The focus for design optimization should be on eliminating RTL architectures and code constructs that cause problems in physical design. Improvements of 20-40 percent can be made in the RTL coding and architecture. For an efficient physical implementation to occur, all major timing and congestion problems should be resolved in RTL before handing the design to the synthesis and physical design phases.
Sub-optimal RTL architecture and code can result in several negative consequences. Synthesis may produce a netlist which is unroutable or cannot meet timing. The result is that layout is impossible to complete without RTL changes. Several man-months of effort are wasted as the necessary RTL changes are made by the customer who "finished" with their RTL coding a long time before. The ASIC vendor then needs to merge the changes into the layout database or, depending on the size of the changes, start from scratch with a new layout.
Another common outcome from sub-optimal RTL is that critical paths are ignored in synthesis because the synthesis tool knows it is unable to meet timing on those paths. This can result in a huge number of critical paths in physical design, which stresses the layout tools and makes layout long and difficult.
A third common problem from sub-optimal RTL is that complex RTL structures may not be recognized by synthesis. Without proper direction from the user, synthesis tool runtimes can increase dramatically. In some cases, runtimes can be three times longer when using the wrong synthesis approach. Worse than long runtimes, however, synthesis will usually generate a sea of gates for complex RTL structures it doesn't recognize, and then the placement tool distributes those gates across the entire chip making timing closure virtually impossible.
Re-synthesis typically buffers excessively in an attempt to meet timing, which causes gate-count explosion and ultimately a die-size increase. In one case a 500,000-gate design was increased to 1.2 million gates by excessive buffering. Synthesis has shown improvements over the last several years more in the areas of capacity and runtime, and less on the identification of complex structures.
Physical RTL analysis tools that accurately estimate how RTL coding constructs will appear in terms of timing and congestion in the physical design space must be used. RTL code should be analyzed as it is written, not when problems are encountered during the physical implementation or even after synthesis, as RTL architecture and coding problems are not recognizable in a gate-level netlist.
While the creation of a physically-friendly chip-level architecture remains a manual task reliant on the expertise of the designers, automated design flows have emerged in the RTL coding and architecture space and are being used today.
Synthesis, placement, physical prototyping, and physical synthesis
Synthesis remains an important part of the ASIC design flow and must be driven properly. Identifying problematic RTL constructs can lead to greatly improved synthesis. Identifying the true critical paths in a design often reveals the need to change the synthesis approach being used. Only with an understanding of the RTL architecture can a proper synthesis strategy be deployed.
Physical prototyping and physical synthesis (physical planning) are a necessary part of an effective timing closure strategy. Physical planning bridges the gap between synthesis based on inaccurate wireload models and placement and routing, using real physical data. Physical planning is needed for early physical evaluation of a design. It is also an effective means to test timing constraints for accuracy and completeness.
Synthesis and physical planning are limited in their optimization capacity, however. Even with advanced logic re-structuring techniques, physical planning tools today typically deliver about a 10 percent performance improvement, and 20 percent at most. A low quality netlist, derived from sub-optimal RTL and improperly driven synthesis, can only be improved modestly in physical design. It is imperative that physical planning tools be given a good starting point from which to work. Establishing a good starting point requires identifying and eliminating physically difficult structures from RTL code and properly driving synthesis of high quality RTL.
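The improvement ranges quoted in this article (about 10 to 20 percent from physical planning, 20-40 percent from RTL coding and architecture, more than 40 percent only from chip-level architecture) can be read as a simple lookup. The sketch below is only an illustration of that reasoning: the function name and the idea of treating the figures as hard ceilings are assumptions made here, not part of the LSI flow.

```python
# Improvement ceilings per stage, taken from the figures quoted in this article.
# The lookup below is an illustrative assumption, not part of the LSI PRO flow.
STAGE_CEILINGS = [
    ("physical planning", 20.0),                 # about 10 percent typical, 20 at most
    ("RTL coding and architecture", 40.0),       # 20-40 percent
    ("chip-level architecture", float("inf")),   # greater than 40 percent
]


def latest_stage_that_can_close(required_improvement_pct):
    """Return the latest design stage whose quoted ceiling covers the gap."""
    if required_improvement_pct < 0:
        raise ValueError("required improvement must be non-negative")
    for stage, ceiling in STAGE_CEILINGS:
        if required_improvement_pct <= ceiling:
            return stage
    raise AssertionError("unreachable: the last ceiling is infinite")


for gap in (8, 30, 55):
    print(f"{gap}% shortfall -> revisit {latest_stage_that_can_close(gap)}")
```

Under these assumptions, a design that misses its performance target by 30 percent cannot be rescued by physical planning alone and has to go back to RTL.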
In practice, physical planning tools are almost always run from a gate-level netlist. They are rarely run from RTL code today. There are three primary reasons for this. First, designers have their synthesis flow so ingrained in their methodology that there is reluctance to change. Second, there are severe tool capacity limitations when running from RTL to placed gates directly. Most importantly, however, running physical planning tools from RTL doesn't give designers the benefit they really need: the identification of specific problem constructs in RTL code and direction on how to optimize the RTL code to remove the physical implementation barriers. Without this, starting from RTL offers only modest improvements over starting from a gate-level netlist.
PHYSICAL RTL OPTIMIZATION
The focus for design optimization should be on eliminating RTL coding constructs that cause problems in physical design. RTL architecture and coding have far greater impact on timing and routing closure than physical optimization. The majority of design optimization has to occur before handing over the design to the synthesis and physical design tools.
Figure 2 - HDL quality dramatically affects turnaround time (TAT)
LSI Logic's Physical RTL Optimization flow detects and resolves RTL architecture and coding issues that cause difficulty in physical design. The issues are found quickly and early, while the designer is writing the RTL code, before any synthesis is performed.
Physical RTL analysis
Identifying RTL problems with respect to physical implementation in a gate-level netlist is impossible. Understanding the RTL architecture in a gate-level netlist format is impossible. The RTL database is fundamentally changed and unrecognizable. Three lines of RTL code can easily generate hundreds of thousands of gates.
The fundamental flaw of an architectural analysis based on a gate-level netlist is the missing link between the layout database and the RTL code. Standard methodologies used today lack the capability to link physical design directly to RTL source. This significantly limits their effectiveness and results in a sub-optimal solution. LSI's Physical RTL Optimization flow traces physical design problems back to the source of the problem in RTL code.
RTL architecture and coding problems
The way in which RTL code is written plays a major role in the physical implementation of the design. Structures can be written in a way that creates heavy congestion or creates no congestion at all. The RTL coding of a structure has a direct impact on the generated netlist and ultimately the physical layout. Gate counts of identical functional blocks can vary by as much as 100 percent depending on how the RTL is coded.
Having an even larger impact than RTL coding on physical design is the RTL architecture. Large, problematic structures or many medium sized structures can cause timing closure and congestion difficulty. A central muxing architecture should be modified to incorporate local muxing.
Figure 3 - Central vs. local muxing
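The difference between central and local muxing can be sketched in a few lines of Python. This is a behavioral model only, under stated assumptions: the function names are hypothetical, the 1024-input width is borrowed from the example engagement described later in this article, and "number of data signals converging on one mux" is used as a crude stand-in for congestion rather than a real physical estimate.

```python
def central_mux(inputs, select):
    """One flat N:1 mux: every data input is routed to a single location."""
    return inputs[select]


def local_mux(inputs, select, group_size):
    """Two-level muxing: one small mux per group, then a final mux
    across the group outputs. Functionally identical to central_mux."""
    if len(inputs) % group_size != 0:
        raise ValueError("input count must be a multiple of group_size")
    low = select % group_size      # select bits used by every local mux
    high = select // group_size    # select bits used by the final mux
    group_outputs = [inputs[base + low]
                     for base in range(0, len(inputs), group_size)]
    return group_outputs[high]


def widest_convergence(n_inputs, group_size=None):
    """Largest number of data signals that must meet at any single mux."""
    if group_size is None:
        return n_inputs                                # central scheme
    return max(group_size, n_inputs // group_size)     # local scheme


data = list(range(1024))
assert all(central_mux(data, s) == local_mux(data, s, 32) for s in range(1024))
print(widest_convergence(1024), widest_convergence(1024, 32))
```

Both schemes select the same value, but in the local scheme no single mux sees more than 32 data signals, so the wiring can be spread across the die instead of funneling into one region.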
Control logic and configuration blocks may need to be duplicated or parsed out to avoid congestion and improve timing. Large register arrays are often far easier to physically implement if they are instantiated as small memories. High fanout nets should be avoided when possible, but in many cases this is not realistic, especially for nets related to clocks. Knowing where high fanout nets exist, however, and accounting for them up-front ensures a smoother path through physical design.
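Knowing where high fanout nets exist amounts to counting loads per net. The following is a minimal sketch of that idea, assuming a hypothetical design representation (a mapping from cell name to the nets on its input pins) and an arbitrary threshold; it does not reflect how LSI's rule checker or any particular tool stores a design.

```python
from collections import Counter


def high_fanout_nets(cell_inputs, threshold):
    """Return {net: load_count} for nets driving more than `threshold` input pins.

    cell_inputs: hypothetical mapping of cell name -> list of nets on its inputs.
    """
    fanout = Counter(net for nets in cell_inputs.values() for net in nets)
    return {net: loads for net, loads in fanout.items() if loads > threshold}


# Toy design: 40 registers sharing one clock and one select net.
design = {f"reg{i}": ["clk", "sel", f"d{i}"] for i in range(40)}
print(high_fanout_nets(design, threshold=16))
```

In this toy design both `clk` and `sel` are reported with 40 loads each. The clock is expected and is handled by clock tree planning, while a select net like `sel` is a candidate for duplicated control logic.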
Managing the relationship between many clocks can be extremely difficult in physical design. Oftentimes designers don't realize they have dozens of internally generated clocks in their design. These are a few examples of RTL coding constructs that LSI's Physical RTL Optimization analyzes to optimize RTL code for physical design.
There is sometimes confusion between RTL linting flows and Physical RTL Optimization flows. RTL linting tools are incapable of analyzing the readiness of RTL code for physical design. RTL linting tools are good for enforcing checks on coding style, semantics, comments and re-usability. However, linting tools are incapable of understanding the impact of the RTL code on gate count, timing, or congestion. For that, physical RTL analysis tools that accurately estimate how RTL coding constructs appear in the physical design space must be used.
Gate-level netlist based analysis
LSI first started performing early physical analysis in 1999 using an internally developed tool called LSI Vega. Prior to any rules being developed, LSI meticulously analyzed many designs together with their layout databases to determine which structures or architectures causing congestion or timing problems could be identified in a pre-layout (gate-level netlist) database. As a result of that effort, LSI developed rules within LSI Vega to find potential problem areas. Using LSI Vega together with floorplanning, placement, and physical optimization tools, LSI was able to detect and to resolve critical issues quickly.
Over time similar flows were developed throughout the industry. Most current industry efforts are focused on refining and optimizing these gate-level netlist based flows. However, an RTL architectural analysis based on a gate-level netlist is extremely limited.
One reason is that the architecture database is written and defined in RTL code. By synthesizing the RTL into a gate-level netlist format, the database is fundamentally changed and becomes impossible to analyze effectively.
A second reason is that the way in which RTL code is written plays a major role. Structures can be written in a way that creates heavy congestion or creates no congestion at all. Much of the industry doesn't understand that the RTL coding which is used to describe a certain function or application in a design can have a dramatic impact on the generated netlist. Sub-optimal RTL coding can result in a netlist being twice as large as necessary, can cause synthesis runtimes to be days instead of hours, and can create major congestion problems.
However, the primary drawback of an architectural analysis based on a gate-level netlist is the missing link between the layout database and the RTL code.
THE LSI "PRO" OPTIMIZATION FLOW
LSI PRO comprises three major components: LSI PRO Design Rules, LSI PRO Technology Libraries, and the LSI PRO Design Flow.
Figure 4 - Physical RTL Optimization Flow
LSI physical RTL optimization design rules
LSI has performed RTL analysis on over 40 designs, or over 100 when revisions of those designs are counted. To create a comprehensive set of physical RTL rules, as was done with LSI Vega back in 1999, LSI meticulously analyzed many designs together with their layout databases.
This time, however, the goal was to determine which structures or architectures causing congestion or timing problems could be identified in an RTL database, not a gate-level netlist database. As a result of this effort, LSI developed 20 physical RTL rules to find potential problem areas in an RTL database. As new designs are analyzed as part of the standard design flow at LSI Logic, rules are updated, refined, and added as appropriate.
- Missing Clock Information
- Unconstrained I/Os
- Gated Clocks
- Mixed Edge Clocks
- Critical Mux Structures
- Gated Reset
- Shift Registers
- Unregistered Outputs
- High Fanout Nets
- Asynchronous Loops
- Asynchronous Interfaces
- Multiple Driven Nets
- Logic Cone (FanIn)
- Logic Cone (FanOut)
Timing and Congestion Rules
- Critical Paths
- Local Congestion
- Global Congestion
- Large Arithmetic Structures
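The article does not define the individual rules, so the following is only a guess at the kind of analysis behind a rule such as Logic Cone (FanIn). It assumes a fan-in cone means every signal that feeds an endpoint through combinational logic, traced back to registers or primary inputs. The data structure and names are hypothetical.

```python
def fanin_cone(drivers, endpoint, startpoints):
    """Collect every signal feeding `endpoint`, stopping at startpoints.

    drivers: hypothetical mapping of signal -> signals that drive it.
    startpoints: register outputs and primary inputs, where tracing stops.
    """
    cone = set()
    stack = [endpoint]
    while stack:
        signal = stack.pop()
        for source in drivers.get(signal, ()):
            if source not in cone:        # visited check also guards against loops
                cone.add(source)
                if source not in startpoints:
                    stack.append(source)
    return cone


drivers = {
    "q_next": ["a", "b"],
    "a": ["r1", "r2"],
    "b": ["r2", "in0"],
}
starts = {"r1", "r2", "in0"}
cone = fanin_cone(drivers, "q_next", starts)
print(sorted(cone), len(cone & starts))
```

A checker built along these lines could flag endpoints whose cone contains more startpoints than some limit, since a very wide cone tends to pull logic from many parts of the design toward one place.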
Physical RTL optimization flow
The design is read into Tera Systems' TeraForm in either Verilog or VHDL format. The physical RTL rules to be evaluated are selected from the LSI PRO Rule Checker. Typically all rules are selected for the first physical RTL analysis.
Figure 5 - LSI Logic PRO rule checker
The design is then synthesized to TeraGates and standard cell gates. If the selected rules require layout information, then the design is floorplanned, placed and routed. Next the design is analyzed for conformance to the design rules selected. For timing and congestion rule violations, a cross-probing view is provided of both the problem area in layout and the corresponding RTL code causing the problem.
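The ordering of these steps can be summarized as a small sketch. It models only the sequence described above and is not the TeraForm interface: the `Rule` class, the `needs_layout` flag, and the choice of which rules need layout data are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    name: str
    needs_layout: bool   # assumption: set for rules that require layout data


def plan_flow(selected_rules):
    """Return the ordered analysis steps for a given rule selection."""
    steps = [
        "read RTL (Verilog or VHDL)",
        "synthesize to TeraGates and standard cell gates",
    ]
    # Layout steps run only if at least one selected rule requires them.
    if any(rule.needs_layout for rule in selected_rules):
        steps += ["floorplan", "place", "route"]
    steps.append("check conformance to selected rules")
    return steps


structural_only = [Rule("Gated Clocks", needs_layout=False)]
with_congestion = structural_only + [Rule("Local Congestion", needs_layout=True)]
print(plan_flow(structural_only))
print(plan_flow(with_congestion))
```

Selecting only rules that need no layout data keeps the run short, which suits quick checks while the RTL is still being written.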
Figure 6 - TeraForm cross-probing view
The LSI Logic ASIC Customer Engineer (ACE) performs an analysis of all the design violations, generating a summary report. The report is typically a 2-3 page document that provides a detailed explanation of the issues identified as critical for layout implementation (rule violations), the impact of each rule violation, and the required and recommended optimizations to resolve the rule violations. Recommended optimizations include improved RTL partitioning, global net count reduction, RTL coding changes, and improved synthesis approaches. The LSI Logic ACE explains the alternatives to the customer so that the best possible implementation can be selected.
LSI PRO technology library
The LSI PRO libraries are a key technology in the LSI PRO flow. The libraries are in TeraGate format, which defines functions at their highest useful level. Mapping the design RTL to TeraGates enables significant design simplification. When viewing a schematic representation of the design, each TeraGate appears as a single symbol.
The first step in generating the LSI PRO technology libraries is the generation of RTL code for each TeraGate. TeraGates are complex gates such as adders, muxes, dividers, etc., and there are over 1500 RTL TeraGates currently. The RTL TeraGates are synthesized into each of the LSI Logic technology libraries (such as GflxP, GflxD). Each TeraGate then goes through LSI Logic's FlexStream Design System layout flow. Finally a library compilation of the TeraGates is done, converting the layout database to the TeraGate format representation of area and timing constraints. With extrapolation (converting a buffer to an inverter) the final resulting library is about four thousand TeraGates.
CUSTOMER ENGAGEMENT MODEL
Either LSI Logic can perform the Physical RTL Analysis on the customer's design, or the customer can run the analysis themselves. If LSI is running the analysis tools, the initial RTL delivery is made by the customer to LSI. This is a change from the earlier practice, in which customers delivered only a gate-level netlist and timing constraints; the delivery now includes RTL code as well.
Reports from the RTL analysis are analyzed and a report summary is generated by LSI. A formal RTL review is held between the customer and LSI, where LSI presents the summary report with requirements and recommendations for RTL changes. Required RTL changes would include problems so egregious that it simply makes no sense to continue into layout unless the problems are resolved.
Recommended RTL optimizations are less drastic problems, but still likely to cause difficulties in layout. In some cases it may make sense to leave the RTL problems unresolved and be prepared for them in layout. An action plan is agreed upon by both parties. After the changes are made by the customer, a second Physical RTL Analysis is performed by LSI and a second RTL review is held. If the RTL is shown to be satisfactory at this point, the design proceeds forward into synthesis and physical planning. Otherwise, additional RTL optimizations are made by the customer and another analysis and review follow.
An alternative engagement model is for the customer not to deliver RTL code to LSI. The customer has the option to run the RTL analysis tools themselves. In either scenario, the reports from the RTL analysis are analyzed by LSI and a report summary is generated by LSI. The formal reviews, in which LSI provides required and recommended RTL changes and an action plan is agreed upon, are always conducted as well.
In an early example engagement, LSI's physical RTL analysis found four 1024:1 muxes and high fanout select nets in the customer's design. LSI recommended that the customer restructure the muxes into a localized muxing scheme and create duplicate control logic for the high fanout select nets. The customer modified the RTL and used a different synthesis approach to address the muxing problem, and changed the RTL to address the control logic problem.
LSI created an optimized placement knowing how the modules interfaced to each other. Several floorplans had been created for this design in an attempt to resolve congestion problems. Every trial floorplan had a large congested area; with each new floorplan the hot-spots moved but never went away. Once the RTL changes, the synthesis approach change, and the placement optimizations were made, the congestion was eliminated and the layout proceeded without further delay.
A key development needed is the merging of RTL architecture definition, RTL coding, and physical RTL optimization with physical evaluation and physical planning. The quality of rule-based physical RTL optimization flows is dependent on the completeness of the rule set. Furthermore, it is impossible to detect and thus resolve every physical design problem in every design with a rule-based approach.
Physical prototyping and physical synthesis tools, on the other hand, are gate-level netlist based. They are incapable of tracing RTL architecture and coding problems back to the source of the problem in RTL. The solution required is an integration of physical RTL analysis with physical planning tools. This would put the capability into designers' hands to simultaneously assess the feasibility of, plan for, and optimize the physical implementation of a design as it is being developed, from defining the RTL architecture and coding the RTL through synthesis and physical planning.
Jeff Vanderlip is director of marketing for ASIC Technical Marketing at LSI Logic. He joined LSI Logic in 1996 from Pacific Sierra Research, where he held an engineering position before becoming manager of optimization tools. He was previously an engineer at other companies.