The move to system-on-chip (SoC) designs is expected to dramatically increase chip sizes from the already complex 10 million to 20 million transistors to more than 100 million transistors in fewer than five years. EDA companies and semiconductor houses alike are challenged to find viable design methodologies that can handle these large sizes and aggressive performance targets while getting designs to market faster, more reliably and in a much more predictable manner, all factors that are key to staying ahead of, or even on par with, the competition.
Hierarchical or block-based design methodologies, which are based on the traditional divide-and-conquer paradigm, are quickly being recognized as one of the primary mechanisms to realize these large and complex multimillion-gate designs. However, these methodologies have drawbacks associated with design closure and other issues that will require designers to use new approaches to alleviate problems, and to leverage these approaches to their fullest advantage.
In theory, block-based design has a number of advantages. It permits the design to be broken into smaller, manageable pieces, each of which can then be designed in parallel by multiple teams spread across the globe. It is also well-suited to early "what-if" explorations of alternative floor plans in the search for the best results to meet the required design objectives. In addition, it provides early feedback into the feasibility of timing, area and power constraints. And it enables reuse of firm, soft and hard intellectual-property (IP) blocks. Yet, by their very nature, hierarchical approaches lead to local decisions that can often yield suboptimal results and are prone to design convergence problems.
The real question then is: Is hierarchical design a want or a need? Ideally, the block-based approach should be used because of its methodology merits and not merely because of tool limitations. In the past, most block-based flows have been designed to accommodate the latter, leading to cumbersome and inefficient assemblies of point tools. In the physical-design space, the recent advent of place and route tools implemented with multithreaded 64-bit architectures has dramatically increased the size of designs that can be realized in a flat manner.
Although this does address to some extent the exploding size of today's designs, it is still not sufficient to cope with tomorrow's. In addition, capacity limitations also exist in the other tools (synthesis, extraction, verification and so on) that make up a complete RTL-to-layout flow. When it comes to SoC design and IP reuse, a hierarchical methodology is clearly a necessity that is here to stay.
Traditional hierarchical methodologies start at the chip level and use partitioning to divide the design into blocks based on logical and physical hierarchies. Then, budgets of area, shape (aspect ratio), timing and power are allocated to each block based on global chip-level constraints. Additionally, locations of pads on the chip perimeter and pins on each block are optimized based on the interconnect structure between blocks. Power, clock and bus planning are also typically done at the top level and pushed into the blocks as pre-routes. These blocks, along with their boundary conditions and budgets, are then passed to place and route tools. Finally, when blocks are implemented, they are assembled with floor planning and top-level routing to realize the overall chip.
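To make the budgeting step concrete, here is a minimal sketch of one common heuristic: splitting the clock period of a single chip-level path across the blocks it crosses, in proportion to each block's estimated delay. The function name, block names and delay figures are hypothetical and are not taken from any particular tool.

```python
def allocate_timing_budgets(clock_period_ns, path_blocks, est_delay_ns, top_wire_ns=0.0):
    """Split one path's clock period across blocks, proportional to estimated delay.

    Assumption: proportional scaling is only one of several possible budgeting
    heuristics; real flows also account for setup time, skew and margins.
    """
    # Time left for the blocks once top-level interconnect delay is reserved.
    available = clock_period_ns - top_wire_ns
    if available <= 0:
        raise ValueError("top-level wire delay alone exceeds the clock period")

    total_est = sum(est_delay_ns[b] for b in path_blocks)
    if total_est <= 0:
        raise ValueError("estimated block delays must sum to a positive value")

    # Each block receives a share of the available time equal to its share
    # of the total estimated delay.
    return {b: available * est_delay_ns[b] / total_est for b in path_blocks}


# Hypothetical path crossing three blocks with a 5 ns clock period.
budgets = allocate_timing_budgets(
    clock_period_ns=5.0,
    path_blocks=["cpu", "bus", "mem"],
    est_delay_ns={"cpu": 2.0, "bus": 1.0, "mem": 3.0},
    top_wire_ns=0.5,
)
print(budgets)
```

If the early estimates are wrong, the resulting budgets are wrong in the same proportion, and that is exactly the weakness of freezing them early.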
The most challenging aspect of such a hierarchical approach is design convergence. Decisions made on block size and shapes and their timing, power and area budgets are frozen early on in the process and prohibit reaching globally optimum results. Further, if these budgets and constraints are inaccurate or inappropriately proportioned, blocks may be unrealizable, resulting in costly iterations through the entire process. Using the new generation of place and route technology in such a flow, experiments have shown that the internal clock frequency of the blocks can easily be increased and that the clock speed is limited by the I/Os between the blocks. This shows that old-fashioned partitioning is clearly suboptimal.
Perhaps most difficult is achieving timing closure, which depends heavily on the top-level interconnect modeling and delay estimation. This problem is becoming particularly severe in current and future deep-submicron processes (0.18 micron and below) since interconnect can no longer be ignored because of its increased impact on performance, timing, power and area. Even if an overall planning of the wires leads to a satisfactory distribution, a single long wire not routed according to plan can prevent closure. Statistical wire load models that were sufficient in the past are largely inaccurate; new and more accurate models have to be used. Chip-level interblock wires must be accurately planned to minimize and guarantee the delay through these long wires, and repeaters must be inserted accordingly. Pin locations and timing budgets for each block are based on these wires; because of this, blocks are highly sensitive to these top-level decisions. Therefore, we need a well-defined methodology for defining top-level constraints and passing them down to lower-level blocks.
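The case for repeaters on long interblock wires can be illustrated with a deliberately simplified model: distributed RC wire delay grows with the square of wire length, so cutting a wire into segments trades quadratic wire delay for a fixed delay per repeater. The sketch below assumes a 0.5·r·c·L² delay per segment, ignores driver resistance and load capacitance, and uses made-up electrical values that do not describe any real process.

```python
def wire_delay_ps(length_mm, r_ohm_per_mm, c_ff_per_mm, n_segments=1, repeater_delay_ps=0.0):
    """Delay of a wire cut into n equal segments with n - 1 repeaters between them.

    Assumption: each segment is a distributed RC line with delay 0.5 * r * c * l^2.
    """
    seg_len = length_mm / n_segments
    # ohm * fF = 1e-15 s, which is 1e-3 ps.
    seg_delay_ps = 0.5 * r_ohm_per_mm * c_ff_per_mm * seg_len * seg_len * 1e-3
    return n_segments * seg_delay_ps + (n_segments - 1) * repeater_delay_ps


def best_segment_count(length_mm, r_ohm_per_mm, c_ff_per_mm, repeater_delay_ps, max_segments=64):
    """Brute-force search for the segment count that minimizes total delay."""
    return min(
        range(1, max_segments + 1),
        key=lambda n: wire_delay_ps(length_mm, r_ohm_per_mm, c_ff_per_mm, n, repeater_delay_ps),
    )


# Illustrative values only: a 10 mm wire, 100 ohm/mm, 200 fF/mm, 40 ps per repeater.
unbuffered = wire_delay_ps(10.0, 100.0, 200.0)
n_best = best_segment_count(10.0, 100.0, 200.0, repeater_delay_ps=40.0)
buffered = wire_delay_ps(10.0, 100.0, 200.0, n_best, repeater_delay_ps=40.0)
print(f"unbuffered: {unbuffered:.0f} ps, {n_best} segments: {buffered:.0f} ps")
```

Under these assumptions the buffered wire is several times faster than the unbuffered one, but only if the wire is actually routed at the planned length; a detour changes both the delay and the best repeater count.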
The number of blocks at the chip level is crucial to the chip-level interconnect planning and, in turn, the final design quality. Having too many top-level blocks can lead to numerous local decisions that can overconstrain the problem and produce low-quality, suboptimal results. Don MacMillen, vice president of advanced technology at Synopsys Inc. (San Jose, Calif.), recently showed that the portion of the design solution space explored when using just four blocks is already less than 1 percent of the total space available using flat approaches and decreases exponentially with the number of blocks.
Efficiently exploring the numerous permutations of floor plans, block sizes and shapes, and performance budgets not only requires complex partitioning and floor planning tools, but also invariably leads to numerous design iterations and problems with design convergence.
In addition, there will be a significant impact on the chip size due to the additional area (for example, power rings) devoted to the boundary of each block. On the other hand, too few chip-level blocks can result in large blocks with several thousand and perhaps even millions of gates. This makes it even more difficult to accurately model chip-level routing, which depends heavily on gate placement inside each block, and that placement is unknown at the partitioning stage.
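A rough feel for the boundary overhead comes from simple geometry. The sketch below assumes the core is divided into equal square blocks, each wrapped in a ring of fixed width; the area figures and the ring width are hypothetical, and real floor plans have unequal, non-square blocks.

```python
import math


def ring_overhead_fraction(core_area_mm2, n_blocks, ring_width_mm):
    """Extra area consumed by per-block rings, as a fraction of the core area.

    Assumption: n_blocks equal square blocks, each surrounded by its own ring.
    """
    block_area = core_area_mm2 / n_blocks
    side = math.sqrt(block_area)
    # Area of the block plus its ring, minus the block itself.
    ring_area = (side + 2 * ring_width_mm) ** 2 - block_area
    return n_blocks * ring_area / core_area_mm2


# Hypothetical 100 mm^2 core with a 0.05 mm ring around every block.
for n in (4, 16, 64):
    print(n, f"{ring_overhead_fraction(100.0, n, 0.05):.2%}")
```

In this toy model the overhead roughly doubles each time the block count quadruples, because total block perimeter grows with the square root of the number of blocks.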
Large blocks also warrant physical place and route tools that can efficiently implement them with quick turnaround and high design quality. Consequently, any hierarchical approach must carefully plan the number of levels of hierarchy as well as the number, size and nature of blocks at each level.
Fundamental to any block-based design is a bottom-up floor planning and assembly of blocks that requires accurate abstraction, characterization and packaging of intellectual property (IP). Blocks could be in various stages of implementation, ranging from "soft" RTL-level descriptions, to partially placed "firm" IP blocks, to fully placed and routed "hard" blocks, so supporting incremental block/chip assembly is imperative. Engineers must be able to see what the chip will look like without having to wait until all the blocks are complete before integrating them. As each block gets further refined, its updated logic, timing and layout data has to be plugged into the chip level for continuous feedback on the global constraints. This helps to identify any problems up-front in areas including timing at the block boundaries, die size and so on, and allows the designers to determine early whether or not they must reallocate resources.
Incremental methodology is also needed to support engineering change orders. ECOs are necessary in hierarchical approaches not just to support design changes, but also to iteratively explore different ways in which budgets and constraints are proportioned between blocks. Since design information and constraints constantly change, designers should be able to annotate these changes into the chip-level floor plan, translate them into new constraints and interface for each block, reimplement the blocks that have changed and then quickly reassemble the chip. This, of course, is easier said than done. Among other things, it requires physical synthesis tools that can take the ECO as it applies to each block and reimplement the block quickly and predictably. Therefore, place and route tools also need to be incremental so that they can reuse their prior processing as much as possible to implement the ECO without having to restart from scratch.
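The bookkeeping behind such an incremental flow can be sketched in a few lines: fingerprint the constraints and interface of each block, and after an ECO reimplement only the blocks whose fingerprint has changed. The block names and constraint fields below are hypothetical, and a real flow would also have to track netlist and layout data.

```python
import hashlib
import json


def fingerprint(block_spec):
    """Stable hash of a block's constraints and interface (a JSON-serializable dict)."""
    canonical = json.dumps(block_spec, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def blocks_to_reimplement(previous_fingerprints, current_specs):
    """Names of blocks that are new or whose spec changed since the last assembly."""
    return sorted(
        name
        for name, spec in current_specs.items()
        if previous_fingerprints.get(name) != fingerprint(spec)
    )


# Hypothetical chip with three blocks; the ECO tightens one timing budget.
specs = {
    "cpu": {"timing_budget_ns": 1.5, "pins": ["clk", "data_out"]},
    "bus": {"timing_budget_ns": 0.75, "pins": ["clk", "data_in", "data_out"]},
    "mem": {"timing_budget_ns": 2.25, "pins": ["clk", "data_in"]},
}
last_assembly = {name: fingerprint(spec) for name, spec in specs.items()}

specs["bus"]["timing_budget_ns"] = 0.6  # the ECO
changed = blocks_to_reimplement(last_assembly, specs)
print(changed)
```

Only the blocks in the changed list go back through place and route; the rest are reused as-is when the chip is reassembled.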
Block-based design, although necessary, has its own share of problems with design convergence and quality, both crucial to success in the marketplace. One way to alleviate some of the drawbacks is to implement partition-driven block-based flows inherently within the gate-level place and route tools. Here, partitioning is used merely to overcome the capacity limitations of backend global and detailed routing stages. However, placement, logic optimization and timing analysis within each partition take into account global chip-level timing, routability and other constraints. Therefore, partitioning merely creates a soft floor plan whose blocks have some physical boundaries but are subject to global performance requirements.
Since no time budgeting is required and chip-level constraints do not have to be translated to block-level, performance modeling between the top and bottom levels of the hierarchy remains both continuous and consistent. This would overcome to a large extent the chicken-and-egg problem of time budgeting and ensure that global constraints are met rapidly and with few iterations, if any.
To implement this solution, place-and-route tools will need several key capabilities. They should be able to place and globally route modules larger than gates to be able to handle the large chip sizes. They should also have built-in timing, wire-length and congestion-sensitive logic optimization to perform buffer and repeater insertion at the chip level.
Incremental analysis of timing, power and area requirements is required to get continuous feedback into the design's progress. Timing analysis and extraction should be both fast and accurate to handle the large design size and chip-level constraints. Global and detailed routing capabilities should be able to route wires within each block while considering pin locations in other blocks to produce a globally optimal solution.
A new approach to hierarchical design is imperative to realize the advantages of hierarchy and yet overcome many of its shortcomings. It is no longer enough to partition a design into blocks and then individually implement each block with its top-level constraints and budgets using place and route tools.
A much tighter integration of planning, partitioning, place and route and chip assembly is now required to meet the challenges of modern day multimillion-gate, ultradeep-submicron system-on-chip designs.