Conquering behemoth designs
Andy Inness, Mentor Graphics
6/11/2012 10:10 AM EDT
If the IC design trends of the past 20 years serve as an example, we will likely be required to implement a trillion transistors or more on a chip in the next 10 years. Even at 20nm, chip sizes touching billions of transistors present the age-old, perpetually unanswered problem of how to most efficiently implement a design of staggering magnitude. Do you do it flat or hierarchical? Are your decisions based on the current tool capabilities or limitations? Is the design being implemented across geographies or locally? Is it a complex SoC or an ASIC? Do you have any third-party IP or analog components? Is the market window three months or three years away? Do you have a fixed area or power budget that must be met?
Tools and methodologies for the physical implementation of these big designs, from synthesis through place and route, verification, and DFM, have typically used either a purely flat implementation or a hierarchical implementation. Both approaches have advantages and disadvantages, summarized in Table 1, and both worked reasonably well until the recent move to 20/14nm. At these advanced nodes, the current tools and methodologies seem to be running out of steam, and the design community is looking for a solution that addresses the performance, complexity, and time-to-market requirements while also handling large amounts of data.

Table 1. A summary of the advantages and disadvantages of hierarchical and flat flows.
In this article we discuss some strategies and tool requirements for the physical implementation of such large and complex semiconductors. We make a case for a hybrid design methodology: a pseudo-flat flow that uses existing tools, technology, and design team infrastructure to enable better results in less time than the traditional flows.
Flat design flows
Flat flows have been in vogue since the early days of IC design, and for good reason: a flat flow is straightforward and provides the best QoR (quality of results) in terms of design utilization and performance. The full design is implemented as one entity and is typically owned by one engineer. Figure 1 illustrates how a design is viewed in a flat flow.

Figure 1. A view of a design in a flat design flow. All the design information for blocks and the top-level logic is available to the place and route tools. This is good for creating block budgets, assigning pins, and optimizing, but bad for memory footprint, runtime, and ECOs.
The flat flow starts with pad placement, followed by macro placement and fast prototyping. Once the power and ground grid is inserted, the design goes through a few iterations of physical synthesis and cell placement legalization, and then through clock tree synthesis (CTS) and optimization. The next step is detail routing and more optimization. The final step is design closure, which takes into account signal integrity and lithography variability effects. At this point, all the requirements of the design, such as power consumption, timing, performance, area, and manufacturability, must converge. That is, you must meet all those requirements.
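The ordering of the flat flow described above can be written down as a simple checklist. The following is only an illustrative sketch: the stage names paraphrase the text, and the helper functions `next_stage` and `has_converged` are hypothetical names introduced here, not part of any place and route tool.

```python
# Stage names paraphrase the flat-flow steps described in the text.
FLAT_FLOW_STAGES = [
    "pad placement",
    "macro placement",
    "fast prototyping",
    "power/ground grid insertion",
    "physical synthesis",
    "cell placement legalization",
    "clock tree synthesis",
    "post-CTS optimization",
    "detail routing",
    "post-route optimization",
    "design closure",
]

# Requirements that must all be met at design closure (from the text).
CLOSURE_REQUIREMENTS = (
    "power", "timing", "performance", "area", "manufacturability",
)


def next_stage(current):
    """Return the stage that follows `current`, or None after the last one."""
    i = FLAT_FLOW_STAGES.index(current)
    if i + 1 < len(FLAT_FLOW_STAGES):
        return FLAT_FLOW_STAGES[i + 1]
    return None


def has_converged(status):
    """True only if every closure requirement is reported as met.

    `status` maps a requirement name to a bool; a missing entry
    counts as not met.
    """
    return all(status.get(req, False) for req in CLOSURE_REQUIREMENTS)


if __name__ == "__main__":
    print(next_stage("detail routing"))
    print(has_converged({"power": True, "timing": True}))
```

The point of `has_converged` is that closure is all-or-nothing: meeting timing and power while missing area still means another iteration.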
This flow works seamlessly as long as the design is reasonably small, say fewer than 4 million gates. For big designs (more than 5 million gates), the benefits and simplicity of the flat flow are offset by long turnaround time (TAT) and multiple iterations before design convergence. Yet even as design sizes are bursting the seams of many design tools, there is still a strong push to implement designs flat whenever possible. This is because of the overall silicon efficiency that results from having all the design data available at once. Basically, in a flat flow, the design tools can make the best tradeoff between design density and performance.
Hierarchical design flows
The divide-and-conquer approach of the hierarchical flow is better suited to designs that are too big to implement flat, or that are implemented by different teams in one or more geographies. The goal of the hierarchical design methodology is to break the design down into smaller blocks, implement each of them as in a flat flow, and then assemble them as blocks at the top level. Figure 2 illustrates a typical view of a hierarchical design flow.

Figure 2. A view of a design in a hierarchical design flow. Tool capacity and runtime issues are better managed, and ECOs are improved. However, hierarchical flows are less efficient in die area utilization and rely on highly accurate models for optimization.
Two general hierarchical approaches are commonly used. The first keeps channels between the blocks, through which the top-level block interconnects run; the second uses an abutted-block approach. Each has advantages and disadvantages. Top-level channels with routing between the blocks result in a loss of area utilization and an increase in die size to accommodate this top-level routing. Channel designs also often fail to match the clock speed and skew of the flat approach. There can also be more CTS insertion delay when the clock trees must go around the different blocks in the top-level channels. This can result in overdesign at the block level, and design teams rarely go back and recover the wasted area and power that was added at the block level to enable top-level timing closure.
With abutted blocks, rather than using channels for top-level routing, the signals are pushed into the blocks themselves. This typically solves the die area problem by eliminating channels. The abutted style offers its own challenges, however, in getting these feedthrough signals implemented properly without hurting overall design closure. The abutted style also does not solve the issue of overdesign for inter-block timing and power budget closure. It can also pose major challenges for top-level CTS implementation, because no top-level channels are present in the design.
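To make the area cost of channels concrete, consider a rough back-of-the-envelope sketch. It assumes an idealized floorplan that is not taken from the article: a square grid of identical square blocks separated by uniform routing channels, with no channel around the die edge. The function name and all dimensions are hypothetical.

```python
def die_utilization(block_side_mm, grid, channel_mm):
    """Fraction of die area covered by blocks.

    Idealized model (an assumption for illustration): a grid x grid
    array of identical square blocks, separated by routing channels of
    uniform width, with no channel around the die edge.
    """
    die_side = grid * block_side_mm + (grid - 1) * channel_mm
    block_area = (grid ** 2) * (block_side_mm ** 2)
    return block_area / (die_side ** 2)


if __name__ == "__main__":
    # Hypothetical numbers: a 2 x 2 array of 2 mm blocks.
    channelled = die_utilization(2.0, 2, 0.2)  # 0.2 mm channels
    abutted = die_utilization(2.0, 2, 0.0)     # abutted: no channels
    print(f"channelled: {channelled:.3f}, abutted: {abutted:.3f}")
```

In this toy model the abutted floorplan reaches full utilization by construction. It says nothing about the feedthrough and CTS difficulties described above, which are the price paid for removing the channels.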
A general challenge with both hierarchical approaches is the loss of accuracy that comes from abstraction: blocks are represented by models (either ILMs or LIB) to overcome tool memory limitations. Top-level optimization with these abstracted models leaves performance on the table. Top-level chip assembly poses its own timing closure problems because of poor block-level budget estimates and the increasing number of mode/corner scenarios. Other challenges that must be considered in the hierarchical flow include estimating clock resources during design planning, making optimal pin assignments for each module, generating realistic block-level timing budgets, and closing top-level timing and SI. This again can result in wasted design space if too much area is reserved for this step.
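As an illustration of what a block-level timing budget is, the sketch below splits a clock period across the segments of a path that crosses block boundaries, in proportion to each segment's estimated delay. Proportional allocation is one simple scheme chosen here for illustration; the article does not say which budgeting method is used, and the function name, segment names, and delay values are all hypothetical.

```python
def proportional_budgets(clock_period_ns, estimated_delays_ns):
    """Split a clock period across path segments in proportion to
    their estimated delays.

    `estimated_delays_ns` maps a segment name (for example a block or
    the top level) to its estimated delay. Returns a dict with the
    same keys whose values sum to the clock period.
    """
    total = sum(estimated_delays_ns.values())
    if total <= 0:
        raise ValueError("estimated delays must sum to a positive value")
    return {
        segment: clock_period_ns * delay / total
        for segment, delay in estimated_delays_ns.items()
    }


if __name__ == "__main__":
    # Hypothetical path: leaves block A, crosses the top level,
    # and is captured inside block B.
    estimates = {"block_a": 0.5, "top_level": 0.25, "block_b": 0.25}
    for segment, budget in proportional_budgets(2.0, estimates).items():
        print(f"{segment}: {budget:.2f} ns")
```

The budgets are only as good as the estimates fed in. If the top-level delay is underestimated, the blocks receive budgets that look achievable in isolation but cannot close timing at assembly, which is the poor-budget problem described above.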
kinnar
7/29/2012 3:04 PM EDT
The electronics design methodology always keeps changing as the technology changes. I have been seeing similar methodological changes since the days of transistor-based designs. But ultimately, for a design to be successful, it normally evolves from scratch and passes through all the possible methodologies before it is proven successful. There will always be exceptions to every generalization.