The router products market, where average throughput grows by an incredible 2.2X every 18 months, is moving faster than the formidable Moore's Law. To stay competitive, design teams must do more than just ride the wave of new silicon process technology. These chip designs represent the bleeding edge in design complexity. Chips with more than 250 million transistors have been fabricated, and still bigger chips are under design.
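To put the growth figure in perspective, the compounding can be worked through directly. The sketch below assumes the commonly quoted form of Moore's Law, a doubling every 18 months, and a six-year horizon; both are assumptions for illustration and are not taken from measured data.

```python
def growth_factor(multiplier: float, period_months: float, elapsed_months: float) -> float:
    """Compound growth after elapsed_months, given a multiplier per period."""
    return multiplier ** (elapsed_months / period_months)

HORIZON_MONTHS = 72  # assumed horizon: six years, i.e. four 18-month periods

# 2.2X every 18 months is the router throughput figure quoted above.
router_gain = growth_factor(2.2, 18, HORIZON_MONTHS)

# Assumption: Moore's Law taken as 2X every 18 months.
moore_gain = growth_factor(2.0, 18, HORIZON_MONTHS)

print(f"router throughput: {router_gain:.1f}X")
print(f"Moore's Law:       {moore_gain:.1f}X")
print(f"gap to close:      {router_gain / moore_gain:.2f}X")
```

Under these assumptions, router throughput grows about 23X in six years against 16X for process technology alone, so roughly 1.46X must come from architecture and design methodology.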
One of the most critical decisions for any design team is choosing among a customer-owned tooling (COT) model, an ASIC model or a hybrid of the two. In the ASIC model, the interface between the design team and the silicon supplier is at the netlist level, most likely accompanied by a floorplan. In a COT flow, the design team performs all of the design (synthesis, placement, routing) and verification steps to produce a GDS-II file for mask generation. In the newer hybrid model, design teams also perform gate placement: in some cases full legal placement, in others fast estimated placement. This hybrid model has been employed to mitigate some of the iteration issues associated with design convergence when logical and physical design are separated and serialized.
In the highly competitive networking space, there is no choice to make. COT is the model of choice for high-performance products that have demanding time-to-market requirements. The COT model affords a number of advantages, the most fundamental of which is control. First and foremost, design teams make their own decisions about tools, flows and methodology. In ASIC projects, a host of rules and caveats inhibit the use of best-known tools and practices, favoring instead the "tried-and-true" methods that have years of service.
COT design flows permit designer intervention at every stage in the process, without a huge time penalty. In an ASIC flow, data must be passed between organizations, and each party intervenes only in its own part of the design flow. For example, RTL fixes are done by the design team, while antenna rule violations are fixed by the ASIC vendor. This limits the timing and the nature of intervention options available to the design team. The COT flow is used to get and maintain full control over all options, and has been shown to deliver the highest performance with the smallest die size and power.
Typical estimates are that a COT flow yields data pathways that are 30% to 50% faster and a die that is 25% to 50% smaller than a traditional ASIC flow. These numbers are really the bottom line in the COT versus ASIC decision. In the router market, chip performance equals product value, and chip size equals product cost.
However, one should be aware that there is a large, incremental cost associated with COT flows. COT requires more tools, and more designers to run these tools. Full-blown, backend design capabilities cost $1 million or more per seat. Experienced engineers for running these tools can be difficult to find.
But in the networking market, these additional costs are insignificant when compared to the stakes. COT designs also offer complete control over the design schedule. There is no throwing the netlist over the wall to an ASIC vendor and hoping that physical design goes well. Success and failure rest in the hands of the design team. Further, the COT design style permits innovation in design methodology. For instance, parallelizing the front-end and backend processes to create a pipelined implementation flow allows for continuous trials of the entire flow.
The basic philosophy used to craft a complex, high-performance IC design methodology should be to create an environment where designers can expend their efforts exclusively on the 10% of the design that is difficult, while the remaining 90% is serviced by a highly automated flow. The 10% part of the problem demands control.
To achieve speed in ICs, pipelined architectures are often employed. The same throughput advantages found in hardware pipelines can be gained in the hardware design process. To make the design flow pipeline work, each IC must be partitioned into functional subunits that are of a size that is practical for place and route tools.
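The throughput argument can be illustrated with a small timing model. This is a minimal sketch, not a description of any particular team's flow: the stage names follow the steps listed earlier, the per-stage hours and the block count are hypothetical, and every block is assumed to take the same time in each stage.

```python
# Hypothetical per-block runtimes in hours; not measured data.
STAGE_HOURS = {
    "synthesis": 4,
    "placement": 3,
    "routing": 5,
    "verification": 6,
}

def serial_hours(stage_hours: dict, n_blocks: int) -> int:
    """Each block runs through every stage before the next block starts."""
    return n_blocks * sum(stage_hours.values())

def pipelined_hours(stage_hours: dict, n_blocks: int) -> int:
    """Blocks overlap across stages. The first block pays the full latency,
    and each later block completes one slowest-stage interval after the
    previous one."""
    if n_blocks <= 0:
        return 0
    hours = list(stage_hours.values())
    return sum(hours) + (n_blocks - 1) * max(hours)

n_blocks = 6  # hypothetical number of functional subunits
print("serial:   ", serial_hours(STAGE_HOURS, n_blocks), "hours")
print("pipelined:", pipelined_hours(STAGE_HOURS, n_blocks), "hours")
```

As in a hardware pipeline, the slowest stage sets the rate at which finished blocks emerge, which is one reason to keep block sizes within what each tool handles comfortably.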
This practical size for physical tools has remained at about 300k to 400k instances (or about 600k to 800k gates) for the last few years. This limit is largely driven by the practical capacity limits of verification tools. Parasitic and transistor extraction tools have to deal with polygons, and in a 400k-instance module, there are usually around 40,000,000 polygons to process. Further, the 400k-block size limit yields manageable runtimes for engineering change order (ECO) scenarios that call for placement adjustment and re-routing.
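These figures imply roughly 100 polygons and 2 gates per instance, which makes a quick partition estimate possible. The sketch below derives both ratios from the numbers above; the 3-million-instance chip is a hypothetical input, and the even split across blocks is a simplifying assumption, since real partitions follow functional boundaries.

```python
import math

MAX_INSTANCES_PER_BLOCK = 400_000               # practical limit quoted above
POLYGONS_PER_INSTANCE = 40_000_000 // 400_000   # about 100, from the figures above
GATES_PER_INSTANCE = 800_000 // 400_000         # about 2, from the figures above

def partition_plan(total_instances: int) -> dict:
    """Estimate block count and per-block extraction load for a chip.

    Assumes an even split of instances across blocks (a simplification).
    """
    if total_instances <= 0:
        raise ValueError("total_instances must be positive")
    blocks = math.ceil(total_instances / MAX_INSTANCES_PER_BLOCK)
    per_block = math.ceil(total_instances / blocks)
    return {
        "blocks": blocks,
        "instances_per_block": per_block,
        "polygons_per_block": per_block * POLYGONS_PER_INSTANCE,
        "gates_total": total_instances * GATES_PER_INSTANCE,
    }

# Hypothetical chip with 3 million placeable instances.
plan = partition_plan(3_000_000)
for key, value in plan.items():
    print(f"{key}: {value:,}")
```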
Because place and route tools have a higher capacity than standard synthesis flows, the synthesis methodology includes a bottom-up approach that assembles smaller blocks to capacity-match the design size with backend tools. The most popular block size for synthesis in high-performance designs appears to be in the 100k-gate (50k-instance) range. This number is driven by the runtimes and memory usage of the most widely used synthesis tool.
Place and route and synthesis tool capacities and runtimes have not made the same strides forward as silicon process technology. This means that there is a rapidly growing complexity management issue, particularly for designers responsible for synthesis. To achieve the same design pipeline latency, the synthesis process needs to be massively parallelized. This also means that there is little time for cross-module timing budget updates and boundary optimization, which often lead to further iteration. These factors lead to local minima in optimization that yield sub-optimal overall results. Performance losses in synthesis then have to be made up through manual intervention (gate instantiation, and manual placement and routing).
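The capacity mismatch can be quantified with the block sizes given above: a 400k-instance place and route block breaks into about eight 50k-instance synthesis jobs. The following sketch estimates synthesis wall-clock time for a given number of parallel job slots. The hours-per-job value is hypothetical, and all jobs are assumed to take equal time.

```python
import math

SYNTH_BLOCK_INSTANCES = 50_000   # synthesis block size quoted above
PNR_BLOCK_INSTANCES = 400_000    # place and route block size quoted above

def synthesis_jobs(chiplet_instances: int) -> int:
    """Number of bottom-up synthesis runs needed to cover one chiplet."""
    return math.ceil(chiplet_instances / SYNTH_BLOCK_INSTANCES)

def synthesis_wall_clock_hours(chiplet_instances: int,
                               hours_per_job: float,
                               parallel_slots: int) -> float:
    """Wall-clock time when jobs run in waves of `parallel_slots` at a time."""
    if parallel_slots < 1:
        raise ValueError("parallel_slots must be at least 1")
    waves = math.ceil(synthesis_jobs(chiplet_instances) / parallel_slots)
    return waves * hours_per_job

HOURS_PER_JOB = 3.0  # hypothetical runtime for one 50k-instance synthesis job

for slots in (1, 4, 8):
    hours = synthesis_wall_clock_hours(PNR_BLOCK_INSTANCES, HOURS_PER_JOB, slots)
    print(f"{slots} slot(s): {hours} hours")
```

The model ignores the re-aggregation step and any cross-module budgeting between jobs, which are the activities the parallel schedule leaves little time for.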
Among the best-known practices for complex COT design projects is the use of a design flow that capacity-matches the front-end and backend tools. Engineering productivity and overall project throughput have been shown to be best when each chiplet (major chip partition) can be synthesized, placed and timed in 12 to 14 hours. Further, fewer partition boundaries have been shown to improve the quality of results. That is, operating flat with blocks as big as possible has been shown to be beneficial from a schedule standpoint as well as for overall design quality of results.
These problems with old synthesis tools were among the primary motivations for the move to the new generation of synthesis tools. Using new tools, a capacity-matched design flow pipeline could be established where no added partitioning and re-aggregation steps are required. This dramatically simplified the design flow and, coupled with the faster runtimes and better speed and area, provided a significant boost in productivity.
The goal in establishing the implementation pipeline is that, after the initial trial through the pipeline, the flow is fully automatic, and thus reproducible. Exceptions to this come from design changes that occur throughout the development schedule. The flow needs to accommodate these small changes gracefully.
ECOs, driven by verification team findings or last-minute product requirement changes made by marketing, are a fact of life. Their existence was one of the primary motivators for creating the pipelined design implementation methodology.
The nature and scope of the ECO drive the location within the methodology pipeline at which the intervention will take place. Timing violations can often be fixed with small local changes in placement and routing. When logic restructuring is required, the choices are manual netlist editing or register transfer level (RTL) re-coding. In either case, synthesis may be used for at least resizing operations. If the choice is RTL re-coding, then the rules dictate that a complete chiplet respin will be in order. Designers endeavor to keep manual netlist edits to 20 or fewer gates.
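The triage described above can be sketched as a small decision function. This is an illustration only: the names are hypothetical, and using the 20-gate guideline as a hard cutoff between a netlist edit and RTL re-coding is an assumption, since in practice the choice is a designer's judgment call.

```python
from enum import Enum

class EcoKind(Enum):
    TIMING = "timing violation"
    LOGIC = "logic restructuring"

# Guideline from the text; treated here as a hard threshold (an assumption).
MAX_MANUAL_EDIT_GATES = 20

def eco_entry_point(kind: EcoKind, gates_touched: int = 0) -> str:
    """Pick the pipeline stage at which an ECO re-enters the flow."""
    if kind is EcoKind.TIMING:
        # Often fixable with small local placement and routing changes.
        return "local placement and routing fix"
    if gates_touched <= MAX_MANUAL_EDIT_GATES:
        return "manual netlist edit"
    # RTL re-coding implies a complete chiplet respin.
    return "RTL re-code and full chiplet respin"

print(eco_entry_point(EcoKind.TIMING))
print(eco_entry_point(EcoKind.LOGIC, gates_touched=12))
print(eco_entry_point(EcoKind.LOGIC, gates_touched=150))
```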
For projects serving the router marketplace, use of a COT design flow is a competitive imperative. Modern RTL synthesis that enables a flow capacity-matched with the new generation of physical design tools can offer a boost in productivity and quality of results that gives design teams a winning edge.