Chip synthesis: A new approach to RTL implementation
Paul van Besouw, president and CEO, <a href="www.oasys-ds.com">Oasys Design Systems</a>
2/16/2010 5:23 AM EST
Traditional synthesis is coming apart at the seams, especially for designs larger than
Synthesis: a little bit of history
From early days, all synthesis tools have been built basically the same way: turn the RTL code into gates using fairly naïve algorithms, and then optimize the gates to meet the constraints. It's as if C language compilers all worked by turning the C straight into machine instructions, and then optimizing the machine instructions. In principle, with enough runtime and enough clever optimization techniques, working at the machine instruction level might discover a higher-level optimization such as pulling a constant sub-expression out of a loop. However, it is much better just to do higher-level optimizations at the higher level in the first place. Modern C compilers are indeed built this way, with global optimizers that look at a high-level representation of the program, and a straightforward peephole optimizer cleaning up final details at the machine instruction level at the very end.
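To make the compiler analogy concrete, here is a minimal sketch of the kind of high-level optimization mentioned above: pulling a constant sub-expression out of a loop. The function names and the expression `a * b + 1` are invented for illustration only.

```python
def scale_naive(values, a, b):
    """Recomputes the loop-invariant expression a * b + 1 on every iteration."""
    out = []
    for v in values:
        out.append(v * (a * b + 1))
    return out


def scale_hoisted(values, a, b):
    """Same result, with the invariant expression computed once before the loop."""
    k = a * b + 1  # hoisted: does not depend on the loop variable
    out = []
    for v in values:
        out.append(v * k)
    return out


print(scale_naive([1, 2, 3], 2, 3) == scale_hoisted([1, 2, 3], 2, 3))
```

Seen at the source level, the transformation is obvious. Recovering it after the loop has already been flattened into individual machine instructions is far harder, which is the point of the analogy.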
The first logic synthesis tools in the late 1980s simply optimized gate-level netlists derived from schematics. RTL synthesis was added on top of that foundation of logic optimization. The RTL code was read in and reduced to a control/dataflow data structure, which was then turned into gates. Finally, the gate-level optimizer would grind away until the design met its timing constraints. Since the impact of wires on timing was almost entirely capacitive (resistance was not yet an issue; timing analyzers didn't even take it into account), simple wire-load models were used and the gates were not physically placed until the next step, place and route.
When physical information became more important, placement was merged into the gate-level optimization step so that instead of using wire load models, an estimated route including resistance could be calculated. For the last twenty years, synthesis has been built around a core of gate-level optimization.
There are two big disadvantages of this approach. Firstly, gate-level optimization is a low-level optimization, and secondly gate-level optimization requires an enormous amount of data to be simultaneously accessible in memory. This means that run times are too long and capacity is too low.
As a result, with traditional synthesis, designs need to be split up into smaller blocks to work around tool capacity limitations. And it keeps getting worse: in 1990 traditional synthesis capacity was about 10K gates and a chip was about 100K gates, meaning the design would need to be split into 10 blocks. In 2009, traditional synthesis capacity is up to about 500K gates but chips are 100M gates, meaning 200 blocks. This makes for a horrible problem of time-budgeting to control the synthesis. Then place and route has to take those 200 blocks, assemble them together, and meet the overall global timing constraints. This cycle simply does not close without an unacceptably large number of iterations that can take months. A new approach is required.
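The block counts above follow from a simple division of chip size by tool capacity. As a quick check, using only the approximate figures quoted in the text (the helper name `blocks_needed` is made up for this illustration):

```python
def blocks_needed(chip_gates, tool_capacity_gates):
    """Minimum number of blocks if each block must fit within the tool's capacity."""
    # Ceiling division using integers only
    return -(-chip_gates // tool_capacity_gates)


print(blocks_needed(100_000, 10_000))        # 1990: 100K-gate chip, 10K-gate capacity
print(blocks_needed(100_000_000, 500_000))   # 2009: 100M-gate chip, 500K-gate capacity
```

Capacity grew by a factor of 50 over the period while chip size grew by a factor of 1,000, so the number of blocks to budget and reassemble grew twentyfold.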
The chip synthesis solution
Chip synthesis works very differently. Once the RTL code has been parsed, it is partitioned, based on connectivity, into smaller partitions that will eventually be reduced to gates. Each partition is small enough that it won't contain any long wires, which would lead to high variability in timing, but large enough to have implementations with potentially different area-time tradeoffs. Each partition is largely independent of the others. Of course, the timing numbers from all the other partitions are required to be able to time the whole chip, but the detailed internals of every partition are not required simultaneously. Because it is no longer necessary to look at the whole chip at the gate level at the same time, the memory requirements are hugely reduced.
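The article does not describe the actual partitioning algorithm, so the following is only a toy sketch of what partitioning by connectivity could look like: it greedily grows each partition outward from a seed along connections and caps the partition size. The function name, the adjacency-map input format, the size cap, and the example module names are all assumptions made for illustration.

```python
from collections import deque


def partition_by_connectivity(adjacency, max_size):
    """Greedily group connected nodes into partitions of at most max_size nodes.

    adjacency maps each RTL-level node name to the set of nodes it connects to.
    This is an illustrative heuristic, not the algorithm used by any real tool.
    """
    unassigned = set(adjacency)
    partitions = []
    for seed in sorted(adjacency):  # sorted for deterministic output
        if seed not in unassigned:
            continue
        unassigned.discard(seed)
        part, queue = [], deque([seed])
        while queue:
            node = queue.popleft()
            part.append(node)
            for nbr in sorted(adjacency[node]):
                # Only claim a neighbor if the partition still has room for it
                if nbr in unassigned and len(part) + len(queue) < max_size:
                    unassigned.discard(nbr)
                    queue.append(nbr)
        partitions.append(part)
    return partitions


example = {
    "alu": {"regfile", "decoder"},
    "regfile": {"alu"},
    "decoder": {"alu", "fetch"},
    "fetch": {"decoder"},
    "uart_tx": {"uart_rx"},
    "uart_rx": {"uart_tx"},
}
print(partition_by_connectivity(example, max_size=3))
```

Tightly connected nodes land in the same partition, which keeps most wires short and local. Only summary timing for each partition would then need to be visible at the chip level.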
This RTL partitioning approach is the main reason that chip synthesis can be so fast and so effective. By operating at a higher level, it intelligently synthesizes and times the design one partition at a time. Then it re-synthesizes, re-places (updating the global routes), and perhaps re-partitions parts of the design until the timing constraints are met.
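The iterate-until-timing-is-met idea can be sketched with a deliberately simplified model. Here each partition has a short list of hypothetical implementations, each an (area, delay) pair ordered from smallest to fastest, and a path delay is just the sum of the delays of the partitions it crosses. The sketch covers only the re-synthesis step; re-placement and re-partitioning are omitted, and every name and number is an assumption rather than a description of a real tool.

```python
def close_timing(options, paths, target_delay):
    """Pick one implementation per partition so every path meets target_delay.

    options: partition name -> list of (area, delay), smallest area first.
    paths: list of paths, each a list of partition names.
    Returns a dict of chosen option indices, or None if timing cannot be met.
    """
    choice = {p: 0 for p in options}  # start from the smallest implementations

    def path_delay(path):
        return sum(options[p][choice[p]][1] for p in path)

    while True:
        worst = max(paths, key=path_delay)
        if path_delay(worst) <= target_delay:
            return choice  # worst path meets timing, so all paths do
        # Partitions on the critical path that still have a faster option
        candidates = [p for p in worst if choice[p] + 1 < len(options[p])]
        if not candidates:
            return None
        # "Re-synthesize" the partition whose next option saves the most delay
        best = max(
            candidates,
            key=lambda p: options[p][choice[p]][1] - options[p][choice[p] + 1][1],
        )
        choice[best] += 1


options = {
    "A": [(10, 5.0), (14, 3.0)],
    "B": [(8, 4.0), (12, 3.5)],
    "C": [(6, 2.0)],
}
paths = [["A", "B"], ["B", "C"]]
print(close_timing(options, paths, target_delay=7.0))
```

Only one partition is revisited per iteration, and the chip-level check needs just one delay number per partition, which mirrors the claim that the detailed internals of every partition are never required at the same time.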



