Reuse today is largely focused on platform-based design, which emphasizes the reuse of large blocks of fixed hardware configurations, pre-engineered for speed or power, with design flexibility effectively available only through software modification. This model is now under pressure from the increasingly apparent need for configurable hardware options.
Because reuse has not met all of our needs and expectations, significant effort has gone into improving the design productivity of the traditional synthesis flow. In particular, two major unresolved problem categories hinder productivity and performance in the current synthesis-based flow: closure on signal timing requirements in the final design, and on-chip signal integrity.
Timing closure is largely a problem that stems from the fact that synthesis is a logical tool, while timing in deep submicron (DSM) semiconductor designs is driven by wiring as well as by gate delays; resolving it therefore requires a physical understanding of the wiring in the final design. Older synthesis tools used statistical wire-load models to estimate the amount of wiring, but this approach proved so inaccurate that design automation vendors have since created tools that are aware of trial circuit placements and of real or estimated wire routing delays. These tools have shown some improvement in reporting accuracy.
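The statistical wire-load approach can be sketched as a simple fanout-indexed lookup table. This is a minimal illustration, not any vendor's actual model; the table values and the per-pin extrapolation increment are assumptions chosen only to show the idea, and the point of the surrounding text is precisely that such fanout-only estimates ignore placement and can be far from the routed reality.

```python
# Toy fanout-based statistical wire-load model of the kind older
# synthesis tools used. All values are illustrative assumptions.
WIRE_LOAD_TABLE = {1: 0.010, 2: 0.018, 3: 0.025, 4: 0.032}  # est. net cap, pF

def estimated_net_cap(fanout):
    """Look up estimated wire capacitance by fanout; extrapolate
    beyond the table with a fixed per-pin increment (an assumption)."""
    if fanout in WIRE_LOAD_TABLE:
        return WIRE_LOAD_TABLE[fanout]
    max_fo = max(WIRE_LOAD_TABLE)
    return WIRE_LOAD_TABLE[max_fo] + 0.007 * (fanout - max_fo)

# Two nets with identical fanout get identical estimates, even though
# after placement one may be a short local net and the other a long
# cross-chip route -- the inaccuracy the text describes.
print(estimated_net_cap(2), estimated_net_cap(6))
```

A placement-aware tool replaces this table with capacitances derived from trial placement and real or estimated routes, which is why it closes timing more reliably.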
Signal integrity is also a physical issue, caused by coupling between wires in close proximity to one another, and it can be accurately evaluated only after final placement and static timing analysis.
Unfortunately, the RTL synthesis process provides little, if any, control over these problems in current reuse methodologies, forcing numerous ad hoc iterations through the tool flow.
Advances on the second objective, achieving higher speed and performance or lower power, are moving at an even slower pace. Each new process generation generally offers a significant performance improvement along with the typical 2x density improvement. This performance improvement is usually in the range of 25 to 50% and is measured as the intrinsic delay of the transistor. Improving the performance of a circuit has therefore frequently been tied to the ability to increase transistor clock rates and, except for super-pipelining, has usually been proportional to them. Unfortunately, wiring interconnects present much larger delays with each new process generation, and if not taken into consideration they can significantly reduce the actual performance or clock rate achieved on the chip. Again, this requires knowledge and control of final physical placement so that the actual loads caused by wiring capacitance can be considered along with gate loads when crafting for performance.
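The diverging trends above can be made concrete with a rough scaling sketch. This is an illustration only: the base delay, resistance, and capacitance constants are assumptions, not process data. It uses the standard distributed-RC (Elmore-style) wire-delay approximation, delay ≈ 0.5·R·C·L², and the common observation that a fixed-length wire's resistance per millimeter grows roughly as 1/scale² as its cross-section shrinks, while gate delay shrinks with the feature size.

```python
# Illustrative (not process-accurate) gate vs. wire delay across
# hypothetical process shrinks. All constants are assumed values
# chosen only to show the trend described in the text.

def gate_delay_ps(node_scale):
    """Intrinsic gate delay, assumed to shrink with feature size."""
    base_ps = 100.0              # assumed gate delay at scale 1.0
    return base_ps * node_scale

def wire_delay_ps(node_scale, length_mm=1.0):
    """Distributed-RC delay of a fixed-length global wire:
    delay ~ 0.5 * R * C * L^2. Resistance per mm grows as 1/scale^2
    as the cross-section shrinks; capacitance per mm stays roughly flat."""
    r_per_mm = 50.0 / node_scale**2   # ohms/mm (assumed base value)
    c_per_mm = 0.2                    # pF/mm (assumed roughly constant)
    return 0.5 * r_per_mm * c_per_mm * length_mm**2

for scale in (1.0, 0.7, 0.5):
    print(f"scale {scale}: gate {gate_delay_ps(scale):6.1f} ps, "
          f"1 mm wire {wire_delay_ps(scale):6.1f} ps")
```

The gate delay falls with each shrink while the fixed-length wire delay rises, so eventually the wire, not the gate, sets the clock rate unless placement keeps critical wires short.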
There are three ways to lower the power in a design. One is to lower the clock rate, since dynamic power is directly proportional to clock frequency. Another is to run the circuits at a lower supply voltage, which yields a much greater reduction, since power is proportional to the square of the voltage. Finally, intimate circuit knowledge enables clock gating, which reduces power significantly by turning circuits off when they are not needed; this is a step that typically cannot be fully automated.
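The three levers follow directly from the standard dynamic CMOS power relation P = α·C·V²·f, where α is the switching activity factor. The sketch below uses assumed baseline numbers (1 nF of switched capacitance at 1.8 V and 400 MHz, chosen only for illustration) to show the relative effect of each lever.

```python
def dynamic_power_w(c_farads, v_volts, f_hz, activity=1.0):
    """Dynamic CMOS power: P = alpha * C * V^2 * f (watts)."""
    return activity * c_farads * v_volts**2 * f_hz

# Assumed baseline: 1 nF switched capacitance, 1.8 V supply, 400 MHz.
base = dynamic_power_w(1e-9, 1.8, 400e6)

# Lever 1 -- clock rate: halving f halves power (linear in f).
half_clock = dynamic_power_w(1e-9, 1.8, 200e6)

# Lever 2 -- supply voltage: dropping 1.8 V -> 1.2 V scales power
# by (1.2/1.8)^2, about 0.44x, at the same clock rate.
low_v = dynamic_power_w(1e-9, 1.2, 400e6)

# Lever 3 -- clock gating: shutting off idle circuits lowers the
# effective activity factor (0.3 here is an assumed duty factor).
gated = dynamic_power_w(1e-9, 1.8, 400e6, activity=0.3)

print(base, half_clock, low_v, gated)
```

The quadratic voltage term is why voltage scaling dominates, and why the activity factor, which only designer insight into when circuits can be switched off can reduce, is the lever automation handles least well.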
In the 1960s and 1970s, wiring delays across circuit boards were a major performance issue, and reducing crosstalk and reflections was a major part of the design effort. The 1980s brought high-density CMOS integration with insignificant crosstalk and wiring delays, and RTL synthesis began delivering significant productivity improvements.
In the 1990s transistor density increased, resulting in longer wire delays with some crosstalk, but gate delays were still the dominant factor and synthesis still produced good results.
From 2000 onward, however, we will be dealing with deep submicron issues. Coming full circle back to the 1960s, wiring delays and crosstalk will be at least as critical as gate delays, and must be managed.
A silver bullet?
In the past, our industry has tried to meet the twin objectives of productivity and optimization with methodologies focused exclusively on design reuse, or simply with increased tool investment. In fact, a new methodology combining both reuse and tools is required. Reuse has proven productive for only a few platform-based applications, and while "physically aware" synthesis tools have helped somewhat with timing closure and performance, they are still not a viable alternative to a custom-crafted design. There is no silver bullet, because the problems are multifaceted and demand a new approach that combines multiple solutions.
Telairity Semiconductor's engineers have developed a new approach that combines the productivity advantages of reuse with a new optimized design methodology that builds on current physical design flows. This method involves new tools that provide initial floor planning with power, performance and area estimations.
Dramatically improved design productivity is achieved through the reuse of hardened basic building blocks called "groups" that are larger than typical standard-cell library elements, yet smaller than the 50k-gate soft IP cores attempted earlier. These reusable groups are all custom designed to perform a specific high-speed or low-power function, each containing approximately 1,000 gates, a size with very high reuse potential. The hardened groups are pre-engineered and pre-verified through the first three layers of metal, so the extremely time-consuming issues surrounding timing closure and signal integrity are eliminated. The groups typically come within 10% of the performance of a full-custom design, delivering much higher performance along with the improved productivity.
Designs built this way have averaged 50k gates per square millimeter at speeds of 400 MHz in 0.18µm CMOS. The approach also implements a clock infrastructure that approaches zero skew, simplifies group interconnection with global wiring in metal layers four and five, and routes a robust power and ground structure in layer six.
This approach is well suited not only to ASIC and ASSP designs but especially to configurable application platforms, providing extremely high performance, very quick turnaround, and very low cost. Comprehensive, high-speed application platforms can be manufactured and stored in wafer form at an early metal layer, while the configurability that customizes the platform for each customer is achieved simply by routing a custom interconnect on one or more of the remaining layers.
The advent of deep submicron design is fairly recent, and holds great promise if the semiconductor industry can close the current SoC design productivity gap. IP reuse is a key strategy, but we can't band-aid outdated design approaches to fix the inherent problems that have emerged. The solution must directly address the specific issues and, in this case, that means taking a step back and taking design innovation down a different path. Today's designs must be denser, faster and ready in record time.