The unspoken domino theory of chip design is starting to break down in a big way on many current SoC projects. Designers can no longer simply conquer one block, move on to the next, and hope that the overall project will be a success. A strategy for conquering the world, otherwise known as a global strategy, must be developed and used as the primary guiding principle for the design.
Design methodologies that restrict optimization to the block level cannot maximize the performance of the design at the global level. These practices came into being because of practical limitations of older EDA tools. Many unnecessary engineering steps and costs have gone virtually unnoticed because, until the emergence of some recently announced tools, there was little innovation in synthesis. The legacy of "we have always done it this way" or "with these tools you have to do it this way" obscures how much better things could be with a new generation of synthesis technology.
Because of limitations in the old tools, designers artificially break their designs into many more pieces than necessary. There are tool-driven reasons they break a design into blocks smaller than what will be handed off to the back-end tools:
Control performance. The old technology requires many knobs and switches to be set correctly to get a reasonable quality of results on critical paths. To control the scope of switch settings, designs have to be broken into many more pieces.
Reasonable runtime. Old synthesis tools have non-linear runtime behavior with design and library size. To get jobs that do not run longer than overnight, smaller blocks are needed.
Capacity limitations. The memory footprint of old synthesis tools is large and, to prevent running out of memory, smaller jobs need to be configured.
The by-product of methodologies born of these tool limitations is complexity for the SoC designer/IP integrator. As an example, a commercial 64-bit MIPS-based core is approximately 400K gates in some configurations. Its developers found through experimentation that the best results achievable with the old tools were obtained by breaking the core into 38 subblocks.
Each of the subblocks had three scripts associated with it: a script for setting constraints, a script for the compilation strategy, and a script for further hierarchy manipulation. These 114 scripts were augmented with a dozen or so others that managed subblock integration. For the SoC designer looking ahead to the challenges of 100M-gate design, complexity management will become overwhelming: roughly 30,000 scripts would be required for a 100M-gate design built from 400K-gate building blocks using present practices.
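The arithmetic behind that projection can be sketched in a few lines. The per-block figures come from the MIPS-core example above; the scaling to 100M gates is a simple extrapolation, not a measured result:

```python
# Rough script-count projection for a block-based synthesis methodology.
# Per-block figures from the 64-bit MIPS core example: a ~400K-gate block
# split into 38 subblocks, each needing 3 scripts (constraints, compile
# strategy, hierarchy manipulation), plus about a dozen integration scripts.

SUBBLOCKS_PER_BLOCK = 38
SCRIPTS_PER_SUBBLOCK = 3            # constraints, compile, hierarchy
INTEGRATION_SCRIPTS_PER_BLOCK = 12  # "a dozen or so", approximate

def scripts_for_design(design_gates, gates_per_block=400_000):
    """Estimate total synthesis scripts if every building block
    follows the same subblock partitioning as the example core."""
    blocks = design_gates // gates_per_block
    per_block = (SUBBLOCKS_PER_BLOCK * SCRIPTS_PER_SUBBLOCK
                 + INTEGRATION_SCRIPTS_PER_BLOCK)
    return blocks * per_block

# One 400K-gate core: 38*3 + 12 = 126 scripts
print(scripts_for_design(400_000))       # 126
# A 100M-gate SoC is 250 such blocks: 250 * 126 = 31,500 scripts,
# on the order of the 30,000 cited above.
print(scripts_for_design(100_000_000))   # 31500
```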
Further, each of those scripts would need to be tuned to the global design goals. Settling for a lowest-common-denominator performance level, a consequence of local-optimization methodology discovered late in the design cycle, is costly in engineering effort and project schedule.
The formulation of a global optimization strategy is not a "normal" activity today. More likely than not, complex, high-performance SoC design will involve incorporating disparate types of blocks from disparate sources. While the goals for memory design are clearly to lower cost (die area) per bit, increase performance, and lower power, within the context of an SoC those goals may conflict with global design goals. For example, a pipeline stage may have to be added or repositioned in the address-decoding logic to accommodate global design goals.
The correct balance
Similarly, for RF or analog IP blocks, whose incorporation is now a competitive imperative, shielding, scaling, aspect ratios, signal integrity and noise margins are primary concerns that may have to be balanced against global design requirements. For example, a block's location may have to be altered to trade off load isolation against signal arrival times.
Test strategies, often an afterthought in SoC architecture planning, can play a critical role in the performance, area, and cost of a device. While some IP blocks may be best tested with parallel test data, the cost of extra pins or muxing at chip I/Os may be too great to bear. With hard IP, for example, there is not much that can be done. Whatever design for test (DFT) structures the provider has included are pretty much what the design team will have to deal with in the global optimization process. That is, unless the team has the luxury to wait for a new version of the IP to be created and qualified.
Thus, a global optimization strategy must take a global view not only of the design (all of its constituent elements) but also of the design process. Design, verification, implementation, optimization and test must be viewed alongside the master plan for the entire device.
While this may sound trite, such a global view is painfully absent in many projects. Far from following a master plan, the mode of operation is often to react to problems as they arise. The trouble with this reactive approach is that a local minimum, rather than the global optimum, will always be reached, and severe problems can sneak up and whack a design team.
Taking just one aspect of a master design strategy, performance, can be a complex orchestration in itself. At the top level (the chip boundary), system timing requirements dictate input signal arrival times, output signal required times, and often (though not necessarily) internal clock rates. When a chip microarchitecture is being developed, not all of the actual performance details of the implementation are available to the design team. This can lead to "close calls" being made.
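The top-level timing quantities mentioned above relate through the standard static-timing notion of slack: required time minus arrival time, with negative slack meaning a path fails timing. A minimal sketch, with all numbers hypothetical and chosen only to echo the clock rates discussed later in this article:

```python
# Slack for a chip-level path under the quantities the text names:
# input arrival time, path (logic) delay, clock period, and any
# required setup margin at the output. Negative slack = timing failure.
# All values are illustrative, not from a real design.

def path_slack(input_arrival_ns, path_delay_ns, clock_period_ns,
               output_setup_ns=0.0):
    """Required time minus arrival time for one register-to-output path."""
    arrival = input_arrival_ns + path_delay_ns
    required = clock_period_ns - output_setup_ns
    return required - arrival

# Clock periods: 300 MHz -> ~3.33 ns, 330 MHz -> ~3.03 ns.
period_300 = 1e3 / 300.0
period_330 = 1e3 / 330.0

# A hypothetical path with 0.5 ns input arrival and 2.6 ns of logic delay
# meets 300 MHz but misses 330 MHz by a small margin -- a "close call".
print(path_slack(0.5, 2.6, period_300))  # positive: meets timing
print(path_slack(0.5, 2.6, period_330))  # negative: fails timing
```

This is exactly the kind of marginal path that local, block-by-block optimization tends to leave behind, and that a global strategy must find and fix early.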
When these close calls interact, disaster usually follows. Disasters are relative, however. Early in the design cycle, an architectural flaw can be fixed with relatively little impact. But if a block-based, bottom-up strategy is employed and the flaw is not discovered until late in the design process, market windows can be missed and projects abandoned.
All of this can happen with "perfect" execution. It is just a fundamental flaw in the process. Another source of problems arises from imperfections created by the complexity of the problem. The more blocks, time budgets, constraint scripts, and timing reports that must be created, analyzed, debugged and managed, the more likely it is that consistency or other errors will be introduced.
The way around these problems is to employ a global approach. That means using the new generation of tools that are practical in the multimillion-gate range. It means a capacity-matched chip-partitioning scheme can be used to minimize block boundaries, and identical constraint sets can be used up and down the design flow. These practices minimize complexity and the chance of complexity-related errors.
A recent consumer electronics project, a 3M-gate SoC, serves as a good example. It incorporated memory, a 64-bit MIPS processor, interface IP, and custom logic. A switch from a block-based to a global design strategy enabled a quantum step in performance that had great significance for end-product value. The design team had taken the domino approach to the project and was stuck in a local minimum that, post-layout, meant the device could be sold only into 300MHz sockets.
By adopting new tools that enabled a global optimization strategy, the team was able to create a design that would plug into the higher-ASP 330MHz sockets. It was the same design; the team simply adopted a global strategy.