To avoid verification inefficiency, we must understand what each stage of Figure 1 requires. We do not need to know how an architecture will be implemented in order to verify it. Nor do we need to account for every clock to verify behavior, and verifying the RTL implementation does not require detailed timing. Successful verification separates these concerns and relies on other tools to confirm the assumptions and simplifications made at each stage.
Moving up in abstraction
The emerging SystemC and SystemVerilog tools are directed at two levels of abstraction. The first is served by top-down, system-level tools that analyze algorithms and architectures and then optimize how the two work together. These tools target the system design and partitioning stages shown in Figure 1.
Many tools have recently emerged that enable fast simulation of preconfigured platforms, such as PrimeXsys [4] and OMAP [5], and can handle user-created blocks. These systems, centered on a processor model, execute at multiple MIPS on a typical Pentium-based platform [6].
Based on these models, large quantities of functional verification can be performed with fairly accurate information on architecture or partitioning choices. However, we cannot show that the RTL correctly implements these models, which is why most of these systems are based on available IP blocks.
The second path to abstraction continues the bottom-up progression to a more abstract design description, in which the micro-architectural details are inserted by a synthesis tool rather than defined in the description. This is the behavioral level shown in Figure 1. Micro-architectural details describe how an algorithm is implemented. For example, there are many ways to implement a multiply operation, ranging from a single-cycle multiplier to an iterative “shift and add” sequence of operations requiring multiple clocks.
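The multiply example can be made concrete with a small sketch. The Python below is an illustrative model only, not synthesis output: the function names, the unsigned 8-bit operand width, and the one-bit-per-clock schedule are assumptions chosen for the example. It shows two micro-architectures that compute the same product while taking a different number of clocks.

```python
def multiply_single_cycle(a, b, width=8):
    """Model of a one-clock multiplier: returns (product, clocks)."""
    mask = (1 << (2 * width)) - 1
    return (a * b) & mask, 1


def multiply_shift_add(a, b, width=8):
    """Model of an iterative shift-and-add multiplier: returns (product, clocks).

    One bit of the multiplier b is examined per clock, so the result
    is available after `width` clocks (an assumed schedule).
    """
    mask = (1 << (2 * width)) - 1
    acc = 0
    clocks = 0
    for bit in range(width):
        if (b >> bit) & 1:
            # Add the shifted multiplicand when this multiplier bit is set.
            acc = (acc + (a << bit)) & mask
        clocks += 1
    return acc, clocks
```

For unsigned operands that fit in `width` bits, both functions return the same product; only the clock count differs, and that is exactly the detail a behavioral description leaves to the synthesis tool.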
Pipelines that let data stream through multiple operators can be built, and resources can be shared and scheduled to avoid conflicts. Tools exist that take RTL designs and produce simulation models that do not maintain the implementation details but can improve speed by 5 times or more [7]. Industry experiments have shown that full transaction-level models run 100 to 1,000 times faster than the equivalent RTL models.
Key elements of a solution
With design sizes in the tens of millions of gates, the test vectors actually run represent a small fraction of possible design behavior. Regression suites range from a few hundred to many thousands of vector sets, with execution times of days to weeks even on simulation farms. If a suite must be rerun after every change, changes late in the development cycle become too expensive to consider.
There are three key pieces of technology to unite if this verification crisis is to be resolved:
1. High-level simulation that ignores implementation detail and can be used both to simulate the required functionality and to evaluate performance.
2. A means of ensuring that an implementation functionally matches the high-level models.
3. An effective means of defining and checking the temporal properties of a design against the specification.
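As an illustration of the third item, a temporal property can be checked against a recorded simulation trace. The sketch below is written in Python rather than an assertion language, and the property itself (every request must be acknowledged within a fixed number of clocks), the `(req, ack)` trace format, and the function name are all assumptions made for the example.

```python
def find_unacknowledged(trace, max_delay):
    """Return the clock indices of requests that see no ack in time.

    trace is a list of (req, ack) pairs, one per clock. A request at
    clock t passes if ack is 1 at some clock in t+1 .. t+max_delay.
    A request whose window runs past the end of the trace without an
    ack is reported as a failure.
    """
    failures = []
    for t, (req, _) in enumerate(trace):
        if not req:
            continue
        window = trace[t + 1 : t + 1 + max_delay]
        if not any(ack for _, ack in window):
            failures.append(t)
    return failures


# Hypothetical trace: requests at clocks 0 and 3, one ack at clock 2.
example_trace = [(1, 0), (0, 0), (0, 1), (1, 0), (0, 0), (0, 0), (0, 0)]
late_requests = find_unacknowledged(example_trace, max_delay=2)
```

Here the request at clock 0 is acknowledged in time and the request at clock 3 is not, so `late_requests` contains only clock 3.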
All three of these technologies are currently emerging. Much of the high-level simulation technology is already in place, though not widely used. Accellera has defined the assertion languages, and tool support is coming online. Sequential equivalence checking is the newest of the three. The sequential differences it must handle are illustrated in Figure 2 below.
Figure 2 Sequential differences
At the top of Figure 2, the system behavior is defined, but not the clocking. In other words, the description does not say @(posedge CLK), in which case the solution on the left of Figure 2 would be the feasible one. With the extra flexibility, the right-hand solution is equally valid, and may allow the use of a faster overall clock or less silicon area.
However, proving that these two are functionally equivalent is more difficult. Here, without knowing the starting state of the control flip-flop, it is difficult to know when the operation is meant to start or the output is valid. These differences must be resolved if the solutions are to be declared functionally equivalent.
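The effect of an unknown control state can be sketched with a hypothetical pair of designs. These are not the circuits in Figure 2; the function y = a + b + c, the class names, and the `phase` flip-flop are assumptions made for illustration. One model produces a result every clock using two adders, while the other reuses a single adder over two clocks.

```python
class SingleCycleSum:
    """y = a + b + c using two adders; a valid result every clock."""

    def clock(self, a, b, c):
        return a + b + c, True  # (output, valid)


class SharedAdderSum:
    """Same function with one adder reused over two clocks.

    `phase` models a control flip-flop: 0 = compute a + b, 1 = add c.
    """

    def __init__(self, phase=0):
        self.phase = phase
        self.partial = 0

    def clock(self, a, b, c):
        if self.phase == 0:
            self.partial = a + b
            self.phase = 1
            return 0, False
        result = self.partial + c
        self.phase = 0
        return result, True


def run(design, operands, clocks_per_operand):
    """Hold each operand set for the given clocks; collect valid outputs."""
    results = []
    for a, b, c in operands:
        for _ in range(clocks_per_operand):
            out, valid = design.clock(a, b, c)
            if valid:
                results.append(out)
    return results


operands = [(1, 2, 3), (4, 5, 6)]
reference = run(SingleCycleSum(), operands, 1)
aligned = run(SharedAdderSum(phase=0), operands, 2)
misaligned = run(SharedAdderSum(phase=1), operands, 2)
```

With the flip-flop starting at phase 0, the shared-adder model produces the same valid results as the single-cycle model, only later. Starting at phase 1, it marks a result valid before the partial sum has been formed and the streams diverge, which is why the starting state has to be pinned down before equivalence can be claimed.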
Problems are easy to locate in simple examples, but using simulation to find all possible behaviors of a complex processor pipeline is time-consuming and error-prone. Complete verification would require up to 2^(number of input bits + number of state bits) vectors. While not all of these vectors are needed, the number required for a good probability of functional equivalence is high, which explains why most engineers are unwilling to consider significant changes to a micro-architecture once they find one that works.
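A quick calculation shows how fast this number grows. The sketch below assumes a hypothetical block with 32 input bits and 32 state bits and a simulation rate of one million vectors per second; both figures are illustrative and are not taken from any particular design or simulator.

```python
def exhaustive_vectors(input_bits, state_bits):
    """Upper bound on vectors for complete verification."""
    return 2 ** (input_bits + state_bits)


def years_to_simulate(vectors, vectors_per_second):
    """Wall-clock years needed to apply every vector at a fixed rate."""
    seconds_per_year = 365 * 24 * 3600
    return vectors / (vectors_per_second * seconds_per_year)


# Assumed example: 32 input bits, 32 state bits, 1e6 vectors per second.
total = exhaustive_vectors(32, 32)
years = years_to_simulate(total, 1_000_000)
```

Under these assumptions `total` is 2^64, roughly 1.8 x 10^19 vectors, and `years` comes to about 585,000. Even a modest block is therefore far beyond exhaustive simulation.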