Pause and consider for a moment a world in which passing tests are not an indicator that all is well, but rather a warning flag that not enough functionality is being tested, that vital checks confirming proper operation of the design are missing, or that problems in the verification infrastructure are masking test failures and letting RTL bugs merrily make their way to fabrication.
Introducing the Panacea
Veteran readers will quickly assume that this is the section in the article at which a faint drum-roll is heard, horns are sounded, and the author, with due seriousness and a hint of irony, presents his company's product as the cure for all previously-described ailments, world hunger, and strife and discord among mankind. Veteran readers will not be disappointed.
The concept behind Functional Qualification is relatively simple: Take an RTL design that passes all simulation-based tests, "break it" in some interesting way, and see if it still passes all of the tests. If the broken design causes at least one of the tests to fail, then that's a good thing, as it means the verification environment is robust enough to detect that this particular broken version of the design is, well, broken. If, on the other hand, all of the tests pass, then that's a really bad sign. It means that there are at least two versions of the design, each with a different RTL description and divergent functionality, that are both compliant with your test suite. What does that mean? Well, it could mean that the broken design is actually correct, and through some mystical force of randomness you've accidentally fixed a problem in what you thought was the good design. More likely, it means that you're missing a test scenario that would cause the broken design to exhibit its broken-ness, or you've forgotten a checker that would have detected the bad operation with the current set of tests.
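A minimal sketch of that loop is below, assuming hypothetical helpers inject_fault() and run_test(); they are stand-ins of mine, not any particular tool's interface.

```python
# A minimal sketch of the Functional Qualification idea: break the design,
# re-run the tests, and flag faults that no test notices. The callables
# inject_fault() and run_test() are hypothetical stand-ins for this sketch.

def find_verification_holes(design, faults, tests, inject_fault, run_test):
    """Return the faults that every test still passes with: each one points
    at a missing test scenario, a missing checker, or both."""
    holes = []
    for fault in faults:
        broken_design = inject_fault(design, fault)
        if all(run_test(broken_design, test) for test in tests):
            holes.append(fault)   # all tests pass on a broken design: a really bad sign
        # otherwise at least one test failed, so the environment caught this fault
    return holes
```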
Another way to think about Functional Qualification is purely from the "effect" side. The idea is to have an automated way, a tool, that tries to make something bad happen at the points where you are monitoring design activity. Imagine the extreme case of tying all of your design outputs to 0 and running your regression tests. That's definitely bad, and you'd expect many, if not most, of your tests to fail, right? [I'll pause for a moment to wait for those of you who have just gone off to try it.] Now consider the (perhaps) more interesting case of changes from the norm on individual or related outputs: the somewhat more subtle deviations that real RTL bugs cause on a daily basis.
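To make the two cases concrete, they might be described as faults along these lines; the Fault class, its fields, and the signal names are inventions of this sketch, not any tool's representation.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Fault:
    signal: str   # which design signal is perturbed
    effect: str   # how it is perturbed

# The blunt experiment: every monitored output tied to 0.
tie_outputs_low = [Fault(signal=out, effect="stuck_at_0")
                   for out in ("valid", "ready", "data_out")]

# The subtler, more bug-like case: a small deviation on a single output.
subtle_deviation = Fault(signal="valid", effect="inverted_for_one_cycle")
```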
The beauty of Functional Qualification is that it uses the design itself, the readily available, supposedly "good" RTL code, to perturb the functional operation of the design and stress the verification environment. There is no need to write a "verification environment for the verification environment" (would we do that in Ada? Esperanto? I'm not sure). In any case, the result is both a measurement of the overall health of the environment and identification of specific holes and weaknesses (missing checkers, missing test scenarios, problems in the test infrastructure) that must be fixed to ensure that real RTL bugs don't inadvertently slip through the process undetected.
No doubt the skeptical amongst you are scoffing at the idea of such a process. After all, aren't there hundreds or thousands of tests in the typical environment and an extremely large number of ways in which to break a given RTL description? Affirmative on both points, but where there is a will there is a way. Research and real experience in production environments show that some ways of breaking the design (of injecting "faults", in the application parlance) are more interesting than others and more likely to expose the biggest weaknesses in the verification environment with a minimal amount of simulation. Why not start with these faults first, identify and fix the big problems, and then move on to deeper and more subtle issues? Isn't it likely that some of these more difficult problems will be corrected as a side effect of fixing the big problems anyway?
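As a sketch of that "biggest weaknesses first" ordering, assume each fault can be given some heuristic score of how likely it is to expose a large hole; the score function, simulate_fault callable, and budget below are placeholders of mine, not how any specific tool scores faults.

```python
# Simulate the most promising faults first and stop once the budget is spent;
# score() and simulate_fault() are placeholder callables for this sketch.

def faults_in_priority_order(faults, score):
    return sorted(faults, key=score, reverse=True)

def qualify_within_budget(faults, score, simulate_fault, budget):
    results = []
    for fault in faults_in_priority_order(faults, score)[:budget]:
        results.append(simulate_fault(fault))   # fix the big problems first,
    return results                               # then move on to subtler ones
```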
One might also ask: do we really need to run all of the tests on a given version of a broken design? If a test doesn't activate the area of RTL code that is broken, then there is no use in running it, so a little automated up-front investigation by the tool might correlate injected faults with the tests that activate them and only run the latter when the design is broken via a given fault. Additionally, if the environment incorporates what most would consider a good methodology, in which the monitors and checkers are decoupled from the tests themselves, and even a handful of tests run against a given fault show a significant difference on an output yet trigger no failure, shouldn't that be interpreted as a weakness in the environment, even without the justification of running more tests?
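A small sketch of that filtering follows, where activates() stands in for whatever up-front fault/test correlation data is gathered; it is an assumption of this sketch rather than a specific tool feature.

```python
# Only simulate a fault against the tests that actually exercise the broken
# code; activates() is a stand-in for up-front fault/test correlation data.

def relevant_tests(fault, tests, activates):
    return [test for test in tests if activates(test, fault)]

def fault_is_detected(fault, tests, activates, run_test_with_fault):
    for test in relevant_tests(fault, tests, activates):
        if not run_test_with_fault(test, fault):   # the test fails: fault detected
            return True
    return False   # no relevant test noticed the broken design: a potential hole
```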
In short, though the task at hand seems daunting given the complexity and size of the design and associated verification environment, it's clear that automation, specialized algorithms, and intelligent operation can make the process both efficient and practical.
Just the facts, Ma'am
SpringSoft has managed to roll up all of this goodness in the Certitude Functional Qualification System, the industry's first and only practical application of what are known as "mutation-based techniques" (the process of injecting faults in the RTL code and all that blather; see above) to the assessment and improvement of real-world simulation-based verification environments. Certitude uses a multi-step process to accomplish the following with a minimal amount of haranguing and gnashing-of-teeth (a rough conceptual sketch follows the list):
1. Compile the RTL code, determine all of the interesting ways in which it can be broken, and write out an instrumented version of the RTL that enables individual fault insertion at simulation run-time from a single compiled image.
2. Sort through the faults and identify, report, and set aside those that need not be simulated, such as faults that, due to redundant or dead code, cannot impact the operation of the design at the outputs.
3. Correlate the faults with the tests that activate them and keep track of other interesting information, such as test length, all of which can be used to optimize and speed the qualification process when faults are injected.
4. Prioritize faults and identify the subset that is most likely to expose big weaknesses in the environment.
5. Inject faults, one at a time and in priority order, and simulate the relevant tests to check the robustness of the environment and identify specific holes, such as missing checkers and test scenarios, that must be fixed to ensure solid, high-quality verification results.
6. Provide concise reports on problem areas and push-button links to the Verdi Automated Debug System for quick analysis of results and resolution of associated holes and weaknesses in the environment.
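To tie the six steps together, here is a rough conceptual sketch of the flow they describe; every callable in it (compile_and_instrument, can_affect_outputs, correlate, prioritize, run, report) is a placeholder of mine, not Certitude's implementation or API.

```python
# A conceptual sketch of the six-step flow above, with all tool behavior
# passed in as placeholder callables; not any vendor's actual interface.

def functional_qualification(rtl, tests, report,
                             compile_and_instrument, can_affect_outputs,
                             correlate, prioritize, run):
    # Step 1: compile once, with every fault individually switchable at run-time.
    instrumented, faults = compile_and_instrument(rtl)
    # Step 2: set aside faults that dead or redundant code keeps from the outputs.
    faults = [f for f in faults if can_affect_outputs(f)]
    # Step 3: record which tests activate which faults (plus test length, etc.).
    activation = correlate(instrumented, faults, tests)
    # Step 4: qualify the faults most likely to expose big weaknesses first.
    for fault in prioritize(faults, activation):
        # Step 5: inject one fault and run only the tests that activate it.
        for test in activation[fault]:
            if not run(instrumented, fault, test):
                break              # a failing test means this fault is detected
        else:
            # Step 6: no test failed; report the hole for analysis and debug.
            report(fault)
```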