Verification remains a key issue in system-on-chip development. The time taken to verify a high-density SoC design to a high level of confidence can lead teams to think the unthinkable. One of these counterintuitive options is not to verify a chip exhaustively before taping out, but to use the resulting silicon itself as a cornerstone of the verification process.
A panel session at the recent 51st Design Automation Conference was more or less evenly split on the approach. Early tapeout has its attractions but carries risks and potential costs that go way beyond the price of a mask set for a device that is highly unlikely to make it to production.
Early tapeout has one clear advantage: the fastest platform for running tests is the silicon itself. Even the best emulator or FPGA prototype can operate at only a fraction of the speed of the final target, and that assumes the SoC can be mapped onto it at all. By moving to silicon quickly, the verification and software development teams gain access to a platform that will allow them to run many more test vectors and, potentially, finished code that will tease out bugs that may lie hidden within an enormous state space. It also allows real use cases to be executed.
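To give a rough sense of the speed gap, the sketch below compares how long the same workload would take on each platform. The clock rates and the cycle count are illustrative assumptions chosen for round numbers, not figures quoted by the panel, and the names `PLATFORM_HZ` and `wall_clock_seconds` are hypothetical.

```python
# Back-of-the-envelope comparison of test run time on different platforms.
# All clock rates below are illustrative assumptions, not measured values.
PLATFORM_HZ = {
    "emulator": 1_000_000,        # assumed ~1 MHz effective clock
    "fpga_prototype": 50_000_000, # assumed ~50 MHz effective clock
    "silicon": 1_000_000_000,     # assumed ~1 GHz target clock
}


def wall_clock_seconds(cycles: int, hz: int) -> float:
    """Return the time in seconds needed to execute `cycles` at `hz`."""
    return cycles / hz


if __name__ == "__main__":
    workload_cycles = 10**12  # assumed size of a long-running test scenario
    for name, hz in PLATFORM_HZ.items():
        seconds = wall_clock_seconds(workload_cycles, hz)
        print(f"{name:>15}: {seconds:>12,.0f} s ({seconds / 86_400:.2f} days)")
```

Under these assumed rates, a workload that occupies an emulator for well over a week completes on silicon in under twenty minutes, which is why rare multicore interactions become practical to hit only once silicon is available.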
At this and other verification-focused sessions at DAC, designers and verification specialists pointed out the problems of verifying complex multicore SoCs. Very often the communications infrastructure between the individual cores can contain serious bugs that, if they do not render the chip useless, may still result in severe performance penalties. These bugs are extremely difficult to tease out, and even observing them requires large amounts of execution time. By running close to final silicon speed, a verification team can hit potential bugs much more quickly. However, just hitting the bug is not sufficient. The silicon has to be debugged, and the lack of visibility of internal signals can make debug very hard.
Also, the advantage of speed has to be balanced against the clear risks of an early tapeout strategy. The greatest risk is that the limited amount of verification performed before tapeout does not identify a bug that leads to the silicon being dead on arrival or so badly compromised that large portions of the device have to be left off limits. A bug that disables one small device could be tolerated. But if it interferes with the cache-coherency protocol that links the major CPUs, the team will have waited close to three months for a device that will only yield partial insights while they wait for the next re-spin. Those additional months, for any high-volume product, will be far more costly than the mask set itself.
Some early tapeout designs deliberately include the ability to disable some mechanisms (such as caching) just in case there is a killer bug. Although this reduces the value of the early tapeout, it often enables the chip to still be used as a verification or software development platform while the verification engineers return to more traditional techniques to debug the killer bug.
Panelists concluded there is a place for early tapeout, but it is far from being a substitute for rigorous verification. In fact, teams with a highly structured approach to verification, one that prioritizes the testing of vital functionality and moves to the post-silicon phase only to find less obvious issues, are the most likely to achieve success.
— Dr. Mike Bartley, founder and CEO of TVS, has a PhD in mathematics from Bristol University, an MSc in software engineering and an MBA from the Open University, and more than 20 years of experience in software testing and hardware verification. He has built and managed state-of-the-art test and verification teams in a number of companies (including STMicroelectronics, Infineon, and Elixent/Panasonic) which still use the methodologies he established. Since founding TVS he has consulted on multiple verification projects for respected organizations and has had more than 20 articles published on the subjects of verification and outsourcing. Contact Dr. Bartley at email@example.com.