Verifying that a multi-million gate ASIC will function according to its specification before it is built into a system of hundreds or thousands of other ASICs, plus thousands of additional components, requires creative functional verification methods.
A supercomputer built on a cache-coherent, non-uniform memory access (ccNUMA) memory architecture is an example of a huge system design containing thousands of ASICs. In such a system, thousands of processor cores concurrently access a single shared global address space that is backed by physically distributed shared main memory. ccNUMA architectures are notoriously difficult to design and verify because of their size and complexity.
This article is an introduction to one method for ASIC verification developed at Silicon Graphics Inc. (SGI). It is impossible to verify that a single ASIC will operate in a system containing thousands of ASICs by simulating thousands of multi-million gate RTL ASIC designs together. HDL simulators cannot hold thousands of RTL ASIC designs in simulation memory, and RTL simulation becomes impossibly slow even when as few as eight multi-million gate RTL ASICs are simulated together.
Instead, engineers can verify that an ASIC will function in a huge environment by simulating both the RTL ASIC design, representing a single node in the total system, and a huge ccNUMA behavioral model, representing the remaining thousands of nodes in a full ccNUMA architecture.
After reading this article, you should come away with a basic understanding of some of the infrastructure of a huge simulation model. You should also have a basic understanding of the model’s interface to the RTL design.
SGI's ccNUMA memory architecture
What does SGI’s ccNUMA memory architecture system simulation model look like?
As shown in Figure 1, SGI’s ccNUMA behavioral system model running in standalone mode (that is, with no RTL design connection) contains:
- All processor cores that make requests to main memory, such as ReadCode, ReadData, and WriteData.
- All cache lines of storage managed by the processor cores and IO devices.
- All home agents which protect pieces of main memory.
- All distributed pieces of main memory.
- All hub logic, which allows the numerous distributed pieces of shared main memory to be seen by all processors as a giant single memory address space.
Figure 1 Internal view of SGI’s ccNUMA behavioral system model
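One way to picture how hub logic can present many distributed pieces of memory as a single address space is a fixed split of each global address into a node number and a local offset. The sketch below illustrates only that idea. The GlobalAddressMap name, the bit layout, and the per-node memory size are assumptions made for this example and are not taken from SGI's model.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical address layout (an assumption, not SGI's actual scheme):
// the upper bits of a global address select the node whose home agent
// owns the line, and the lower bits are the offset into that node's
// local piece of main memory.
class GlobalAddressMap {
public:
    // offsetBits is assumed to be log2(bytes of main memory per node).
    explicit GlobalAddressMap(unsigned offsetBits) : offsetBits_(offsetBits) {
        assert(offsetBits > 0 && offsetBits < 64);
    }

    // Node whose home agent protects this address.
    std::uint64_t homeNode(std::uint64_t globalAddr) const {
        return globalAddr >> offsetBits_;
    }

    // Offset of this address inside the owning node's local memory.
    std::uint64_t localOffset(std::uint64_t globalAddr) const {
        return globalAddr & offsetMask();
    }

    // Rebuild a global address from a node number and a local offset.
    std::uint64_t toGlobal(std::uint64_t node, std::uint64_t offset) const {
        return (node << offsetBits_) | (offset & offsetMask());
    }

private:
    std::uint64_t offsetMask() const {
        return (std::uint64_t{1} << offsetBits_) - 1;
    }

    unsigned offsetBits_;
};
```

With 30 offset bits (1 GiB per node in this hypothetical layout), address 0x40000010 maps to node 1 at local offset 0x10. Every processor core can use the same mapping, so each core sees one flat address space regardless of which node physically holds the data.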
The processor cores, cache lines in cache, home agents, main memory, and hubs shown in Figure 1 are stored in SGI’s ccNUMA C++ behavioral system model using vectors of pointers to the objects. The vectors are stored as data members in a ccNUMASimCore class, as shown in Figure 2.
Figure 2 C++ code showing top-level class for SGI’s ccNUMA behavioral system
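As a rough illustration of the structure described above, the following is a minimal sketch of a top-level class that holds its components in vectors of pointers. It is not SGI's code. Only the ccNUMASimCore class name and the idea of vectors of pointers come from the text. The component types are empty placeholders, and the constructor parameters, member names, per-node counts, and the use of std::unique_ptr are all assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Placeholder component types; the real classes are not shown in the text.
struct ProcessorCore {};
struct CacheLine {};
struct HomeAgent {};
struct MainMemory {};
struct Hub {};

class ccNUMASimCore {
public:
    // Hypothetical constructor: each node gets one hub, one home agent,
    // one piece of main memory, and coresPerNode processor cores; each
    // core gets linesPerCore cache lines.
    ccNUMASimCore(std::size_t nodes, std::size_t coresPerNode,
                  std::size_t linesPerCore) {
        assert(nodes > 0);
        for (std::size_t n = 0; n < nodes; ++n) {
            hubs.push_back(std::make_unique<Hub>());
            homeAgents.push_back(std::make_unique<HomeAgent>());
            mainMemories.push_back(std::make_unique<MainMemory>());
            for (std::size_t c = 0; c < coresPerNode; ++c) {
                processorCores.push_back(std::make_unique<ProcessorCore>());
                for (std::size_t l = 0; l < linesPerCore; ++l) {
                    cacheLines.push_back(std::make_unique<CacheLine>());
                }
            }
        }
    }

    // Vectors of pointers to the model's objects. Owning smart pointers
    // are an assumption; raw pointers would follow the same shape.
    std::vector<std::unique_ptr<ProcessorCore>> processorCores;
    std::vector<std::unique_ptr<CacheLine>> cacheLines;
    std::vector<std::unique_ptr<HomeAgent>> homeAgents;
    std::vector<std::unique_ptr<MainMemory>> mainMemories;
    std::vector<std::unique_ptr<Hub>> hubs;
};
```

Constructing ccNUMASimCore(4, 2, 8) in this sketch yields 4 hubs, 4 home agents, 4 pieces of main memory, 8 processor cores, and 64 cache lines. Storing pointers rather than objects lets the vectors grow without moving the components themselves, which keeps any references between components stable.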