Design Article
ACE’ing the verification of a cache coherent system using UVM
Peer Mohammed, Romondy Luo, Ray Varghese, Parag Goel, Amit Sharma & Satyapriya Acharya
6/25/2012 10:54 AM EDT
Verification challenges
With coherency support now in the hardware, together with an associated support protocol, the complexity of the system and the underlying components has increased substantially. The verification of such systems thus faces several challenges.
Stimulus requirements
An ACE system can have a variety of masters and slaves connected by a coherent interconnect. Individually, each master and slave component can support the full ACE, ACE-Lite, AXI4 or AXI3 protocol and can work with a different bus width or clock frequency. The different permutations involve the following parameters: cache states, transaction types, burst lengths and sizes, snoop mechanisms, snooped cache states, snoop responses, support for speculative fetches, support for snoop filtering, and user-specified scheduling of the interconnect.
All these cross combinations lead to a very large verification space, creating four key challenges:
- Generating stimulus that covers all of these parameters, including ensuring that each individual master, slave or interconnect is fully compliant with the protocol it supports;
- Ensuring all possible combinations of concurrent access among initiating masters, snooped masters and slave main memory are verified and are in compliance with the ACE specification;
- Ensuring all user-specific features are covered and working as expected; and
- Ensuring the completeness of verification, which requires a complete coverage model.
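To give a feel for how quickly the cross combinations grow, the following is a minimal sketch that counts and enumerates the cross points of a small parameter set. The value sets and the names `PARAMS`, `cross_size` and `cross_points` are illustrative assumptions, heavily reduced from what a real ACE coverage model would hold, and a real model would also exclude crosses that the protocol makes illegal.

```python
from itertools import product

# Hypothetical, heavily reduced value sets (assumption for illustration only).
CACHE_STATES = ["Invalid", "UniqueClean", "UniqueDirty", "SharedClean", "SharedDirty"]

PARAMS = {
    "initiator_cache_state": CACHE_STATES,
    "transaction_type": ["ReadOnce", "ReadShared", "ReadUnique", "WriteBack"],
    "burst_length": [1, 4, 16],
    "snooped_cache_state": CACHE_STATES,
    "snoop_mechanism": ["broadcast", "sequential"],
}


def cross_size(params):
    """Number of cross points: the product of the sizes of all value sets."""
    size = 1
    for values in params.values():
        size *= len(values)
    return size


def cross_points(params):
    """Lazily yield every cross point as a dict of parameter name -> value."""
    names = list(params)
    for combo in product(*params.values()):
        yield dict(zip(names, combo))


if __name__ == "__main__":
    print("cross points:", cross_size(PARAMS))
    print("first point :", next(cross_points(PARAMS)))
```

Even this toy set of five parameters produces several hundred cross points; adding bus widths, snoop responses and snoop-filter settings multiplies the count further, which is why a complete coverage model is needed to track what has actually been exercised.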
An ACE system can have a complex configuration of masters, slaves and interconnect. Some of them can be RTL components with or without a local cache, while others can be verification IP blocks (VIPs) or behavioral models. With such configurations, the system-level checks must handle considerable complexity. For instance, most of the details of an AXI3/AXI4 transaction are presented on the bus. In an ACE system, however, many details of an ACE transaction are not presented on the bus. Some attributes are not propagated as the transaction is routed through the interconnect. Others map to the store/load/update/evict of a local cache line by the ACE master or by the interconnect, which initiates a snoop. In such cases, multiple masters can send out coherent transactions while at the same time receiving snoop transactions that may access the same location. Also, the ACE master and slave components support outstanding, interleaved and out-of-order transactions.
These variables make it very difficult to predict what the expected results should be. The more complicated checks consist of:
- The snoop mechanism: The system-level checks should be aware of which master should or should not be snooped. Information such as the cache line state in each master or the source master of a snoop transaction will not be present on the bus, although there may be instances of a user-defined snoop scheduling. For example, the snoop can be done in a broadcast manner, which means all the snooped masters are snooped at almost the same time with the same type of snoop transactions, or in a sequential manner, which means the snooped masters are snooped at different times with different types of snoop transactions.
- The snoop response: Each ‘snooped’ master should respond properly based on the cache line state in its local cache. However, the cache line state of a ‘snooped’ master can change dynamically between the “snoop start phase” and the “snoop end phase.” Such changes can be caused by an implicit local store or invalidation by that master, an explicit writeback/writeclean/evict transaction sent by it, or any unfinished outstanding snoop transactions on its channel sent by other masters.
- The slave memory access: The system-level checker must check that the appropriate operation happens on the slave bus for each access. A slave access can be generated by actions such as a non-snoop transaction, a speculative fetch or a cache miss fetch. It can also result from a partial line that is merged into a full line and then propagated, or from a direct coherent write.
- Data integrity check: Different transactions can have different data sources - actual (data in the cache or in the main memory) or expected (local data queues). Even for a single transaction, the data sources can vary. The system-level checker must monitor the whole ACE system to decide where the expected/actual data should come from. For example, a ReadOnce transaction from an ACE master can take its expected data from any of these: the local cache copy before the transaction is sent (speculative read), the local cache copy after a coherent response (allocate after read), a snooped master’s cache (cache hit), the slave memory (cache miss) or the ACE bus (does not allocate after read). Data checking becomes all the more challenging because an interconnect can modify a transaction issued from a coherent master, so it is not easy to map a slave memory access to a coherent transaction initiated by a master.
- Cache coherency check: The system-level checker must monitor the whole ACE system to decide when and what type of operation might cause a cache line change. Whenever there is a cache line change, it needs to check whether coherency has been lost. The checks include “cache vs. cache” comparison and “cache vs. slave memory” comparison. The two important cache line checks are: a cache line can be held in a dirty state in only one master’s cache; and a cache line can be held in a unique state in only one master’s cache.
- User-specified features: Such features might impact the prediction of the expected behavior. For example, interconnect scheduling/priority can affect the behavior of snoop and slave accesses. Support for unaligned/cross-line accesses can affect how many snoop/slave accesses happen. Support for different bus widths can cause the burst type and burst length to differ between masters and slaves.
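The two cache line checks named in the cache coherency item above can be sketched as a small "cache vs. cache" comparison. This is only an illustration under stated assumptions: the function name `check_coherency` and the dictionary layout are hypothetical, the state names follow the five ACE cache line states, and a real system-level checker would run this whenever a cache line changes and would also compare against slave memory.

```python
from collections import defaultdict

# ACE cache line states grouped by the property each check cares about.
DIRTY_STATES = {"UniqueDirty", "SharedDirty"}
UNIQUE_STATES = {"UniqueClean", "UniqueDirty"}


def check_coherency(caches):
    """Return a list of violation messages for a snapshot of all caches.

    caches: dict of master name -> dict of line address -> state name
            (hypothetical layout chosen for this sketch).
    """
    # Group the valid copies of each line by address.
    holders = defaultdict(list)
    for master, lines in caches.items():
        for addr, state in lines.items():
            if state != "Invalid":
                holders[addr].append((master, state))

    violations = []
    for addr in sorted(holders):
        dirty = [m for m, s in holders[addr] if s in DIRTY_STATES]
        unique = [m for m, s in holders[addr] if s in UNIQUE_STATES]
        if len(dirty) > 1:
            violations.append(f"line 0x{addr:x} is dirty in {dirty}")
        if len(unique) > 1:
            violations.append(f"line 0x{addr:x} is unique in {unique}")
    return violations


if __name__ == "__main__":
    snapshot = {
        "m0": {0x40: "UniqueDirty"},
        "m1": {0x40: "SharedDirty", 0x80: "SharedClean"},
        "m2": {0x80: "SharedClean"},
    }
    for message in check_coherency(snapshot):
        print(message)
```

The sketch works on a static snapshot; the hard part described above is obtaining that snapshot, since cache line states are not visible on the bus and change while snoops are in flight.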
Horizontal and vertical reuse of block-level environments has its own set of unique challenges. In the context of vertical reuse, some of the ACE components can be replaced with actual RTL models. Thus, the testbench or the verification components should provide the infrastructure to be able to factor in the behavior of the RTL components at various protocol phases – for example, during the initiation state of an ACE DUT master, the end state of a snoop transaction, or the cache state transition after a local store/invalidation.
Horizontal reuse, or reuse across projects, can be more complicated. Different projects can have different numbers of master and slave components, each complying with a different subset of the complete protocol. A large number of ACE masters with different levels of protocol compliance complicates the expected result of a coherent transaction. An increasing number of caches leads to more concurrent overlapping accesses to the same location, and to more complex snoop scheduling and responses. This makes it more difficult to predict a snoop hit or miss. To support maximum reuse, configurability of the different verification components is vital.
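One way to picture this configurability is a per-port configuration record from which the checker derives its expectations instead of hard-coding them. The sketch below is an assumption-laden illustration, not any VIP's actual API: `PortConfig` and `snoop_candidates` are hypothetical names, and the rule used is only that ports implementing the full ACE protocol have snoop channels, so ACE-Lite, AXI4 and AXI3 ports are never snooped. A snoop filter or user-defined scheduling could narrow the candidate set further.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PortConfig:
    """Hypothetical per-port settings that a project would override."""
    name: str
    protocol: str      # one of "ACE", "ACE_LITE", "AXI4", "AXI3"
    data_width: int    # bus width in bits


def snoop_candidates(masters, initiator):
    """Names of masters that may be snooped for a transaction from `initiator`.

    Only full-ACE ports are candidates, and the initiating master is
    never snooped for its own transaction.
    """
    return [
        m.name
        for m in masters
        if m.protocol == "ACE" and m.name != initiator
    ]


if __name__ == "__main__":
    project_masters = [
        PortConfig("cpu0", "ACE", 128),
        PortConfig("cpu1", "ACE", 128),
        PortConfig("dma", "ACE_LITE", 64),
        PortConfig("periph", "AXI4", 32),
    ]
    print(snoop_candidates(project_masters, "cpu0"))
    print(snoop_candidates(project_masters, "dma"))
```

Because the master list is data rather than code, a different project can reuse the same checker by supplying its own list of ports.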