[Part 1 introduces the fundamentals of cache, using the C64x DSP as an example. It explains why caches are needed, how caches communicate with main memory, and how to optimize cache performance.]
If multiple devices, such as the CPU and peripherals, access the same cacheable memory region, cache and memory can become incoherent. This is illustrated in Figure 7. Suppose the CPU accesses a memory location, which gets allocated in cache (1). Later, a peripheral writes data to this same location (2). When the CPU requests this data, the memory access hits in cache, and the CPU will read the old data instead of the new data (3). The same problem occurs if the CPU writes to a memory location that is cached, and a peripheral later writes out the data in this location. Since the CPU only updates the cache, the peripheral will write out "old" data. In both of these examples, the cache and the memory are said to be "incoherent."
Figure 7. When the CPU and a peripheral both access cacheable memory, main memory and cache can become incoherent.
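The stale read in Figure 7 can be reproduced with a toy model. The following is a minimal sketch in C, assuming a cache that holds a single word and has no coherence protocol at all; every name in it (`cpu_read`, `peripheral_write`, `replay_figure7`) is hypothetical and chosen for illustration, not taken from any TI library.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model: one cached word and no coherence protocol (hypothetical names). */
static uint32_t main_memory[4];

static struct {
    bool     valid;
    unsigned addr;
    uint32_t data;
} cache_line;

/* CPU read: a hit returns the cached copy, a miss allocates the line. */
static uint32_t cpu_read(unsigned addr)
{
    if (cache_line.valid && cache_line.addr == addr)
        return cache_line.data;          /* hit: memory is not consulted */
    cache_line.valid = true;             /* miss: allocate from memory */
    cache_line.addr  = addr;
    cache_line.data  = main_memory[addr];
    return cache_line.data;
}

/* Peripheral write: goes straight to memory, the cache is never told. */
static void peripheral_write(unsigned addr, uint32_t value)
{
    main_memory[addr] = value;
}

/* Replays steps (1) to (3) of Figure 7 and returns what the CPU sees in (3). */
static uint32_t replay_figure7(void)
{
    main_memory[2]   = 0x1111;
    cache_line.valid = false;
    (void)cpu_read(2);                   /* (1) line is allocated in cache */
    peripheral_write(2, 0x2222);         /* (2) memory changes behind the cache */
    return cpu_read(2);                  /* (3) hit: the old value is returned */
}
```

Calling `replay_figure7()` returns the old value `0x1111` even though `main_memory[2]` already holds `0x2222`, which is exactly the incoherent state described above.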
Typically, a cache controller implements a cache coherence protocol that keeps cache and memory coherent. To illustrate the protocol, consider the following scenario. First, a peripheral writes data to an input buffer located in L2 SRAM via a DMA transfer. Next, the CPU reads the data, processes it, and writes the result to an output buffer. From there, the data is written to another peripheral. Again, the data is transferred by the DMA controller.
With a DMA write, the peripheral fills the input buffer using the process illustrated in Figure 8:
- The peripheral requests a write access to line 1 in L2 SRAM. (These lines have been mapped from L1D to L2 SRAM.)
- L2 SRAM holds a copy of the L1D tag RAM. The L2 cache controller checks this local copy to see if the line is cached in L1D. (This is done by checking the valid bit and the 18-bit tag.) If the line is not cached in L1D, no further action needs to be taken, and the data is written to memory.
- If the line is cached in L1D, the L2 controller sends a SNOOP-INVALIDATE command to L1D. This sets the valid bit of the corresponding line to zero, indicating that the line is invalid. (If the line is not invalidated, the CPU will continue reading the "old" value that was cached in L1D, rather than the "new" data from the peripheral.) Finally, the new data from the peripheral is written to L2 SRAM.
- The next time the CPU accesses this memory location, the access will miss in L1D.
- The line containing the data written by the peripheral is allocated in L1D and read by the CPU.
Figure 8. DMA Write.
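The DMA write steps above can be sketched as a small simulation. This is a simplified model under stated assumptions, not the actual hardware behavior: the cache is direct-mapped with four one-word lines, L2 SRAM is a 16-word array, and callers keep addresses below `L2_WORDS`. All identifiers (`l2_tag_copy`, `dma_write`, `snoop_invalidates`, and so on) are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_LINES 4u    /* toy size, not the real L1D geometry */
#define L2_WORDS  16u   /* toy L2 SRAM, one word per "line" */

typedef struct { bool valid; uint32_t tag; uint32_t data; } l1d_line_t;
typedef struct { bool valid; uint32_t tag; } tag_entry_t;

static l1d_line_t  l1d[NUM_LINES];
static tag_entry_t l2_tag_copy[NUM_LINES];  /* L2's local copy of the L1D tags */
static uint32_t    l2_sram[L2_WORDS];
static unsigned    l1d_misses;              /* counters used for observation only */
static unsigned    snoop_invalidates;

static unsigned line_index(uint32_t addr) { return addr % NUM_LINES; }
static uint32_t line_tag(uint32_t addr)   { return addr / NUM_LINES; }

/* CPU read: on a miss, allocate the line from L2 SRAM and record its tag
 * in the L2 controller's copy of the tag RAM. */
static uint32_t cpu_read(uint32_t addr)
{
    unsigned idx = line_index(addr);
    if (!(l1d[idx].valid && l1d[idx].tag == line_tag(addr))) {
        l1d_misses++;
        l1d[idx].valid = true;
        l1d[idx].tag   = line_tag(addr);
        l1d[idx].data  = l2_sram[addr];
        l2_tag_copy[idx].valid = true;
        l2_tag_copy[idx].tag   = line_tag(addr);
    }
    return l1d[idx].data;
}

/* DMA write: check the tag copy; if the line is cached in L1D, invalidate it
 * (the SNOOP-INVALIDATE step), then write the new data to L2 SRAM. */
static void dma_write(uint32_t addr, uint32_t value)
{
    unsigned idx = line_index(addr);
    if (l2_tag_copy[idx].valid && l2_tag_copy[idx].tag == line_tag(addr)) {
        l1d[idx].valid         = false;
        l2_tag_copy[idx].valid = false;
        snoop_invalidates++;
    }
    l2_sram[addr] = value;
}
```

In this model, reading address 5, issuing `dma_write(5, 20)`, and reading address 5 again produces a second L1D miss, and the CPU receives 20 rather than the previously cached value. A DMA write to an address that is not cached leaves `snoop_invalidates` unchanged.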
With a DMA read, the output buffer is read out to a peripheral in the process illustrated in Figure 9:
- The CPU writes the result to the output buffer. If the output buffer is allocated in L1D, only the cached copy of the buffer is updated, not the data in L2 SRAM.
- Next, the peripheral issues a DMA read request to the memory location in L2 SRAM.
- As with the DMA write, the L2 controller checks its local copy of the L1D tags to determine whether the memory location requested is cached in L1D.
- If the line is cached, the L2 controller sends a SNOOP command to L1D. The SNOOP first checks if the corresponding line is dirty. If not, the peripheral is allowed to complete the read access from L2 memory.
- If the dirty bit is set, the SNOOP causes the dirty line to be written back to L2 SRAM.
- Finally, the read access completes by reading the "new" data written by the CPU.
Figure 9. DMA Read.
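The DMA read steps above can be sketched in the same style. This is a minimal model under several assumptions that the text does not state: a direct-mapped toy cache with four one-word lines, CPU writes that miss in L1D going straight to L2 SRAM without allocating a line, and a snooped line that stays valid but becomes clean after the write-back. All identifiers (`cpu_write`, `dma_read`, `snoop_writebacks`, and so on) are hypothetical, and callers keep addresses below `L2_WORDS`.

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_LINES 4u    /* toy size, not the real L1D geometry */
#define L2_WORDS  16u   /* toy L2 SRAM, one word per "line" */

typedef struct { bool valid; bool dirty; uint32_t tag; uint32_t data; } l1d_line_t;
typedef struct { bool valid; uint32_t tag; } tag_entry_t;

static l1d_line_t  l1d[NUM_LINES];
static tag_entry_t l2_tag_copy[NUM_LINES];  /* L2's local copy of the L1D tags */
static uint32_t    l2_sram[L2_WORDS];
static unsigned    snoop_writebacks;        /* counter used for observation only */

static unsigned line_index(uint32_t addr) { return addr % NUM_LINES; }
static uint32_t line_tag(uint32_t addr)   { return addr / NUM_LINES; }

/* CPU read: on a miss, write back a dirty victim, then allocate from L2 SRAM. */
static uint32_t cpu_read(uint32_t addr)
{
    unsigned idx = line_index(addr);
    l1d_line_t *line = &l1d[idx];
    if (!(line->valid && line->tag == line_tag(addr))) {
        if (line->valid && line->dirty)
            l2_sram[line->tag * NUM_LINES + idx] = line->data;
        line->valid = true;
        line->dirty = false;
        line->tag   = line_tag(addr);
        line->data  = l2_sram[addr];
        l2_tag_copy[idx].valid = true;
        l2_tag_copy[idx].tag   = line_tag(addr);
    }
    return line->data;
}

/* CPU write: a hit updates only the cached copy and marks it dirty.
 * Assumption of this sketch: a miss writes to L2 SRAM without allocating. */
static void cpu_write(uint32_t addr, uint32_t value)
{
    l1d_line_t *line = &l1d[line_index(addr)];
    if (line->valid && line->tag == line_tag(addr)) {
        line->data  = value;
        line->dirty = true;
    } else {
        l2_sram[addr] = value;
    }
}

/* DMA read: check the tag copy; if the line is cached and dirty, the snoop
 * writes it back to L2 SRAM before the read completes from L2 SRAM. */
static uint32_t dma_read(uint32_t addr)
{
    unsigned idx = line_index(addr);
    if (l2_tag_copy[idx].valid && l2_tag_copy[idx].tag == line_tag(addr)
        && l1d[idx].dirty) {
        l2_sram[addr]  = l1d[idx].data;
        l1d[idx].dirty = false;   /* assumed: line stays valid, now clean */
        snoop_writebacks++;
    }
    return l2_sram[addr];
}
```

In this model, once the CPU has read address 5 and then written 99 to it, L2 SRAM still holds the old value until `dma_read(5)` triggers the write-back and returns 99. A second `dma_read(5)` finds the line clean and completes without another write-back.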