Hello there. This is my second column here on EE Times. This time, I am going to describe something more technical than I did in my introductory post. I hope it will prove useful to you. In this column, I will explain the path from a normal FPGA block RAM to a 2W+4R (two write ports and four read ports) memory, which I need for my upcoming CPU design.
Most FPGAs include dual-port block RAMs, which are very convenient for a wide variety of designs but have some inherent limitations. One limitation relates to the write sequence: writing to one port and reading from the other at the same time sometimes does not yield the result you'd expect. These block RAMs also usually have two modes of operation when writing data. A port either returns the newly written value on its own data output (write-first), or it returns the old data that was present before the write (read-first). This post is about using the block RAM in write-first mode, with a single clock for both ports. I'm going to assume that you are familiar with block RAM basics, so I'll skip that part.
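To make the two write modes concrete, here is a minimal behavioral sketch in Python. It is not HDL and not taken from any vendor's simulation model; the class name `SinglePortRam` and its `clock` method are hypothetical names chosen for illustration, and the model assumes a single port with a registered output.

```python
class SinglePortRam:
    """Behavioral model of one synchronous RAM port (illustrative only)."""

    def __init__(self, depth, mode="WRITE_FIRST"):
        if mode not in ("WRITE_FIRST", "READ_FIRST"):
            raise ValueError(f"unsupported mode: {mode}")
        self.mem = [0] * depth
        self.mode = mode
        self.dout = 0  # registered output, updated on every clock edge

    def clock(self, addr, we=False, din=0):
        """Simulate one rising clock edge and return the new output value."""
        old = self.mem[addr]
        if we:
            self.mem[addr] = din
        if we and self.mode == "WRITE_FIRST":
            self.dout = din  # output shows the value just written
        else:
            self.dout = old  # READ_FIRST write, or a plain read
        return self.dout


wf = SinglePortRam(16, "WRITE_FIRST")
rf = SinglePortRam(16, "READ_FIRST")
for ram in (wf, rf):
    ram.clock(3, we=True, din=0xAA)  # preload address 3 with 0xAA

# Overwrite address 3 and look at what each mode drives on its output.
wf_out = wf.clock(3, we=True, din=0x55)
rf_out = rf.clock(3, we=True, din=0x55)
print(hex(wf_out), hex(rf_out))  # 0x55 0xaa
```

Both memories end up holding the new value; the modes differ only in what appears on the data output during the write cycle.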
Modern RISC-like CPUs can be superscalar, meaning they can execute more than one instruction in the same clock cycle in some scenarios. To support this, the register bank must provide enough read ports that many registers can be read at once. It also needs enough write ports to allow writeback to multiple registers at the same time. The block RAMs present on the FPGA support only two ports, so a technique must be devised to allow for more than one reader and more than one writer at the same time.
I should note that there are some alternative techniques to the one presented here. I will mention some alternatives during the course of these discussions, but exploring these in depth is left as an exercise for the reader.
Let's start by considering a block RAM with one read-only port and one write-only port. Most block RAMs have a simple but annoying limitation -- there is a data conflict if you write a value to a specific address on one port and, at the same time, read that same address from the other port. For example, Xilinx's Spartan-6 FPGA block RAM user guide states:
When one port performs a write operation, the write operation succeeds… If the write port is in either WRITE_FIRST or in NO_CHANGE mode, then the DATA_OUT on the [other port] read port would become invalid (unreliable). The mode setting of the read-port does not affect this operation.
It probably won't surprise you that we don't want unreliable data on the read port. This would render our design useless. However, the user guide also points us toward a small trick (very well known to CPU designers) that can be used to overcome this limitation:
Conflicting simultaneous writes to the same location never cause any physical damage but can result in data uncertainty… When one port performs a write operation, the other port must not write into the same location, unless both ports write identical data.
The trick is very simple. Since we are reading from one port and writing to the other, we just convert the read on the read port into a write whenever there is a concurrent write to the same address on the other port. The read port (now converted into a write) will then output the new data, because that port is itself in WRITE_FIRST mode. There will be no write conflict, since both ports target the same address and write the same data.
Here's a simple schematic of the design. The additional connections required are shown in red.
Figure: Conflict-free block RAM.
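The bypass trick can also be sketched as a small behavioral model in Python. This is only an illustration under stated assumptions: `DualPortRam` is a hypothetical model of a true dual-port RAM with both ports in WRITE_FIRST mode, it represents "unreliable" data as `None`, and `ConflictFree1R1W` is a made-up name for the wrapper that adds the extra connections. Real silicon behavior is defined by the vendor documentation, not by this sketch.

```python
INVALID = None  # stands for "unreliable data" in this model


class DualPortRam:
    """Hypothetical model of a dual-port RAM, both ports in WRITE_FIRST mode."""

    def __init__(self, depth):
        self.mem = [0] * depth

    def clock(self, a, b):
        """a and b are (addr, we, din) tuples; returns (dout_a, dout_b)."""
        (addr_a, we_a, din_a), (addr_b, we_b, din_b) = a, b
        same = addr_a == addr_b
        # A writing port echoes its own data (WRITE_FIRST). A reading port
        # sees unreliable data if the other port writes the same address.
        out_a = din_a if we_a else (INVALID if (we_b and same) else self.mem[addr_a])
        out_b = din_b if we_b else (INVALID if (we_a and same) else self.mem[addr_b])
        if we_a:
            self.mem[addr_a] = din_a
        if we_b:
            # Two writes to one address are safe only with identical data.
            clash = we_a and same and din_a != din_b
            self.mem[addr_b] = INVALID if clash else din_b
        return out_a, out_b


class ConflictFree1R1W:
    """1R+1W wrapper: a colliding read is turned into an identical write."""

    def __init__(self, depth):
        self.ram = DualPortRam(depth)

    def clock(self, raddr, waddr, we, wdata):
        collide = we and raddr == waddr
        read_port = (raddr, collide, wdata)  # becomes a write on collision
        write_port = (waddr, we, wdata)
        rdata, _ = self.ram.clock(read_port, write_port)
        return rdata


raw = DualPortRam(8)
raw_out, _ = raw.clock((5, False, 0), (5, True, 42))  # plain read vs. write
safe = ConflictFree1R1W(8)
safe_out = safe.clock(raddr=5, waddr=5, we=True, wdata=42)
print(raw_out, safe_out)  # None 42
```

The raw primitive returns unreliable data on the colliding read, while the wrapped version returns the value being written, and the stored contents are the same in both cases.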
Now we have a 1R+1W (one read port, one write port) design that does not have any conflict issue when we read and write to the same address. This is still short of our objective, which is to have four read ports and two write ports, so let's improve on this design.
A 2R+1W block RAM
Let's start improving our design by adding another read port. We have the limitations associated with the block RAM hardware, so how are we supposed to add another read port if we cannot change the silicon?
The answer is replication. As its name suggests, this involves replicating the memory contents of the block RAM, effectively using two times the resources. Below we see a simple schematic representation of this.
Figure: 2R+1W block RAM.
How does this help? The answer is simple -- we tie the write ports of the two 1R+1W block RAMs together, so that every write goes to both memories, but we use each of the read ports separately. This way, both memories always hold the same contents, and each read port can fetch any address independently of the other.
A 4R+1W block RAM
Once again, replication comes to our aid. Our new design will use four times the original resources to provide us with four read ports, as illustrated in the schematic below.
Figure: 4R+1W block RAM.
Basically, we just replicate the RAM again like we did in our previous example. Using this technique, we can add as many read ports as we like, at the expense of more block RAM (one copy per read port) and increased fanout on the write path.
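The replication used for the 2R+1W and 4R+1W memories can be sketched in Python as well. This is a behavioral illustration only, with hypothetical names: `Bank1R1W` models a conflict-free 1R+1W bank directly by its behavior (a colliding read returns the new data), and `MultiReadRam` fans a single write port out to one bank per read port.

```python
class Bank1R1W:
    """Conflict-free 1R+1W bank, modeled by its behavior (illustrative)."""

    def __init__(self, depth):
        self.mem = [0] * depth

    def clock(self, raddr, waddr, we, wdata):
        # A read that collides with a write returns the data being written.
        rdata = wdata if (we and raddr == waddr) else self.mem[raddr]
        if we:
            self.mem[waddr] = wdata
        return rdata


class MultiReadRam:
    """N read ports, one write port, built from N replicated banks."""

    def __init__(self, depth, n_read):
        self.banks = [Bank1R1W(depth) for _ in range(n_read)]

    def clock(self, raddrs, waddr, we, wdata):
        if len(raddrs) != len(self.banks):
            raise ValueError("need exactly one read address per read port")
        # The same write reaches every bank; each bank serves one read port.
        return [
            bank.clock(raddr, waddr, we, wdata)
            for bank, raddr in zip(self.banks, raddrs)
        ]


regs = MultiReadRam(depth=32, n_read=4)  # a 4R+1W memory with 32 entries
regs.clock([0, 0, 0, 0], waddr=7, we=True, wdata=123)
regs.clock([0, 0, 0, 0], waddr=9, we=True, wdata=456)
out = regs.clock([7, 9, 7, 1], waddr=1, we=True, wdata=789)
print(out)  # [123, 456, 123, 789]
```

Passing `n_read=2` gives the 2R+1W arrangement. The number of banks grows linearly with the number of read ports, which is the RAM cost mentioned above, and the single write port drives every bank, which is where the extra fanout comes from.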
Of course, we still have only one write port. Adding another write port will be a little more complex. I'll address this in my next column. Also, I will be considering some of the other options available to us to implement these types of multi-port RAMs. In the meantime, do you have any questions or comments?