The inner workings of an algorithmic memory are a microcosm of many already familiar mechanisms, such as virtualization, caching, and encoding. Erasure-coding algorithms are used in a wide range of applications to prevent data loss. For instance, Reed-Solomon codes are block-based error-correcting codes commonly used in redundant array of inexpensive disks (RAID) mass storage systems. In algorithmic memory, however, erasure coding is not used to encode redundant data for error recovery. Rather, it is used to encode and decode data that cannot be written to or read from its real address because doing so would cause a bank conflict.
Let’s take a closer look at how erasure coding works in algorithmic memory. Rather than keeping two fully independent representations of every piece of data, it is possible to build memory blocks that store a conventional representation of each data item along with an encoded version of it. The conventional representation of a data value can be stored in a consistent location in a main memory bank. The encoded version combines multiple data items, so only a small amount of additional memory is required to store it.
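As a rough illustration, the C sketch below models this layout with hypothetical names (algo_mem_t, NUM_BANKS, build_xor_bank are invented for this example, not taken from any real product): four main banks hold the conventional copies, and one extra bank holds an XOR combination of the corresponding rows of all main banks, which is the encoding described later in this article. The extra storage is one bank for every NUM_BANKS main banks.

```c
#include <stdint.h>

#define NUM_BANKS     4      /* main banks holding the conventional copies */
#define ROWS_PER_BANK 1024   /* rows (addresses) per bank */

typedef struct {
    /* Conventional representation: each data item lives at a fixed
       (bank, row) location. */
    uint32_t bank[NUM_BANKS][ROWS_PER_BANK];
    /* Encoded representation: one extra bank whose row r holds the XOR
       of row r across all main banks. */
    uint32_t xor_bank[ROWS_PER_BANK];
} algo_mem_t;

/* Rebuild the encoded bank from the conventional copies. */
void build_xor_bank(algo_mem_t *m)
{
    for (int row = 0; row < ROWS_PER_BANK; row++) {
        uint32_t acc = 0;
        for (int b = 0; b < NUM_BANKS; b++)
            acc ^= m->bank[b][row];
        m->xor_bank[row] = acc;
    }
}
```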
When an algorithmic memory receives two simultaneous read operations requesting data whose conventional representations are in the same bank, it can retrieve the conventional representation of the first data item from the main memory bank and recover the second data item by decoding its encoded version. To operate properly, the memory system must always be able to fetch the encoded version of the second data item without using the main memory bank that is busy retrieving the first data item.
What about memory writes, which must update both stored versions? As with reads, two concurrent write operations directed to the same memory bank cannot both access that bank at once. Left to itself, the memory system would stall, executing the first write in one cycle and the second write in the next, introducing latency. To avoid this, an algorithmic memory writes the first new data value into the main memory bank and, in parallel, writes the second data value temporarily to a cache or an alternate location. Meanwhile, the system also updates the encoded versions of both writes.
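The following sketch, which builds on the layout above, shows one way the write path could behave. It is a purely functional model, not a cycle-accurate one, and the names (alt_entry_t, update_xor, write_pair_same_bank) are invented for illustration: the second write is parked in a hypothetical one-entry alternate store, and the XOR bank is brought up to date for both writes by removing each old value and folding in the new one.

```c
/* Hypothetical one-entry alternate store standing in for the cache or
   alternate location mentioned above. A real design would track which
   address currently lives here and merge it back into its bank later. */
typedef struct {
    int      valid;
    int      bank, row;
    uint32_t value;
} alt_entry_t;

static alt_entry_t alt_store;

/* Fold a changed value into the encoded bank: XOR out the old value,
   XOR in the new one. */
static void update_xor(algo_mem_t *m, int row, uint32_t old_val, uint32_t new_val)
{
    m->xor_bank[row] ^= old_val ^ new_val;
}

/* Two writes aimed at the same main bank in the same cycle. */
void write_pair_same_bank(algo_mem_t *m, int bank,
                          int row_a, uint32_t val_a,
                          int row_b, uint32_t val_b)
{
    /* First write goes straight to the main bank. */
    uint32_t old_a = m->bank[bank][row_a];
    m->bank[bank][row_a] = val_a;
    update_xor(m, row_a, old_a, val_a);

    /* Second write is parked in the alternate store for now, but its
       encoded copy is still brought up to date. */
    uint32_t old_b = m->bank[bank][row_b];
    alt_store = (alt_entry_t){ .valid = 1, .bank = bank,
                               .row = row_b, .value = val_b };
    update_xor(m, row_b, old_b, val_b);
}
```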
To drill down a bit further, the encoded version of each data item is stored in an extra memory bank added to the multi-bank memory system. To retrieve a specific data item from the encoded version, the memory system reads that encoded version and applies a decode function that extracts the single requested item.
For example, a set of data items may be combined with a logical exclusive-OR (XOR) function and stored in the extra memory bank. This extra memory bank is sometimes referred to as the XOR bank because it stores a logical XOR combination of all the other data items. To reverse the encoding and obtain a desired data item from a particular memory bank, we XOR the contents of the XOR bank with every data item in the set except the desired one; the desired item is excluded automatically, since it is the one that cannot be accessed in the current cycle because of the read conflict. This operation cancels the contributions of the other data items encoded in the XOR value, leaving only the required data item.
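The read side of the same sketch, again purely behavioral, illustrates this decode step: the first item comes straight from its main bank, and the conflicting second item is reconstructed by XOR-ing the XOR bank's row with the same row of every other main bank, which cancels their contributions. For simplicity it ignores the alternate store used in the write sketch above.

```c
/* Two reads aimed at the same main bank in the same cycle. */
void read_pair_same_bank(const algo_mem_t *m, int bank,
                         int row_a, uint32_t *out_a,
                         int row_b, uint32_t *out_b)
{
    /* First data item: conventional representation from the main bank. */
    *out_a = m->bank[bank][row_a];

    /* Second data item: decode it without touching the busy bank.
       XOR-ing in every other bank's copy of row_b cancels their
       contributions to the encoded value, leaving only the item that
       is stored in the conflicting bank. */
    uint32_t decoded = m->xor_bank[row_b];
    for (int b = 0; b < NUM_BANKS; b++) {
        if (b != bank)
            decoded ^= m->bank[b][row_b];
    }
    *out_b = decoded;
}
```

In this model each bank, including the XOR bank, sees at most one access per cycle, which is what permits the simple single-port cells discussed next.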
An advantage of this method is that it allows the system to be constructed with simple single-port memory cells, because only a single memory operation is performed on any given memory bank in a given clock cycle.