Current solutions to the MOPS problem
SoC architects and designers are well aware of the MOPS bottleneck of embedded memory. Unfortunately, today's embedded memories, built using circuit techniques alone, require a large amount of die area to deliver more MOPS, which can make them extremely impractical. Achieving a 4X MOPS increase with circuit techniques alone, for example, typically takes 400% to 800% more physical memory area than a corresponding memory providing 1X MOPS. As a result, architects and designers must use a variety of other techniques to achieve the necessary performance.
A common approach is to break up memory into multiple banks. Each bank can be accessed independently, so if two accesses in the same clock cycle go to different banks, both can be serviced in parallel, effectively doubling the MOPS supported by the memory as a whole. What happens when multiple accesses go to the same bank, however? This is called a bank conflict, and when it occurs, the memory stalls. Subsequent memory accesses must be queued in FIFOs, which increases memory latency and, because accesses are no longer guaranteed to be read from or written to memory in a fixed time, complicates the memory's coherency management. The combination leads to processor stalls that propagate as backpressure to earlier stages of the system pipeline. As a result, system performance can no longer be guaranteed.
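The effect of bank conflicts on throughput can be illustrated with a minimal sketch. This is not any vendor's design: it assumes a two-bank memory with simple interleaved address mapping (low-order address bits select the bank), where a conflicting pair of same-cycle accesses serializes and costs an extra cycle.

```python
NUM_BANKS = 2

def bank_of(addr):
    # Interleaved mapping (an assumption for this sketch):
    # low-order address bits select the bank.
    return addr % NUM_BANKS

def cycles_needed(access_pairs):
    """Count cycles to service pairs of same-cycle accesses.

    A pair hitting different banks completes in one cycle; a pair
    hitting the same bank (a bank conflict) serializes, because a
    bank services only one access per cycle, so the second access
    stalls for an extra cycle.
    """
    cycles = 0
    conflicts = 0
    for a, b in access_pairs:
        if bank_of(a) == bank_of(b):
            conflicts += 1
            cycles += 2   # conflict: accesses to one bank serialize
        else:
            cycles += 1   # different banks: serviced in parallel
    return cycles, conflicts

# Four pairs of accesses: the first two pairs hit different banks,
# the last two pairs each collide in one bank.
pairs = [(0, 1), (2, 3), (4, 6), (5, 7)]
print(cycles_needed(pairs))  # (6, 2): conflicts stretch 4 cycles' work to 6
```

With no conflicts the memory sustains 2 MOPS per cycle; here, two conflicts in four cycles drop the effective rate to 8 accesses in 6 cycles, and in a real system the stalled accesses would also sit in FIFOs, adding the latency and coherency issues described above.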
Multi‐banked solutions are relatively inexpensive to implement in terms of memory area and power. However, the technique increases design complexity by adding the logic required to manage non‐deterministic memory outputs, and the accompanying increase in design verification complexity significantly lengthens SoC development time. In the end, system performance still suffers whenever bank conflicts occur. An ideal memory solution should guarantee the required MOPS with 100% certainty, avoiding non‐deterministic output results.
Rethinking memory performance
It is time to take a fresh perspective on how to increase memory performance. Today, a single-port embedded memory can perform one memory operation per clock cycle. Embedded memory performance has traditionally been closely tied to memory clock speed, and is therefore ultimately limited by it. The question to consider is whether it is possible to increase memory performance without increasing memory clock speeds.
Historically, advances in embedded memories have relied on packing more transistors onto a chip and cranking up the clock speed. This has been successful up to a point, but as transistors approach atomic dimensions, manufacturers are running into fundamental physical barriers. For this reason, the industry needs to rethink its approach to embedded memory design. As an analogy, increases in processor performance have come not only from advances in circuitry, but also from architectural improvements such as pipelined execution and exploitation of instruction-level parallelism. What if embedded memories could be designed to exploit similar architectural and parallel mechanisms to increase memory performance? A new approach called algorithmic memory technology does exactly that.