SANTA CLARA, Calif. — According to some estimates, 90 percent of the world's data has been produced in the past two years. Users talk airily about the cloud, but the reality is that all of those bits and bytes have to reside somewhere, typically in datacenters where they must be managed, monitored, and accessed.
That makes for some startling market numbers. Gartner estimates the data communications infrastructure market at $10 billion, with bandwidth demand expected to compound at 40 percent annually. Of that, the SoC memory market (spanning routers, Ethernet devices, storage-area networking infrastructure, mobile infrastructure, and so on) is on the order of $250 million or more.
As the era of terabit computing looms ever closer, the challenges only compound. We talked with Sundar Iyer, CEO and co-founder of Memoir Systems, and Shadab Nazar, Memoir's director of product management, about power, performance, the search for smarter memory, and the battle for speed.
Kristin Lewotsky: What is involved in the jump to terabit computing?
Sundar Iyer: There are three things that make terabit computing really hard, and memory plays a critical role in all of them. The first is network bandwidth, which is going up by 30 percent every year. But, when we look at SoCs, we're seeing a 700-800 percent increase in aggregated demand within a generation -- chips going from handling aggregated bandwidths of 480 Gbit/s to 3.22 Tbit/s. That is way more than the incremental jumps we've been making in memory performance. Meanwhile, memory today has really begun to dominate SoC area. We're talking about 200-800 Mb of memory on-chip, which occupies 50-70 percent of the total area. Last but not least, we have the age-old power problem. We're now looking at chips that are burning over 80 W on average.
Embedded designers working on terabit Datacom SoCs face the challenge of delivering sufficient memory performance while minimizing consumption of chip real estate and power.
Kristin Lewotsky: Why are current memory solutions inadequate?
Sundar Iyer: The brute force approach, just using flops, is a nonstarter from a power-budget perspective. A flop can burn almost 10 times as much power as the memory itself. Going to a memory design house for custom memory is another possibility, but it takes years to build, test, and qualify it, and the area and power for that will be 400-500 percent larger than you can afford.
Kristin Lewotsky: What about nondeterministic memory?
Sundar Iyer: If you wanted six memory accesses per clock cycle, perhaps you could just keep six different memory macros and hope that the next six accesses go perfectly to six different memory banks. The problem is that the moment two simultaneous memory accesses go to the same macro, you've got blocking. The whole spray-and-pray approach does not give guaranteed performance. Worse, it cannot give you low latency. When you look at new applications in datacenters and clouds, such as Google searches and Facebook logins, each of these has extremely stringent data latency requirements: sub-50 or even sub-20 µs.
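The blocking problem Iyer describes is easy to quantify. As a rough illustration (not Memoir's design), the Python sketch below sprays six accesses per cycle uniformly across six single-port macros and counts how often at least two land in the same one; the bank count, access count, and uniform-hash assumption are all hypothetical parameters chosen for the example.

```python
import random

def simulate_bank_conflicts(n_banks=6, n_accesses=6, n_cycles=100_000, seed=0):
    """Estimate how often 'spray-and-pray' banking blocks: each cycle,
    n_accesses addresses hash uniformly across n_banks single-port
    macros; any bank receiving two or more accesses stalls the cycle."""
    rng = random.Random(seed)
    blocked = 0
    for _ in range(n_cycles):
        banks = [rng.randrange(n_banks) for _ in range(n_accesses)]
        if len(set(banks)) < n_accesses:  # some bank was hit twice
            blocked += 1
    return blocked / n_cycles

# The chance that all six accesses land in distinct banks is only
# 6!/6**6, about 1.5 percent, so roughly 98.5 percent of cycles conflict.
print(f"conflict rate: {simulate_bank_conflicts():.3f}")
```

Under these toy assumptions, nearly every cycle suffers a conflict, which is why the approach cannot guarantee either throughput or latency.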
Kristin Lewotsky: You specialize in algorithmic memory, which adds a layer of logic on top of conventional memory.
Sundar Iyer: Looking at area, power, and performance requirements, even algorithmic memory solutions need to be boosted. It's very clear we need a different technique.
Kristin Lewotsky: Which is why you developed pattern-aware memory. Can you talk more about how it works?
Shadab Nazar: Many applications exercise specific access patterns on embedded memories. They do not need random access behavior from the memory subsystem, or they may only need random access behavior on a subset of their memory ports. You have a FIFO application or function for which port accesses are always sequential. You have packet buffers that need to read data in a random fashion from any location. However, when it comes to writes, they don't need any specific address: all they need is to write the packet to some location and, when the time comes, read the packet back from that location.
You have policing, netflow, and state management applications. These applications do read-modify-write operations, for which the application reads the value stored in the memory, adds a number of bits, and writes it back. Basically, the read is random access, while the write is not. Finally, you have counter memory, which is a special case of a read-modify-write operation, where the write-back is an addition to memory. We came to the conclusion that, if we design memories specifically for these functions or applications, we can both scale memory performance and optimize the product.
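Two of the patterns Nazar describes can be sketched in a few lines of toy Python. The classes below are illustrative models of the memory contracts, not Memoir's implementation: a counter memory whose only write is a read-modify-write increment, and a packet buffer whose reads are random access but whose writes let the memory pick the location.

```python
class CounterMemory:
    """Toy counter memory: the only write operation is a read-modify-write
    add, so no random-access write port is ever required."""
    def __init__(self, n):
        self.cells = [0] * n

    def add(self, addr, delta):
        self.cells[addr] += delta  # read, add, write back in one operation

    def read(self, addr):
        return self.cells[addr]


class PacketBuffer:
    """Toy packet buffer: writes may go to ANY free location (the memory
    chooses), while reads must hit the exact location handed back at
    write time."""
    def __init__(self, n):
        self.slots = [None] * n
        self.free = list(range(n))

    def write(self, packet):
        loc = self.free.pop()   # the memory, not the caller, picks an address
        self.slots[loc] = packet
        return loc              # handle the caller uses later to read

    def read(self, loc):
        pkt, self.slots[loc] = self.slots[loc], None
        self.free.append(loc)
        return pkt
```

The point of the relaxed contracts is that a pattern-aware design can exploit them, e.g., a write that can land anywhere never suffers a bank conflict.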
Datacom SoCs typically access memory in one of four ways: statistics counters (upper left), multiport FIFOs (upper right), packet buffers (lower left), or policing/netflow/state update (lower right).
Kristin Lewotsky: Can you give us examples of some specific features and benefits that pattern-aware-memory technology can provide?
Shadab Nazar: In 2010, a Stanford analysis showed that 2 percent of the overall energy produced in the United States goes to running datacenters. We made two key observations, which pattern-aware memory is able to take advantage of to reduce energy consumption.
First, networks are usually lightly utilized and not congested. If packets can be intelligently allocated in memories based on the incoming pattern, then fewer than 30 percent of the memory macros need to be consumed when networks are not congested, and the remaining 70 percent can be put into a very low-power hibernation state, reducing energy. The other observation we made is that, even when all the memory macros store packets during peak utilization, only a portion of them -- say, 20 percent -- are activated every cycle to write and read from memory. That means some 80 percent of the memory macros, based on the incoming pattern, can be put into a low-power sleep state, from which they can be woken on demand within cycles.
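The first observation amounts to a packing policy: fill the lowest-numbered macros first so that, under light load, the rest never wake up. The sketch below is a hypothetical allocator illustrating the idea (macro counts and slot sizes are invented for the example, not taken from any real device).

```python
class HibernatingBuffer:
    """Toy allocator that packs packets into the lowest-numbered macros
    first, so under light load the remaining macros can stay in a
    low-power hibernation state."""
    def __init__(self, n_macros=10, slots_per_macro=4):
        self.n_macros = n_macros
        self.slots_per_macro = slots_per_macro
        self.used = [0] * n_macros  # occupied slots per macro

    def allocate(self):
        for m in range(self.n_macros):  # fill macro 0 before macro 1, etc.
            if self.used[m] < self.slots_per_macro:
                self.used[m] += 1
                return m
        raise MemoryError("buffer full")

    def awake_macros(self):
        # Macros holding no packets at all can hibernate.
        return sum(1 for u in self.used if u > 0)


buf = HibernatingBuffer(n_macros=10, slots_per_macro=4)
for _ in range(10):  # 25 percent utilization: 10 of 40 slots occupied
    buf.allocate()
print(buf.awake_macros(), "of", buf.n_macros, "macros awake")
# → 3 of 10 macros awake; the other 7 can hibernate
```

With a spray-style allocator, those same 10 packets would likely touch most of the 10 macros, leaving almost nothing eligible for hibernation; packing concentrates occupancy so idle macros stay dark.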
Kristin Lewotsky: How does this translate into numbers?
Shadab Nazar: We estimate that using these techniques on datacom SoCs would save 5 percent of the energy consumed by the networking gear in a modern datacenter. For an example 50,000-server datacenter, the savings would amount to $2.5 million over the course of five years.