Design Article
Integrating large-capacity memory in advanced-node SoCs
Prasad Saggurti, Synopsys
1/14/2013 1:57 PM EST
In today’s system-on-chip (SoC) designs, memory content can consume over 50% of chip area. In addition, the size of individual memories has grown to approximately 40 Mb of contiguous memory. This combined increase can significantly impact the overall power, performance, and area of the chip, as well as manufacturing yield. Successful integration of large-capacity memory in advanced-node SoCs requires the right approach.
Recent advances have made silicon interposer technology an alternative to consider (see figure 1). A silicon interposer consists of additional layers, or even a separate chip, designed to connect a processor with a separate block of memory. Using this technology in conjunction with through-silicon vias (TSVs), a processor can pass data to and from the external memory without paying the high price of additional pins. The interposer structure consists of multiple layers, however, which adds steps, time, and cost to the fabrication process. Despite the performance boost, the approach is still too expensive to be considered a mainstream packaging technology.

Figure 1: A silicon interposer is a structure that provides direct access to external memory, but is costly.
An earlier approach to addressing the increase in memory content was the use of embedded DRAM (eDRAM). When the eDRAM macro occupied one quarter to one third of the area of the equivalent embedded SRAM, eDRAM made economic sense despite the extra processing costs. This solution did not scale well with advancing process technologies, however: SRAM shrank in area much faster than eDRAM and reached per-bit cost parity at 40 nm, even at 20-Mbit sizes. Furthermore, the longer development timelines of eDRAM compared to the logic process made it even less attractive.
A good alternative is to use large-capacity SRAMs, which can dramatically reduce leakage and deliver higher system performance while keeping mask costs in check (see figure 2). Designers must understand the challenges of building large memory blocks with embedded SRAM and employ best-in-class strategies to ensure that chip-level power, performance, and area targets are met.
Figure 2: Properly designed and integrated, large-capacity SRAMs can provide a low-power, economical, high-performance alternative to eDRAM.
Considerations when designing large memory blocks
The default approach to implementing a large-capacity SRAM is to build it out of smaller blocks of memory. Coupling smaller blocks together, however, may negatively impact both area and performance if not handled correctly. Many nets need to run between and through the building blocks. To keep the area under control without using too many layers of metal, and to limit the negative impact on implementation speed, it is important to keep the inter-block routing simple. Inefficient wiring also tends to slow the memory down, resulting in long access times and slow cycle times.
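As a rough illustration of composing a large memory from smaller blocks, the following Python sketch splits each flat address into a block index and an offset within that block. The class name, the block count, the block size, and the choice to select the block from the high-order part of the address are all assumptions made for this example; the article does not specify a particular organization.

```python
class BankedMemory:
    """Toy model of a large memory assembled from equal-sized smaller blocks.

    All names and sizes here are hypothetical and for illustration only.
    """

    def __init__(self, num_blocks, words_per_block):
        self.num_blocks = num_blocks
        self.words_per_block = words_per_block
        # Each building block is modeled as an independent list of words.
        self.blocks = [[0] * words_per_block for _ in range(num_blocks)]

    def decode(self, address):
        """Split a flat address into (block index, offset within block)."""
        if not 0 <= address < self.num_blocks * self.words_per_block:
            raise IndexError("address out of range")
        return divmod(address, self.words_per_block)

    def write(self, address, value):
        block, offset = self.decode(address)
        self.blocks[block][offset] = value

    def read(self, address):
        block, offset = self.decode(address)
        return self.blocks[block][offset]


mem = BankedMemory(num_blocks=4, words_per_block=1024)
mem.write(2050, 0xAB)
print(mem.decode(2050))  # (2, 2)
print(mem.read(2050))    # 171
```

In a physical design, every address, data, and control signal implied by this model becomes a net that must reach the selected block, which is where the inter-block routing burden described above comes from.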
Dynamic and leakage power are particularly important concerns in large-capacity SRAM implementations. The total power dissipated by the combined memory must be less than the sum of the power dissipated by individual blocks of memory, or there may be no benefit over the basic implementation.
As we can see, there are implementation challenges related to timing, power, and area. To be worthwhile, the combined memory implementation must provide benefits in at least two of the three criteria and be on par with the baseline implementation in the third.
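The acceptance rule above can be expressed as a small check. This is only a sketch: the `Ppa` fields, the 2% tolerance used to define "on par", and all of the numbers are hypothetical values chosen for illustration, not figures from the article. The baseline power is assumed to be the sum of the power of the individual blocks.

```python
from dataclasses import dataclass


@dataclass
class Ppa:
    """Timing, power, and area for one implementation (lower is better)."""
    access_time_ns: float
    power_mw: float
    area_mm2: float


def worth_doing(combined, baseline, tolerance=0.02):
    """Return True if `combined` beats `baseline` in at least two criteria
    and is no worse than on par in any remaining one.

    "On par" is assumed to mean within `tolerance` (relative) of the baseline.
    """
    better = 0
    for field in ("access_time_ns", "power_mw", "area_mm2"):
        new, old = getattr(combined, field), getattr(baseline, field)
        if new < old * (1 - tolerance):
            better += 1
        elif new > old * (1 + tolerance):
            return False  # clearly worse in one criterion disqualifies it
    return better >= 2


# Hypothetical baseline: four separate blocks at 25 mW each.
baseline = Ppa(access_time_ns=2.0, power_mw=4 * 25.0, area_mm2=4.0)

print(worth_doing(Ppa(1.8, 80.0, 4.0), baseline))   # True: faster, lower power, same area
print(worth_doing(Ppa(1.8, 110.0, 3.0), baseline))  # False: power is worse than baseline
```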


resistion
2/8/2013 7:15 AM EST
Very interesting article. But "large-capacity" implying DRAM scale suggests chips which are almost entirely SRAM by area weight.