Since no one questions the performance benefits of embedding large memory blocks in SoCs, the only major issues to address are cost, time to market and design risk. Memory structures that are highly manufacturable and scalable across process generations resolve all three of the major embedded-memory issues.
SRAM and embedded-trench-based DRAM meet these criteria well. SRAM targets high-speed applications, while DRAM suits high-capacity requirements.
New embedded memory technologies can be problematic because manufacturability is hard to prove in advance. Building a few test chips, or even shipping real-world products, does not verify that the process will reliably deliver high yields on the next chip.
With the high cost of SoC manufacturing facilities today and market requirements for a fast ramp to production, the time from a process's first introduction to full-production yield is crucial. As processes have become more complex, this time has generally been increasing. Using an incremental, modular approach to yield enhancement, however, Toshiba has seen the time to full-production yield decrease progressively with recent technology generations. Specifically, the 0.18-micron process needed almost half the yield ramp time of the 0.35-micron generation and started with almost half its initial defect density (D0). This kind of sustained manufacturing improvement is essential if fabs are to continue down the road toward more complex embedded processes for SoCs.
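A common first-order way to see why a lower initial defect density matters is the Poisson yield model, Y = exp(-A × D0), where A is the die area. The sketch below assumes that model and uses hypothetical values for A and D0 (not Toshiba data); production yield models usually also account for defect clustering, so treat the numbers as illustrative only.

```python
import math


def poisson_yield(die_area_cm2: float, d0_per_cm2: float) -> float:
    """Fraction of good dies under the simple Poisson model Y = exp(-A * D0)."""
    return math.exp(-die_area_cm2 * d0_per_cm2)


# Hypothetical values chosen for illustration only; not Toshiba data.
AREA_CM2 = 1.0        # assumed die area in cm^2
D0_OLD = 1.0          # assumed initial defect density, defects per cm^2
D0_NEW = D0_OLD / 2   # "almost half" the initial D0, rounded to exactly half

y_old = poisson_yield(AREA_CM2, D0_OLD)
y_new = poisson_yield(AREA_CM2, D0_NEW)
print(f"initial yield at D0={D0_OLD}: {y_old:.1%}")
print(f"initial yield at D0={D0_NEW}: {y_new:.1%}")
```

Under this model, halving D0 at a fixed die area raises the yield to the square root of its former value, which is why early reductions in D0 shorten the ramp so noticeably.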
Semiconductor companies also need to develop modular process steps that can be applied to suit the application. In Toshiba's modular embedded DRAM process, the steps needed to create CMOS logic gates are a subset of those needed to fabricate mixed-signal circuitry, which in turn are a subset of those needed to fabricate full embedded DRAM chips. If you do not need the DRAM, you simply omit the extra steps. Yet when they are included, the DRAM steps are finely integrated into the overall process flow and do not disrupt manufacturability.
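The nesting of process flows can be pictured as sets of steps, each flow a superset of the one before it. The sketch below uses hypothetical, heavily simplified step names (a real flow has hundreds of steps, and these names are not Toshiba's), with Python set operators standing in for the subset relationships described above.

```python
# Hypothetical, heavily simplified step names; a real flow has hundreds of steps.
LOGIC_STEPS = {"isolation", "wells", "gate_oxide", "gate_poly", "source_drain", "interconnect"}
MIXED_SIGNAL_STEPS = LOGIC_STEPS | {"precision_resistor", "analog_capacitor"}
EDRAM_STEPS = MIXED_SIGNAL_STEPS | {"deep_trench", "trench_capacitor_fill"}


def flow_for(needs_analog: bool, needs_dram: bool) -> set[str]:
    """Pick the smallest flow that covers the design's needs."""
    if needs_dram:
        return EDRAM_STEPS
    if needs_analog:
        return MIXED_SIGNAL_STEPS
    return LOGIC_STEPS


# Each flow is a strict superset of the one below it.
assert LOGIC_STEPS < MIXED_SIGNAL_STEPS < EDRAM_STEPS

# Steps a DRAM design adds on top of a pure logic design.
print(sorted(flow_for(needs_analog=False, needs_dram=True) - LOGIC_STEPS))
```

A logic-only design never sees the DRAM steps, while a DRAM design runs every logic step unchanged, which is the property the modular approach relies on.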
When creating such a process, it is important to develop it as a whole rather than attempting to retrofit memory structures onto an existing logic process. Some early attempts at embedded DRAM ran into major problems because of this add-on approach. At that time, DRAM and logic processes had been optimized separately, for density and performance respectively, and neither was compatible with the other. Most notably, fabricating the stacked-capacitor structure used in most commodity DRAMs requires high temperatures that over-stress logic structures. For this reason, the more logic-compatible embedded-trench DRAM proved to be the best choice for merged logic-and-DRAM fabrication.
Even though SRAM takes a large amount of silicon real estate compared to DRAM, SRAM is highly manufacturable for several reasons. First, SRAM is well understood, so it provides good yields. Second, SRAM does not need the capacitor required for DRAM and therefore requires fewer fabrication steps.
Third, the ability to add redundant SRAM cells to facilitate repair after processing improves the yield of large SRAM blocks. This approach requires a tradeoff between minimizing the size of the SRAM cores and increasing the area to allow repairs, but the correct tradeoff can improve manufacturability. Toshiba achieves good SRAM yields by using redundancy on blocks as small as 500 kbits.
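Why redundancy pays off can be estimated with a simple model. The sketch below assumes that defects in a block follow a Poisson distribution and that every defect, whether it lands in a main row or a spare, uses up one spare row; the block is then repairable whenever the defect count does not exceed the number of spares. The defect rate and per-spare area overhead are hypothetical values chosen only to illustrate the tradeoff.

```python
import math


def repairable_yield(mean_defects: float, spares: int) -> float:
    """P(defect count <= spares) for a Poisson-distributed defect count.

    Assumes each defect (in a main row or in a spare) consumes exactly one
    spare, so the block can be repaired iff total defects <= spares.
    """
    return sum(
        math.exp(-mean_defects) * mean_defects**i / math.factorial(i)
        for i in range(spares + 1)
    )


# Hypothetical block: 0.5 expected defects in the array, and each spare row
# adds 1% to the array area (and hence to the expected defect count).
BASE_DEFECTS = 0.5
AREA_PER_SPARE = 0.01

for spares in range(5):
    lam = BASE_DEFECTS * (1 + AREA_PER_SPARE * spares)
    print(f"{spares} spares: yield {repairable_yield(lam, spares):.3f}")
```

The first spare buys the largest gain; each additional spare adds area (and therefore expected defects) for a diminishing return, which is the size-versus-repairability tradeoff described above.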
Because DRAM takes up much less silicon area than SRAM, it yields much smaller die sizes where large amounts of memory are required. The DRAM capacitor does require extra process steps, but their cost has become less significant over time. Today, an 11-metal-layer SoC involves more than 20 masks just for the interconnect, in contrast to the handful of masks required for the 3- to 4-metal-layer chips of a few years ago. As a result, the extra steps needed for embedded DRAM now represent a much smaller fraction of the total mask count.
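The mask-count argument is simple arithmetic. The sketch below assumes one mask per metal layer plus one per via layer between adjacent metals, which gives 21 interconnect masks for 11 metal layers (consistent with "more than 20") and 5 to 7 for 3 to 4 layers. The front-end and DRAM mask counts are hypothetical placeholders, used only to show how a fixed DRAM adder shrinks as a share of the total.

```python
def interconnect_masks(metal_layers: int) -> int:
    """One mask per metal layer plus one per via layer between adjacent metals (assumed)."""
    return metal_layers + (metal_layers - 1)


# Hypothetical placeholder counts, not Toshiba figures.
FRONT_END_MASKS = 15   # assumed masks for isolation, wells, gates, contacts, etc.
DRAM_EXTRA_MASKS = 4   # assumed masks added for the trench capacitor


def dram_share(metal_layers: int) -> float:
    """Fraction of all masks that the DRAM adder represents."""
    total = FRONT_END_MASKS + interconnect_masks(metal_layers) + DRAM_EXTRA_MASKS
    return DRAM_EXTRA_MASKS / total


for layers in (4, 11):
    print(f"{layers} metal layers: {interconnect_masks(layers)} interconnect masks, "
          f"DRAM adder is {dram_share(layers):.0%} of the total")
```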
The real cost of the extra steps depends more on whether they affect the manufacture and performance of the logic transistors. Embedded-trench DRAM has minimal impact here because the deep-trench capacitor is fabricated before the logic transistors.
In contrast, stacked-capacitor DRAM structures must be built up in process steps that occur after logic fabrication. Due to the high temperatures involved, it is very difficult to control the performance of the logic transistors. Thus, stacked-capacitor DRAM reduces the manufacturability of the SoC's logic.
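The ordering argument in the two paragraphs above can be expressed as a check on a process flow: a high-temperature step is harmless before the logic transistors exist and harmful after. The toy checker below assumes each step can be tagged simply as high-temperature or not; the `Step` class, the step names, and the flags are hypothetical simplifications, not a real process recipe.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str
    high_temp: bool            # assumed flag: exceeds the logic transistors' thermal budget
    forms_logic: bool = False  # marks the step that completes the logic transistors


def logic_safe(flow: list[Step]) -> bool:
    """True if no high-temperature step runs after the logic transistors are built."""
    transistors_built = False
    for step in flow:
        if step.forms_logic:
            transistors_built = True
        elif transistors_built and step.high_temp:
            return False
    return True


# Hypothetical, simplified flows.
trench_flow = [
    Step("deep_trench_capacitor", high_temp=True),
    Step("logic_transistors", high_temp=True, forms_logic=True),
    Step("interconnect", high_temp=False),
]
stacked_flow = [
    Step("logic_transistors", high_temp=True, forms_logic=True),
    Step("stacked_capacitor", high_temp=True),
    Step("interconnect", high_temp=False),
]

print("trench flow safe for logic:", logic_safe(trench_flow))
print("stacked flow safe for logic:", logic_safe(stacked_flow))
```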
Also highly manufacturable is the special-purpose OTP ROM, which consists of a simple metal structure constructed with the normal interconnect metal. The structure includes fuses for programming each ROM cell and adds no process overhead. Toshiba supplies OTP ROM as a pre-designed block as large as 1016 bits. The block includes shift registers and control logic to route the voltage needed to blow the fuses. A standard fuse-blowing tester performs this programming during wafer probing. The OTP ROM is useful for specialized applications such as inserting a unique chip ID into the SoC.
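The programming scheme can be sketched as a toy model: a pattern is shifted serially into a register, and the programming voltage is then routed only to the fuses the register selects. The class and function names, the bit polarity (a blown fuse reads 1), and the shift order are all assumptions for illustration, not the interface of Toshiba's block.

```python
class OtpRom:
    """Toy fuse-based OTP ROM: an intact fuse reads 0, a blown fuse reads 1 (assumed polarity)."""

    def __init__(self, size_bits: int) -> None:
        self.fuses = [False] * size_bits      # False = intact, True = blown
        self.shift_reg = [False] * size_bits  # selects which fuses to blow

    def shift_in(self, bits: list[bool]) -> None:
        # Serial load: each new bit enters at index 0 and earlier bits move up one place.
        for bit in bits:
            self.shift_reg = [bit] + self.shift_reg[:-1]

    def blow(self) -> None:
        # Route programming voltage only to the selected cells; a blown fuse stays blown.
        for i, selected in enumerate(self.shift_reg):
            if selected:
                self.fuses[i] = True

    def read(self) -> list[int]:
        return [int(f) for f in self.fuses]


def program_id(rom: OtpRom, chip_id: int) -> None:
    """Shift the ID in MSB first so bit i lands at register position i, then blow."""
    n = len(rom.fuses)
    rom.shift_in([bool((chip_id >> i) & 1) for i in reversed(range(n))])
    rom.blow()


def read_id(rom: OtpRom) -> int:
    return sum(bit << i for i, bit in enumerate(rom.read()))


rom = OtpRom(32)
program_id(rom, 0x1234ABCD)
print(hex(read_id(rom)))
```

Because a blown fuse cannot be restored, programming a second ID can only add 1 bits to the first, which is what makes the structure one-time programmable.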
A semiconductor technology that is scalable across process generations becomes more manufacturable over time. For example, Toshiba's 16-Mbit embedded SRAM shrinks by 50.1 percent from 180nm to 130nm and by a further 51.7 percent from 130nm to 90nm, which is approximately linear scaling.
A 16-Mbit embedded DRAM shows better-than-linear shrinks of 61.2 and 71.1 percent over the same two transitions. In other words, the area required for this DRAM is scaling down faster than the process shrink itself. Meanwhile, the extra process steps needed to fabricate the DRAM are becoming a smaller percentage of the overall process.
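The shrink figures can be checked against ideal scaling. The sketch below assumes that "shrink" means the percentage reduction in macro area and that ideal scaling reduces area with the square of the node dimension; real node names only approximate actual feature sizes. Under those assumptions the ideal reductions are about 47.8 percent (180nm to 130nm) and 52.1 percent (130nm to 90nm), leaving 25 percent of the 180nm area.

```python
NODES_NM = [180, 130, 90]

# Reported area reductions ("shrinks") for 16-Mbit macros, taken from the text.
REPORTED = {
    "SRAM": [0.501, 0.517],
    "DRAM": [0.612, 0.711],
}


def ideal_shrink(old_nm: float, new_nm: float) -> float:
    """Area reduction if every linear dimension scales with the node name (assumed)."""
    return 1 - (new_nm / old_nm) ** 2


def remaining_area(shrinks: list[float]) -> float:
    """Fraction of the original area left after successive reductions."""
    area = 1.0
    for s in shrinks:
        area *= 1 - s
    return area


ideal = [ideal_shrink(a, b) for a, b in zip(NODES_NM, NODES_NM[1:])]
print("ideal:", [f"{s:.1%}" for s in ideal], f"-> {remaining_area(ideal):.1%} of 180nm area")
for mem, shrinks in REPORTED.items():
    print(f"{mem}:", [f"{s:.1%}" for s in shrinks], f"-> {remaining_area(shrinks):.1%} of 180nm area")
```

Compounding the reported figures, the SRAM ends at roughly 24 percent of its 180nm area, close to ideal, while the DRAM ends at roughly 11 percent.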
Embedded DRAM is thus becoming cost-effective for a wider variety of applications. The timing for this shift is superb, since many of today's applications can use the performance boost offered by embedded DRAM. Consider, for instance, the bandwidth benefits of making your memory up to 512 bits wide. In the newer technologies (90nm, for example), you have the flexibility of using up to 16 DRAM cores per chip. Additionally, keeping the embedded DRAM signals on chip reduces power consumption compared to discrete DRAM. Small, battery-powered applications also benefit from the reduced footprint of a single-chip design.
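A rough sense of the bandwidth gain follows from peak bandwidth = width × clock × transfers per clock. Only the 512-bit width and the 16-core limit come from the text; the clock rates and the 32-bit double-data-rate external interface are hypothetical, and the 16-core total assumes all cores can be accessed in parallel.

```python
def peak_bandwidth_gb_s(width_bits: int, clock_mhz: float, transfers_per_clock: int = 1) -> float:
    """Peak bandwidth in GB/s (10^9 bytes/s) for a bus of the given width."""
    return width_bits * clock_mhz * 1e6 * transfers_per_clock / 8 / 1e9


# From the text: 512-bit width, up to 16 cores at 90nm. Everything else is assumed.
EMBEDDED_WIDTH = 512
CORES = 16
EMBEDDED_CLOCK_MHZ = 200   # hypothetical macro clock
EXTERNAL_WIDTH = 32        # hypothetical discrete-DRAM interface width
EXTERNAL_CLOCK_MHZ = 200   # hypothetical interface clock
EXTERNAL_RATE = 2          # hypothetical double data rate

per_core = peak_bandwidth_gb_s(EMBEDDED_WIDTH, EMBEDDED_CLOCK_MHZ)
external = peak_bandwidth_gb_s(EXTERNAL_WIDTH, EXTERNAL_CLOCK_MHZ, EXTERNAL_RATE)
print(f"one 512-bit core: {per_core:.1f} GB/s; {CORES} cores: {per_core * CORES:.1f} GB/s")
print(f"32-bit DDR interface: {external:.1f} GB/s")
```

With these assumed numbers, one 512-bit core delivers 12.8 GB/s against 1.6 GB/s for the external interface; the ratio, not the absolute values, is the point.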
Memory technologies other than SRAM and deep-trench DRAM may prove useful some day, but the learning curve will be long. New memory technologies will have a difficult time catching up with the ever-improving SRAM and deep-trench DRAM. Embedded magnetic RAM (MRAM) is showing some promise but is still a generation or so away from a full production-worthy process.
Flash memory is an important system-level building block, but it is becoming less practical as an embedded memory technology because the process currently requires many extra mask steps when integrated with logic. Fortunately, it is practical to use a multi-die packaging technique for adding flash. For example, you can stack a commodity 4-Mbit flash chip on top of an SoC. The stacked-chip approach is amenable to high-volume requirements and costs less than embedding the flash in the SoC logic process.
Today's complex fabrication processes deliver astonishing results, so long as they are asked to deliver the results for which they have been finely tuned. After developing processes for embedded SRAM and deep-trench DRAM over many generations, semiconductor companies can now embed these memories at relatively low cost to fuel a new generation of SoC applications.