Steven Tomashot, Senior Technical Staff Member, IBM Microelectronics Division, Essex Junction, Vermont; Subramanian S. Iyer, Manager, System Scale Integration, IBM Microelectronics Division, Hopewell Junction, NY
Incorporating embedded dynamic random-access memory (DRAM) in a system-on-chip (SoC) environment presents some unique challenges. IBM has developed an embedded DRAM solution for application-specific integrated circuit (ASIC) designs in the 180-nm and 130-nm technology nodes. This solution allows SoC designers to take advantage of the improved density, lower standby power, and reduced soft-error rates of embedded DRAM, as compared to embedded static random-access memory (SRAM), without compromising the performance of the surrounding logic circuitry. Although the die size, logical function, and embedded DRAM capacity vary significantly among these designs, a common set of embedded DRAM attributes supports a wide range of applications.
Building a high-performance embedded DRAM macro in a logic technology can provide several advantages. The macro can benefit from the dense, high-speed logic transistors in the peripheral circuits, which contribute significantly (70 percent) to the overall performance of the macro. The main technology issue is choosing an appropriate high-capacitance storage element to integrate into the logic process flow.
Traditionally, DRAMs use capacitors with values ranging from 25 femtofarads (fF) to 35 fF per cell. This number has remained constant from one technology generation to the next, and is driven by retention-time and signal-margin requirements. A robust DRAM implemented in a logic technology needs a similar-size capacitor to account for signal margins and manufacturing tolerance.
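The link between cell capacitance and signal margin can be illustrated with the standard charge-sharing relation for a DRAM read. The sketch below is only an illustration: the bitline capacitance (100 fF), stored voltage (1.5 V), and precharge voltage (0.75 V) are assumed values chosen for the example, not figures from IBM's design.

```python
def bitline_signal_mv(c_cell_ff, c_bitline_ff, v_stored, v_precharge):
    """Signal (in mV) developed on the bitline when a cell shares charge with it.

    Standard charge-sharing relation:
        dV = (V_stored - V_precharge) * C_cell / (C_cell + C_bitline)
    """
    transfer_ratio = c_cell_ff / (c_cell_ff + c_bitline_ff)
    return 1000.0 * (v_stored - v_precharge) * transfer_ratio


# Assumed example values (hypothetical, for illustration only).
C_BITLINE_FF = 100.0
V_STORED = 1.5
V_PRECHARGE = 0.75

for c_cell in (25.0, 35.0):
    signal = bitline_signal_mv(c_cell, C_BITLINE_FF, V_STORED, V_PRECHARGE)
    print(f"{c_cell:.0f} fF cell -> {signal:.1f} mV of bitline signal")
```

Under these assumptions, a smaller cell capacitor directly reduces the signal available to the sense amplifier, which is why the capacitor value is held in a similar range when the DRAM moves into a logic process.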
A deep-trench capacitor can easily be integrated into a logic process. The trench-fabrication process for embedded DRAMs is identical to the trench-fabrication process used in commodity DRAM parts. The approach is simple because any high-temperature processes are completed before the logic devices are fabricated. The wafer is rendered planar after the trench-fabrication process and is, for all practical purposes, identical to a starting logic wafer. Consequently, the wafer can be processed using a standard logic process, and the devices are equivalent to devices that are produced in the logic-only process.
Any special features used in the commodity DRAM-manufacturing process, such as the borderless bitline contact, are sacrificed in favor of standard-logic processes. The embedded DRAM back-end-of-line (BEOL) process is identical to the logic BEOL process, and the bitlines use the standard logic M1 level.
The embedded DRAM process adds three extra lithographic masking levels to the logic process. Two of these levels are standard block levels; the third is a critical, deep-trench definition level. The combined process-complexity and process-time adder for these three extra levels is about 20 percent compared to a standard six-level metal, logic-only process.
A customized memory array, optimized for area, performance, input/output (I/O) width, and bit utilization, provides benefits for any embedded-memory subsystem. The initial investment in these custom-memory solutions can be justified for very high-volume standard products. However, designing unique memory arrays is uneconomical in an SoC environment because the design, characterization, and test resources are insufficient for supporting a potentially unlimited number of custom-memory solutions every year. A more economical and resource-efficient approach is to offer a design library that includes multiple memory-macro densities, all having a common memory interface. SoC designers build the memory subsystem by assembling blocks of memory, concatenated in series and parallel, to meet address-depth, I/O-width, and storage-capacity requirements.
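As a rough sketch of how a subsystem might be assembled from library macros, the function below counts how many identical macros are needed in parallel (for I/O width) and in series (for address depth). The function name and the example macro geometry (4096 words by 256 bits) are assumptions made for illustration, not part of any IBM tool.

```python
import math


def macros_needed(depth_words, width_bits, macro_depth_words, macro_width_bits):
    """Return (series, parallel, total) macro counts for a memory subsystem.

    parallel: macros placed side by side to reach the required I/O width
    series:   macros stacked in the address space to reach the required depth
    """
    parallel = math.ceil(width_bits / macro_width_bits)
    series = math.ceil(depth_words / macro_depth_words)
    return series, parallel, series * parallel


# Hypothetical requirement: 16384 words deep, 512 bits wide,
# built from macros that are 4096 words deep and 256 bits wide.
series, parallel, total = macros_needed(16384, 512, 4096, 256)
print(series, parallel, total)  # 4 2 8
```

Rounding up in both dimensions means some capacity may go unused, which is the price of using a fixed library instead of a custom array.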
IBM's approach in designing flexible-capacity memory macros for the 180-nm and 130-nm technology nodes began by limiting the data I/O width of the memory macros to two widths: 256 bits, and 292 bits for applications that require parity or error-correcting code (ECC). With the data width fixed, the array-addressing requirements were defined for the smallest array building block: one megabit. A row and column addressing scheme was selected (512 row addresses and 8 column addresses) to provide users with two operating modes: a random-cycle access mode, and a faster page-cycle access mode.
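The addressing numbers above can be checked with a few lines of arithmetic. The sketch assumes that the 292-bit variant uses the same 512-row by 8-column addressing as the 256-bit variant, which the text implies but does not state outright.

```python
ROWS = 512     # row addresses per 1-Mbit building block
COLUMNS = 8    # column addresses per row


def block_bits(io_width_bits):
    """Total bits stored in one building block for a given data I/O width."""
    return ROWS * COLUMNS * io_width_bits


data_only = block_bits(256)
with_check_bits = block_bits(292)  # assumes the same addressing as the 256-bit block

print(data_only)                    # 1048576, exactly 2**20 bits (1 Mbit)
print(with_check_bits)              # 1196032
print(with_check_bits - data_only)  # 147456 extra bits for parity or ECC
```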
Higher-capacity memory macros, up to 16 Mbits, can be created by replicating the base 1-Mbit building block, and then connecting all of the replicated memory blocks to a common data I/O bus. To provide additional flexibility and performance, each 1-Mbit DRAM array within a macro is addressed as a bank. This architecture enables support for a bank-interleave mode, which allows multiple memory accesses to occur simultaneously within each macro and increases the data bandwidth of the macro by a factor of three over random-cycle operation.
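One way to picture the bank, row, and column organization is as a split of a flat word address into three fields. The field widths follow from the numbers in the text (16 banks, 512 rows, 8 columns), but the ordering of the fields within the address is an assumption made for this sketch; the actual macro interface may order them differently.

```python
from typing import NamedTuple

BANK_BITS = 4   # up to 16 banks of 1 Mbit each
ROW_BITS = 9    # 512 row addresses
COL_BITS = 3    # 8 column addresses


class Address(NamedTuple):
    bank: int
    row: int
    column: int


def decode(word_address):
    """Split a flat word address into (bank, row, column) fields.

    Assumed layout, from most to least significant bits: bank | row | column.
    """
    if not 0 <= word_address < 1 << (BANK_BITS + ROW_BITS + COL_BITS):
        raise ValueError("address out of range for a 16-Mbit macro")
    column = word_address & ((1 << COL_BITS) - 1)
    row = (word_address >> COL_BITS) & ((1 << ROW_BITS) - 1)
    bank = word_address >> (COL_BITS + ROW_BITS)
    return Address(bank, row, column)


print(decode(4096))   # Address(bank=1, row=0, column=0)
print(decode(65535))  # Address(bank=15, row=511, column=7)
```

With this layout, consecutive addresses within a row differ only in the column field (suited to page-cycle access), while accesses that differ in the bank field can be overlapped in bank-interleave mode.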
Each embedded DRAM macro includes built-in self test (BIST), redundancy allocation logic, refresh-control circuitry, and voltage-generation and voltage-regulation circuitry. In addition, each 1-Mbit bank within each macro contains decoupling and redundant elements for array-defect repair. The overhead of including all this support circuitry in each macro has inherent advantages:
* Overall yield of SoC chips can increase because redundant elements are available for each 1-Mbit memory bank in each macro.
* In-macro voltage regulation and decoupling can significantly reduce the impact of noise in the surrounding logic on the embedded DRAM macro.
* Availability of the macro memory can improve because the self-contained refresh circuitry allows multiple macros on a chip to operate independently.
* Test costs can decrease because the dedicated BIST circuitry in each macro allows multiple macros to be tested and/or operated in parallel during wafer test, module test, and burn-in.
The disadvantage of repeating this circuitry in each macro is that density (in Mbits per square millimeter) falls as macro capacity shrinks, because the fixed overhead is spread over fewer bits. However, even the smallest 1-Mbit macro provides more than a 1.5x density improvement over an SRAM solution.
Generally, SoC designs can accommodate the periodic refreshing requirements of the embedded DRAM macros as long as the array availability remains above 97-98 percent. Although hidden refresh circuitry can be designed into the embedded macros, the complexity of this logic, and the additional macro area it requires, outweigh the benefits of supporting this option as a standard feature.
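Array availability is simply the fraction of time that is not consumed by refresh. The sketch below shows the arithmetic; the refresh count, refresh cycle time, and retention interval are hypothetical values picked to land in the range quoted above, not IBM specifications.

```python
def array_availability(refresh_ops_per_interval, refresh_cycle_ns, retention_interval_ms):
    """Fraction of time the array is free for normal accesses.

    refresh_ops_per_interval: refresh operations required per retention interval
    refresh_cycle_ns:         time one refresh operation keeps the array busy
    retention_interval_ms:    time within which every row must be refreshed
    """
    busy_ns = refresh_ops_per_interval * refresh_cycle_ns
    interval_ns = retention_interval_ms * 1e6
    return 1.0 - busy_ns / interval_ns


# Hypothetical example: 8192 row refreshes (16 banks x 512 rows, one at a time),
# 20 ns per refresh, 8 ms retention interval.
availability = array_availability(8192, 20.0, 8.0)
print(f"{availability:.2%}")
```

In this simple model, a shorter retention interval or a longer refresh cycle lowers availability, which is the pressure that hidden-refresh schemes are meant to relieve.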
The incorporation of DRAM arrays in an SoC design presents some test and repair challenges. The structures used to form the embedded memory cells have evolved from those used in standalone DRAMs. As a result, the embedded DRAM cells are sensitive to the same types of defects and data-pattern interactions that have historically plagued their predecessors. In addition, the drive to improve density and performance in embedded memory arrays ensures that the behavior of the memory arrays will become increasingly complex.
The combination of these factors drives a continued and growing need for sophisticated test algorithms and redundant array elements to ensure high quality and competitive cost. As the percentage of memory in SoC designs increases, these elements become even more critical. IBM addresses these issues with an integrated system that provides a two-dimensional strategy for replacing defective array elements. This technique replaces defective DRAM rows on a one-for-one basis, in classical DRAM fashion. Area-efficient steering logic enables extra data bits to repair defective columns.
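The two repair dimensions can be modeled in software as a row remap table plus a column-steering map. The sketch below uses a generic shift-around-the-defect scheme for the columns; it is an illustrative assumption about how steering could work, not a description of IBM's circuit, and all function names are hypothetical.

```python
def remap_row(row, row_repairs):
    """One-for-one row replacement: a defective row is redirected to its spare."""
    return row_repairs.get(row, row)


def steer_columns(logical_width, defective_columns, spare_columns):
    """Map each logical data bit to a working physical column.

    Physical columns are numbered 0 .. logical_width + spare_columns - 1.
    Bits at or beyond a defective column shift over by one position, so the
    extra columns at the end absorb the displacement.
    """
    bad = set(defective_columns)
    if len(bad) > spare_columns:
        raise ValueError("not enough spare columns to repair this array")
    working = [c for c in range(logical_width + spare_columns) if c not in bad]
    return working[:logical_width]


# Example: an 8-bit-wide slice with one spare column and a defect in column 3.
print(steer_columns(8, [3], 1))   # [0, 1, 2, 4, 5, 6, 7, 8]

# Example: row 17 is defective and has been replaced by spare row 512.
print(remap_row(17, {17: 512}))   # 512
print(remap_row(18, {17: 512}))   # 18
```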
Laser-blown fuses within the ASIC assert the spare array elements in IBM's 180-nm and 130-nm technology-node designs. In the 180-nm technology node, banks of fuses are provided in each embedded array, in close proximity to the elements being replaced. This proximity creates some wiring limitations on the die because of laser-window requirements. The 130-nm generation eliminates these limitations because the fuse links are moved to the die edges, and are combined with the fuses used for embedded SRAM repair. A sophisticated fuse compression/decompression macro controls the fuses for both the embedded DRAM macros and the embedded SRAM macros.
The next generation of IBM embedded DRAM, implemented in the 90-nm technology node, enhances repairability by using electrically-blown fuses configured by a fuse controller. In this scenario, all of the embedded SRAMs and DRAMs upload fusing information to the fuse controller immediately following BIST wafer test. A single probe touchdown accomplishes fuse-information upload, fuse-blow, and replacement mapping. Two additional repair sequences are available at subsequent test gates, either at the wafer or the package level.
One-touchdown testing has been a goal of the IBM embedded DRAM program from its inception. A processor-based BIST engine provides programmable, at-speed (up to 500 MHz for the 90-nm technology node), complex-pattern testing on the same low-cost logic testers used for testing multimillion-gate ASICs. When inserted with wrappers that are transparent to the customer netlist, this testing strategy provides diagnosability, defect screening, and test-flow optimization benefits comparable to those achieved using much more expensive commodity DRAM testers.
The area efficiency of embedded DRAMs provides performance advantages over embedded SRAMs, particularly when time-of-flight considerations are important. Lower soft-error rates (SER) and lower standby power also make embedded DRAMs an attractive alternative to embedded SRAMs.
DRAMs store a significantly larger amount of charge per cell than SRAMs, resulting in significantly reduced SER per bit. The SER for IBM's embedded DRAMs has been reported at less than one failure-in-time (FIT) per Mb; in contrast, the SER for embedded SRAMs can exceed several thousand FITs per Mb. Although ECC can minimize the effects of soft errors, this solution, if not required by the application, can increase area and cause access degradation.
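To put the FIT figures in perspective, recall that one FIT is one failure per billion device-hours. The sketch below converts a per-Mb FIT rate into a mean time between soft errors for a whole memory. The 16-Mb capacity and the 2000 FIT/Mb SRAM figure are example values chosen to represent "several thousand"; they are assumptions, not measured data.

```python
HOURS_PER_FIT_UNIT = 1e9  # 1 FIT = 1 failure per 10**9 device-hours


def mean_hours_between_soft_errors(fit_per_mbit, capacity_mbit):
    """Mean time between soft errors (hours) for a memory of the given size."""
    total_fit = fit_per_mbit * capacity_mbit
    return HOURS_PER_FIT_UNIT / total_fit


CAPACITY_MBIT = 16  # assumed example capacity

dram_hours = mean_hours_between_soft_errors(1.0, CAPACITY_MBIT)      # upper bound from the text
sram_hours = mean_hours_between_soft_errors(2000.0, CAPACITY_MBIT)   # assumed example rate

print(f"embedded DRAM: about {dram_hours:,.0f} hours between soft errors")
print(f"embedded SRAM: about {sram_hours:,.0f} hours between soft errors")
```

Under these assumptions, the SRAM would see a soft error roughly every 31,000 hours (a few years of continuous operation), while the DRAM figure runs to tens of millions of hours.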
Standby power becomes a consideration in SoC designs because of the tradeoff between the increased performance of continually shrinking devices, measured by their "on" currents, and the associated increase in "off" currents. Escalating "off" currents raise standby power, which leads to increased power drain and higher device junction temperatures, and can ultimately accelerate chip wear-out. Although active power, predominantly governed by operating frequency, is comparable for both types of memory, an embedded DRAM solution can draw up to ten times less standby current than an embedded SRAM solution in a given technology node.
See related chart