Power Fail Back-up of SDRAM Cache Data and Metadata
During a data transfer operation in an Enterprise storage system, such as reading or writing a location in the Enterprise SSD flash memory, the power supply to all the components involved – including the storage system Host, SSD controller, SDRAM cache, and NAND flash memory – must operate reliably to ensure a successful transaction. However, electronic systems are vulnerable to power disruptions like voltage spikes, blackouts, surges, and brownouts. These disruptions can cause loss or corruption of:
- Cache data in transit to flash memory
Enterprise SSDs cannot lose data that they have reported as “committed to NAND flash” back to the storage system controller. The enterprise SAS/SATA market has a hot swap specification that requires no “committed” data be lost at any time, even if the power is suddenly cut. An example of this would be an operator error where the wrong drive is ejected during a hot swap servicing session.
There are two mechanisms by which Enterprise SSD controllers report the status of received data back to the storage system controller. The Enterprise SSD can behave in a “write-thru” fashion, in which the Enterprise SSD controller does not tell the storage system controller that the data and modified metadata are “committed” until they are, in fact, safely committed to the NAND flash memory.
The Enterprise SSD can also behave in a “write-back” fashion, in which some data streams and/or corresponding modified metadata are not yet “committed” to flash, but are reported as “committed” back to the storage system controller. Any data that is reported “committed” back to the storage system controller needs to be made nonvolatile in the event of a power failure. Any other data in the Enterprise SSD’s cache is assumed to be lost at power failure. The “write-back” approach improves random IOPS significantly over the “write-thru” approach and, therefore, is preferred for high-random-IOPS drives.
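The difference between the two reporting modes can be illustrated with a toy model. The sketch below is only an illustration of the acknowledgement semantics described above, not how any particular SSD controller is implemented; the class CachedDrive and all of its fields and methods are hypothetical names introduced for this example.

```python
from dataclasses import dataclass, field


@dataclass
class CachedDrive:
    """Toy model (hypothetical) of an SSD with a volatile cache in front of flash."""
    write_back: bool
    cache: dict = field(default_factory=dict)  # volatile SDRAM contents
    flash: dict = field(default_factory=dict)  # nonvolatile NAND contents
    acked: set = field(default_factory=set)    # addresses reported "committed"

    def write(self, lba: int, data: bytes) -> None:
        self.cache[lba] = data
        if not self.write_back:
            # Write-thru: the data reaches flash before it is acknowledged.
            self.flush()
        # Write-back: the data is acknowledged while it is still only in cache.
        self.acked.add(lba)

    def flush(self) -> None:
        """Move everything in the cache to flash."""
        self.flash.update(self.cache)
        self.cache.clear()

    def power_fail(self, backup_ok: bool) -> set:
        """Simulate a power loss; return acknowledged addresses that were lost."""
        if backup_ok:
            self.flush()    # hold-up energy lets the cache reach flash
        self.cache.clear()  # whatever is still in SDRAM is gone
        return {lba for lba in self.acked if lba not in self.flash}


if __name__ == "__main__":
    drive = CachedDrive(write_back=True)
    drive.write(42, b"payload")
    print("lost without backup:", drive.power_fail(backup_ok=False))
```

In this model a write-thru drive never loses acknowledged data, while a write-back drive loses it unless the cache is backed up during the power failure, which is the reason for the hold-up circuitry discussed next.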
To ensure that a “write-back” implementation works correctly, Enterprise SSDs contain a power failure detection circuit that monitors the supply voltage and signals the SSD controller if the voltage drops below a predefined threshold. A secondary voltage hold-up circuit ensures that the drive has enough power, for long enough, to back up the SDRAM cache data. When power is lost, this secondary voltage source provides the energy needed to transfer the contents from SDRAM to NAND flash. Figure 2 below shows a typical power failure detect circuit block diagram for Enterprise SSDs.
Figure 2: Typical Power Failure Detect Circuit Block Diagram
The secondary voltage source can be either a high-capacity supercapacitor or a bank of discrete tantalum capacitors.
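As a rough way to see how the size of the secondary source relates to the back-up duration, the sketch below uses the standard capacitor energy relation E = ½·C·(V_start² − V_min²) and assumes a constant load power during the back-up. The function names, the efficiency parameter, and the example numbers are assumptions made for illustration; they do not come from any specific drive design.

```python
def usable_energy_joules(capacitance_f: float, v_start: float, v_min: float) -> float:
    """Energy released as the capacitor discharges from v_start down to v_min."""
    if v_min > v_start:
        raise ValueError("v_min must not exceed v_start")
    return 0.5 * capacitance_f * (v_start ** 2 - v_min ** 2)


def hold_up_time_s(capacitance_f: float, v_start: float, v_min: float,
                   load_w: float, efficiency: float = 1.0) -> float:
    """Approximate hold-up time for a constant load (assumed simplification).

    efficiency models regulator losses between the capacitor and the load.
    """
    if load_w <= 0:
        raise ValueError("load_w must be positive")
    return usable_energy_joules(capacitance_f, v_start, v_min) * efficiency / load_w


if __name__ == "__main__":
    # Illustrative numbers only: 1 F charged to 5 V, usable down to 3 V, 4 W load.
    print(hold_up_time_s(1.0, 5.0, 3.0, load_w=4.0, efficiency=0.9), "seconds")
```

The hold-up time obtained this way has to exceed the time the controller needs to copy the SDRAM contents to NAND flash, with margin for capacitor aging.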
Supercapacitors, also known as ultracapacitors or electric double layer capacitors (EDLC), are capacitors with significantly higher energy density than any other capacitor type available and are used as reliable alternatives to batteries in battery back-up applications.
However, supercapacitors suffer from a known set of long-term reliability deficiencies, much like aluminum electrolytic capacitors. They have a limited service life because the electrolyte gradually escapes from the component over time, at a rate that depends on operating temperature, and this loss wears the component out. Performance degrades slowly with electrolyte loss until total failure occurs with little or no warning. In addition, the loss rate increases at higher operating voltages and at higher operating and non-operating temperatures. For every 10 °C increase in ambient operating temperature, the life expectancy of a supercapacitor can be cut approximately in half.
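The 10 °C rule of thumb can be written as a simple derating estimate. The sketch below treats the halving as exact, which the text only states as approximate, and the function name and the rated-life figures in the example are hypothetical; real parts should be derated from the manufacturer's data.

```python
def derated_life_hours(rated_life_hours: float, rated_temp_c: float,
                       operating_temp_c: float) -> float:
    """Rule-of-thumb estimate: life halves for every 10 degrees C above the rated point."""
    return rated_life_hours * 2.0 ** ((rated_temp_c - operating_temp_c) / 10.0)


if __name__ == "__main__":
    # Hypothetical part rated for 1000 hours at 65 degrees C.
    for temp_c in (55, 65, 75, 85):
        print(temp_c, "C ->", derated_life_hours(1000.0, 65.0, temp_c), "hours")
```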
Supercapacitor failure modes are:
- Cell opening due to the electrochemical decomposition overpressure.
- The voltage and the temperature generate a gas pressure inside the cell that slowly increases over time. When the pressure reaches a certain limit, a mechanical fuse, generally a groove on the can, opens softly.
When a supercapacitor is used for long periods at high operating temperatures, the electrolyte evaporates and the equivalent series resistance (ESR) increases. The fundamental failure mode is the open mode with ESR increase. All supercapacitors come with the warning: “When using these capacitors, incorporate appropriate safety measures in your design, such as redundancy and protection measures.”
Bank of Discrete Capacitors
A bank of discrete capacitors provides a more reliable alternative, but requires more careful design. A discrete capacitor-based voltage hold-up circuit employs a bank of discrete capacitors connected in parallel. The discrete capacitors used could be aluminum capacitors, tantalum capacitors, or niobium capacitors. Because discrete capacitors lack the compactness of supercapacitors, a discrete solution with comparable capacitance takes up significant board space. Tantalum capacitors are also known to be prone to shorting and smoking failure mechanisms.
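Because capacitances connected in parallel add, the number of discrete capacitors in the bank follows directly from the energy the back-up needs. The sketch below is a first-order estimate under assumed conditions (ideal capacitors, no tolerance, aging, or ESR losses); the function names and the example values are hypothetical.

```python
import math


def bank_capacitance_f(capacitances_f: list) -> float:
    """Total capacitance of capacitors connected in parallel (they simply add)."""
    return sum(capacitances_f)


def capacitors_needed(energy_j: float, v_start: float, v_min: float,
                      cap_each_f: float) -> int:
    """Smallest number of identical parallel capacitors that can deliver energy_j
    while discharging from v_start to v_min (ideal parts assumed)."""
    if v_min >= v_start:
        raise ValueError("v_start must be greater than v_min")
    # From E = 0.5 * C * (v_start^2 - v_min^2), solved for C.
    required_f = 2.0 * energy_j / (v_start ** 2 - v_min ** 2)
    return math.ceil(required_f / cap_each_f)


if __name__ == "__main__":
    # Illustrative only: 8 J needed between 5 V and 3 V, using 0.3 F parts.
    print(capacitors_needed(8.0, 5.0, 3.0, cap_each_f=0.3), "capacitors")
```

A practical design would add margin to this count for capacitance tolerance and for redundancy against a single capacitor failing.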
The nvSRAM Solution
The nonvolatile SRAM (nvSRAM) value proposition for Enterprise SSDs is to eliminate or minimize the supercapacitor or bank of discrete capacitors, and to reliably back up the in-transit SDRAM cache data and metadata with a single-chip, battery-less, nonvolatile RAM-based technology. A brief description of nvSRAM operation is presented below before discussing the specific details of using nvSRAM devices in Enterprise SSDs.