Understanding Enterprise SSD Endurance
SSD endurance is a measure of the usable life of the flash memory cells, typically specified as the number of writes a cell can sustain. ‘Writing’ to a cell requires more electrical charge than ‘reading’ from it, and each cell must be erased before it can be written to again. In either case, every electrical charge passed through a NAND flash memory cell as part of a read or write operation wears that cell down.
In the enterprise, accelerated access to data is the primary reason SSDs are deployed, and since flash memory cells will be written to multiple times each day, endurance determines the reliable life of each drive. The endurance, performance, reliability and availability of MLC-based SSDs depend directly on the design of the SSD controller (not on the NAND flash memory, as many suspect). The SSD controller is the brains of the drive: it responds to host commands, transfers data between the host and flash media, and manages the flash media to achieve high reliability and endurance. How effectively the controller manages the flash memory determines whether the SSD can be used in enterprise applications that require uninterrupted 24/7/365 operation under heavy read and write workloads. The real question is whether an SSD manufacturer can guarantee up to 30 full-capacity writes per day for 5 years using MLC media, rivaling the endurance of SLC media.
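To put the figure of 30 full-capacity writes per day for 5 years in perspective, the arithmetic can be sketched as follows. The 400 GB capacity is an assumed example value rather than a figure from any particular drive, and the function names are illustrative only.

```python
def total_drive_writes(dwpd, years, days_per_year=365):
    """Number of full-capacity writes over the whole period."""
    return dwpd * days_per_year * years


def total_terabytes_written(capacity_gb, dwpd, years):
    """Total data written in terabytes, using 1 TB = 1000 GB."""
    return capacity_gb * total_drive_writes(dwpd, years) / 1000.0


# 30 full-capacity writes per day for 5 years (figures from the text).
full_writes = total_drive_writes(dwpd=30, years=5)        # 30 * 365 * 5 = 54750

# Assumed example capacity of 400 GB (hypothetical, for illustration).
tbw = total_terabytes_written(capacity_gb=400, dwpd=30, years=5)  # 21900.0 TB

print(full_writes, tbw)
```

Under the idealized assumption that writes are spread perfectly evenly and the drive performs no extra internal writes, each cell would have to sustain roughly 54,750 write cycles to meet this guarantee.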
A Deeper Dive into Flash Media Wear
To store data in NAND flash memory, an electrical charge is placed in the ‘floating gate’ portion of the NAND cell substrate, which either blocks or enables electricity flow through the gate. As the NAND cell ages (or cycles), the floating gate will break down as electrons drop out of or get trapped below it. Enabling technology is available that slows this breakdown, and in turn improves SSD endurance and reliability, by softening the impact that erase, write and read operations have on NAND flash memory cells. This advanced technology is described later in the article.
To keep NAND flash degradation from adversely affecting SSD reliability, error correction code (ECC) technology is employed as a standard feature in most enterprise-class SSDs. ECC enables the built-in SSD controller to detect and correct a limited number of bit errors in each block of data.
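The idea that ECC can correct only a limited number of bit errors per block can be illustrated with a toy code. The sketch below uses a Hamming(7,4) code, chosen purely because it is small; it is not the scheme an SSD controller would use (enterprise drives typically rely on much stronger codes, such as BCH or LDPC, over far larger blocks). The function names are illustrative.

```python
def hamming74_encode(data_bits):
    """Encode 4 data bits into a 7-bit codeword laid out as p1 p2 d1 p3 d2 d3 d4."""
    d1, d2, d3, d4 = data_bits
    p1 = d1 ^ d2 ^ d4   # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4   # parity over positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4   # parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(codeword):
    """Return (data_bits, error_position); error_position is 0 if none was detected."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    position = s1 + 2 * s2 + 4 * s3   # 1-based index of the suspected bad bit
    if position:
        c[position - 1] ^= 1          # flip the suspected bit back
    return [c[2], c[4], c[5], c[6]], position


data = [1, 0, 1, 1]
stored = hamming74_encode(data)

# One flipped bit (position 5): within the code's correction limit.
one_error = list(stored)
one_error[4] ^= 1
recovered, position = hamming74_decode(one_error)

# Two flipped bits: beyond the limit, so the decoder "corrects" the wrong bit.
two_errors = list(stored)
two_errors[0] ^= 1
two_errors[1] ^= 1
miscorrected, _ = hamming74_decode(two_errors)

print(recovered == data, position, miscorrected == data)
```

The single-bit error is repaired, while the double-bit error exceeds what this code can handle and the returned data is wrong. The same limit, at a much larger scale, is what a worn NAND block eventually runs into.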
At some point, as the NAND wears out, the ECC engine will be unable to correct the bit errors coming from it. When this occurs, the SSD controller performs a read retry, reading the data again in the hope that it is read correctly this time. This double layer of protection gives SSDs an exceptionally low unrecoverable bit error rate (UBER), which enables high reliability. As the NAND flash ages, the average number of read retries required will increase, and these retries will reduce read performance, and with it the overall performance of the SSD, over time. What is needed is an enabling technology that slows the ‘wear-out’ rate of the flash so that ECC and retries are needed less often, or their use is significantly delayed.
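The two-layer scheme described above (ECC first, then read retry) can be sketched as a simple loop. This is a minimal model under stated assumptions, not controller firmware: `read_and_correct` is a hypothetical callback that stands in for "read the page and run ECC", returning the corrected data or `None` when the errors exceed what ECC can fix.

```python
from typing import Callable, Optional, Tuple


def read_with_retry(
    read_and_correct: Callable[[int], Optional[bytes]],
    max_retries: int = 3,
) -> Tuple[Optional[bytes], int]:
    """Read once, then re-read up to max_retries times if ECC cannot correct.

    Returns (data, retries_used). data is None if every attempt failed,
    which corresponds to an unrecoverable read error.
    """
    for attempt in range(max_retries + 1):
        data = read_and_correct(attempt)
        if data is not None:
            return data, attempt
    return None, max_retries


# Hypothetical pages for illustration: a fresh page reads cleanly on the
# first attempt, while a worn page only succeeds on its third attempt.
def fresh_page(attempt: int) -> Optional[bytes]:
    return b"payload"


def worn_page(attempt: int) -> Optional[bytes]:
    return b"payload" if attempt >= 2 else None


print(read_with_retry(fresh_page))  # no retries needed
print(read_with_retry(worn_page))   # two retries before success
```

Every retry is an additional read of the same page, which is why the rising retry count on aging flash shows up as lower read performance.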
In reality, the larger issue with NAND flash is the higher electrical charge used for the erase operation, and then for the write operation; this is what primarily impacts endurance. To materially increase an SSD’s operating life, more advanced techniques are required.
Techniques such as over-provisioning, throttling, compression, and de-duplication are mechanisms for delaying writes to NAND flash memory. They can be effective when deployed, but they do not increase the number of times the flash can be written, so the gains they can provide are limited. Wear-leveling, likewise, doesn’t actually increase endurance. Instead, the flash controller spreads writes evenly across all blocks in the SSD so that the NAND blocks are used consistently over the life of the drive and no one location wears out faster than any other.
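The point that wear-leveling redistributes wear without reducing it can be sketched with a toy model. The `WearLeveler` class below is hypothetical and greatly simplified: it only tracks per-block erase counts and always writes to the least-worn block, whereas a real controller also has to maintain logical-to-physical mapping and relocate existing data.

```python
class WearLeveler:
    """Toy model: direct every write to the block with the fewest erases."""

    def __init__(self, num_blocks):
        self.erase_counts = [0] * num_blocks

    def pick_block(self):
        # Least-worn block; ties go to the lowest block index.
        return min(range(len(self.erase_counts)),
                   key=self.erase_counts.__getitem__)

    def write(self):
        block = self.pick_block()
        self.erase_counts[block] += 1   # a block is erased before it is rewritten
        return block


# Ten writes with wear-leveling across four blocks.
leveled = WearLeveler(num_blocks=4)
for _ in range(10):
    leveled.write()

# The same ten writes always hitting one location (no wear-leveling).
unleveled_counts = [0, 0, 0, 0]
for _ in range(10):
    unleveled_counts[0] += 1

print(leveled.erase_counts, unleveled_counts)
```

In both cases the total number of erases is ten. Wear-leveling changes only where the wear lands (at most three erases on any block instead of ten on one), which is why it evens out block lifetimes without raising the number of times the flash itself can be written.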