Solid-state drives (SSDs) offer major performance advantages over traditional hard disk drives (HDDs), delivering orders-of-magnitude gains in storage I/O. However, widespread adoption of SSDs in performance-intensive enterprise datacenters has been held back by concerns about the reliability of the underlying NAND flash memory.
To understand the factors that affect storage media endurance in enterprise environments, it helps to distinguish the two types of drive-level caching, write cache and read cache, and their respective endurance requirements. At the highest level, the endurance of storage media is the number of write operations it can withstand before its integrity begins to degrade. Write caching is naturally the most endurance-demanding use, since data is constantly being written to the media. Read caching is less demanding, but it can also require high endurance, because cache management algorithms keep writing data to the media to establish and maintain the relevance of the cached data.
Flash endurance and SSD endurance
Flash endurance is the number of times each flash cell can be programmed and erased before it becomes unusable, and it is specified in program/erase (P/E) cycles. Drive endurance can be defined as the number of times the user can write the drive's full capacity before the drive becomes unusable. It is typically specified as the total amount of data that can be written over the drive's life span, for example as terabytes written (TBW) or as drive writes per day (DWPD) over the warranty period.
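As a rough illustration of "total capacity per life span", the two common ratings can be converted into each other. The sketch assumes the usual definition of DWPD (full user-capacity writes per day, sustained for the whole warranty period) and 365-day years; the function names and the example drive (2 TB user capacity, 1 DWPD, 5 years) are hypothetical.

```python
def tbw_from_dwpd(dwpd: float, user_capacity_tb: float, warranty_years: float) -> float:
    """Terabytes written (TBW) implied by a drive-writes-per-day (DWPD) rating."""
    # 365-day years; leap days are ignored in this sketch.
    return dwpd * user_capacity_tb * 365 * warranty_years


def dwpd_from_tbw(tbw: float, user_capacity_tb: float, warranty_years: float) -> float:
    """DWPD implied by a TBW rating over the same warranty period."""
    return tbw / (user_capacity_tb * 365 * warranty_years)


if __name__ == "__main__":
    # Hypothetical drive: 2 TB of user capacity, rated 1 DWPD for 5 years.
    print(tbw_from_dwpd(1, 2.0, 5))     # 3650.0
    print(dwpd_from_tbw(3650, 2.0, 5))  # 1.0
```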
To translate flash endurance into drive endurance, we need to consider some basic concepts of SSD implementation. The first is write amplification: the ratio between the amount of data actually written to the physical flash array within the drive and the amount of data written by the user. A ratio above 1 means the flash wears out faster than the user's write volume alone would suggest. A fundamental cause of write amplification is that flash writes are quantized into whole pages, so a write smaller than a page still consumes an entire page. Write amplification depends strongly on the write pattern and on the implementation of the "garbage collection" algorithm; a well-designed algorithm can reduce write amplification by as much as half.
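A minimal sketch of the arithmetic, assuming ideal wear leveling so that every block reaches its P/E limit at the same time: the flash can then absorb about (P/E cycles × physical capacity) of physical writes, and dividing by the write amplification gives the endurance seen by the user. The function names and every number in the example are hypothetical, not figures for any real drive.

```python
def write_amplification(user_bytes: float, flash_bytes: float) -> float:
    """WA = bytes physically written to NAND / bytes written by the user."""
    if user_bytes <= 0:
        raise ValueError("user_bytes must be positive")
    return flash_bytes / user_bytes


def drive_endurance_tb(pe_cycles: int, physical_capacity_tb: float, wa: float) -> float:
    """Approximate user data (TB) the drive can absorb before its P/E budget is spent.

    Assumes ideal wear leveling: the NAND can take pe_cycles * physical capacity
    of physical writes in total, and each user TB costs `wa` TB of that budget.
    """
    return pe_cycles * physical_capacity_tb / wa


if __name__ == "__main__":
    # Hypothetical counters: the user wrote 100 GB, the NAND absorbed 250 GB.
    wa = write_amplification(user_bytes=100e9, flash_bytes=250e9)
    print(wa)                                 # 2.5
    print(drive_endurance_tb(3000, 2.0, wa))  # 2400.0
```

Halving the write amplification in this model doubles the drive endurance, which is why garbage collection efficiency matters so much.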
Another design concept that affects write amplification is over-provisioning: the difference between the physical size of the flash array and the capacity accessible to the user.
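Over-provisioning is usually quoted as a percentage. One common convention, assumed in this sketch, expresses the spare capacity relative to the user-accessible capacity; some vendors divide by the physical capacity instead, so the convention behind a quoted figure is worth checking. The helper name and the drive sizes are hypothetical.

```python
def over_provisioning_pct(physical_gb: float, user_gb: float) -> float:
    """Spare capacity as a percentage of user-accessible capacity (one common convention)."""
    if not 0 < user_gb <= physical_gb:
        raise ValueError("need 0 < user capacity <= physical capacity")
    return 100.0 * (physical_gb - user_gb) / user_gb


if __name__ == "__main__":
    # Hypothetical drive: 128 GB of raw NAND exposed to the user as 100 GB.
    print(over_provisioning_pct(128, 100))  # 28.0
```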
Figure 1 illustrates the effects of over-provisioning and garbage collection in an SSD. In the SSD's initial state, the white strips represent the user-accessible capacity and the blue strips represent the over-provisioned capacity, which the user cannot access.
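In NAND flash, the program (write) unit is a page (multiple cells), while the erase unit is a block (multiple pages). Because a page cannot be rewritten until its whole block is erased, updated data is written to a fresh page and the old copy is marked invalid. After a period of operation, the drive therefore contains fragmented blocks that mix valid and invalid pages, as shown in the intermediate state illustration (state 2). Garbage collection then defragments these blocks: it copies the remaining valid pages into other blocks so that the fragmented blocks can be erased and reused, creating fully utilized blocks and minimizing wasted storage capacity.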
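To make the page/block asymmetry concrete, here is a toy flash translation layer. It is a hypothetical model, not how any particular controller works: every user write goes to a fresh page and the old copy is marked stale, and when no erased block remains, a greedy garbage collector picks the block with the fewest valid pages, buffers those pages, erases the block and writes them back before accepting new data. It ignores wear leveling, power-loss protection and the reserve of free blocks that real drives keep.

```python
class TinyFTL:
    """Hypothetical toy FTL: out-of-place page writes plus greedy garbage collection."""

    def __init__(self, num_blocks: int, pages_per_block: int, logical_pages: int):
        if logical_pages >= num_blocks * pages_per_block:
            raise ValueError("some physical pages must be over-provisioned")
        self.ppb = pages_per_block
        self.logical_pages = logical_pages
        # Each block is a list of page slots: a logical page number (LPN) if valid,
        # None if stale. An empty list means the block is erased.
        self.blocks = [[] for _ in range(num_blocks)]
        self.where = {}  # LPN -> (block index, page index)
        self.open = 0    # block currently receiving writes
        self.host_writes = 0
        self.flash_writes = 0

    def write(self, lpn: int) -> None:
        """User write: mark any old copy stale, then program a fresh page."""
        if not 0 <= lpn < self.logical_pages:
            raise ValueError("logical page out of range")
        self.host_writes += 1
        if lpn in self.where:
            b, p = self.where.pop(lpn)
            self.blocks[b][p] = None  # pages cannot be overwritten in place
        self._program(lpn)

    def _program(self, lpn: int) -> None:
        if len(self.blocks[self.open]) == self.ppb:
            self._next_block()
        blk = self.blocks[self.open]
        blk.append(lpn)
        self.where[lpn] = (self.open, len(blk) - 1)
        self.flash_writes += 1

    def _next_block(self) -> None:
        for i, blk in enumerate(self.blocks):
            if not blk:  # an erased block is available
                self.open = i
                return
        # No erased block left: pick the block with the fewest valid pages (greedy).
        victim = min(range(len(self.blocks)),
                     key=lambda i: sum(x is not None for x in self.blocks[i]))
        survivors = [x for x in self.blocks[victim] if x is not None]
        self.blocks[victim] = []  # erase the whole block
        self.open = victim
        for lpn in survivors:     # these relocation copies are the extra flash writes
            self._program(lpn)


if __name__ == "__main__":
    ftl = TinyFTL(num_blocks=4, pages_per_block=4, logical_pages=12)
    for lpn in range(12):        # fill the user-visible capacity sequentially
        ftl.write(lpn)
    for lpn in (0, 4, 8, 1, 5):  # scattered overwrites fragment the blocks
        ftl.write(lpn)
    print(ftl.host_writes, ftl.flash_writes)  # 17 19
```

The scattered overwrites leave stale pages spread over three blocks; the last write finds no erased block, so garbage collection reclaims a block that still holds two valid pages and copies them, giving 19 flash writes for 17 user writes.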
Larger over-provisioning lets blocks accumulate more invalid pages (fragments) before garbage collection has to reclaim them, which means that fewer valid pages need to be copied, yielding lower write amplification (higher efficiency). The downside of large over-provisioning is that it requires more physical NAND dies, which translates to a higher overall system cost.
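The link between pages copied and write amplification can be made explicit with a standard back-of-the-envelope argument, sketched here under the assumption that every block reclaimed by garbage collection still has a fraction u of valid pages. Reclaiming a block of N pages copies u·N pages and frees (1 - u)·N pages for new user data, so the flash absorbs N page writes for every (1 - u)·N user page writes, i.e. WA = 1 / (1 - u). Over-provisioning helps by letting victim blocks reach a lower u before they must be reclaimed; how u depends on over-provisioning and on the workload is not modeled. The function name is hypothetical.

```python
def wa_from_victim_utilization(u: float) -> float:
    """Steady-state write amplification if every GC victim block is a fraction u valid.

    Reclaiming a block of N pages copies u*N valid pages and frees (1-u)*N pages
    for user data, so flash writes / user writes = N / ((1-u)*N) = 1 / (1-u).
    """
    if not 0.0 <= u < 1.0:
        raise ValueError("u must be in [0, 1)")
    return 1.0 / (1.0 - u)


if __name__ == "__main__":
    # Fewer valid pages per reclaimed block -> fewer copies -> lower WA.
    for u in (0.75, 0.5, 0.25, 0.0):
        print(f"u = {u:.2f} -> WA = {wa_from_victim_utilization(u):.2f}")
```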
Another method to reduce write amplification is compressing user data before it is written to flash, so that fewer physical bytes are written for each byte the user writes.
The disadvantage of compression is that its effectiveness depends strongly on data patterns: it does not help if data is encrypted at the application level or has already been compressed by the application (as is the case with images, videos and Microsoft Office documents).
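The pattern dependence is easy to see with Python's standard zlib module, used here only as a stand-in for whatever compression engine a controller might implement. Repetitive data shrinks dramatically, while random bytes (standing in for encrypted or already-compressed content) do not shrink at all. The sketch scales a garbage-collection write amplification by the compressed fraction, assuming (hypothetically) that the controller stores data that does not compress as-is rather than expanding it; the function names and sample data are illustrative.

```python
import os
import zlib


def compressed_fraction(data: bytes) -> float:
    """Size after zlib compression as a fraction of the original size."""
    return len(zlib.compress(data)) / len(data)


def effective_wa(gc_wa: float, data: bytes) -> float:
    """Flash bytes written per user byte when compression precedes the flash write.

    Assumption: incompressible data is stored uncompressed, so the fraction is capped at 1.
    """
    return gc_wa * min(1.0, compressed_fraction(data))


if __name__ == "__main__":
    repetitive = b"log entry: status OK\n" * 2000  # synthetic, highly redundant
    random_like = os.urandom(len(repetitive))      # behaves like encrypted data
    for name, data in (("repetitive", repetitive), ("random", random_like)):
        print(f"{name:>10}: compressed to {compressed_fraction(data):.1%}, "
              f"effective WA with GC WA 2.0 = {effective_wa(2.0, data):.2f}")
```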