What happens when that initial decode fails? How soft data can be used to recover data on the SSD.
In the previous article in this series, I talked about how we can control the parameters of Low-Density Parity-Check (LDPC) error correction codes in order to manage the latency associated with reads from a Solid-State Drive (SSD). However, we only looked at the latency associated with a single decode of the LDPC codeword. In this post, we will take a look at what happens when that initial decode fails and how soft data can be used to recover data on the SSD.
Hard and soft data decoding
In October 1948, Claude Shannon published his seminal paper “A Mathematical Theory of Communication,” which kick-started the discipline of information theory. The work in this paper is still used today to determine how good or bad an error correction code is, because it defined a performance bound beyond which no error correction code can go. Interestingly, for any given channel there exist two bounds: one for decoding using hard data and one for decoding using soft data. A few examples of this are given in Table 1 below.
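To make the hard-data bound concrete: when we hard-slice an AWGN channel, the result behaves like a binary symmetric channel (BSC) with crossover probability p, whose capacity is 1 − H2(p). The sketch below, written under that assumption, numerically finds the largest raw bit error rate at which a code of a given rate could still decode in principle; the rates chosen are illustrative, not the exact entries of Table 1.

```python
import math

def h2(p):
    """Binary entropy function H2(p), in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def hard_decode_limit(rate, tol=1e-9):
    """Largest raw bit error rate p at which a code of the given rate
    could still, in principle, decode on a hard-sliced channel:
    solves 1 - H2(p) = rate by bisection on p in (0, 0.5)."""
    lo, hi = 0.0, 0.5
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if 1 - h2(mid) > rate:
            lo = mid  # the channel still has spare capacity at this p
        else:
            hi = mid
    return (lo + hi) / 2

for rate in (0.85, 0.90, 0.95):
    print(f"rate {rate}: hard-decision limit p ~ {hard_decode_limit(rate):.4f}")
```

Note how the tolerable raw bit error rate shrinks as the code rate rises, which is exactly the trade-off Table 1 captures.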
Table 1: The hard-data and soft-data decode limits for a range of code rates. The bit error rates were calculated using hard slicing and the underlying channel was assumed to be Additive White Gaussian Noise (AWGN).
When we perform a read from NAND flash we typically only get back the ones and zeros associated with the data on the flash. As such, an initial LDPC decode of that data is a decode based on hard data, since the decoder has no knowledge of which of those ones and zeros are reliable and which are dubious (there are a few tricks that people play here, but for now let’s assume my statement is true). In this case, the performance of that initial decode is limited by the hard-data column in Table 1.
LDPC and soft-data decoding in NAND flash
If we assume that a hard-data LDPC decode fails, then things start to get very interesting for the SSD. We could decide to return an “Unrecoverable Read Error” and tell the user that their data is lost forever, but end-users typically don’t like that ;-). If the SSD has an internal RAID system, we could use it to attempt to recover the user’s data, at the expense of additional complexity and the NAND capacity needed to calculate and store the RAID parity. However, with LDPC, there is a third option, which is to soften the data and attempt a soft-data LDPC decode. Note that this third option is not available in controllers that use less advanced error correction (for example BCH codes) because they cannot leverage soft information. This option allows us to move from the hard-data column in Table 1 to the soft-data column and operate in noisier environments.
I like to think of a soft-data LDPC decode in three parts:
- A re-read strategy.
- Soft-data construction.
- The soft LDPC decode.
Let’s look at each one of these in turn.
Re-read strategy: The re-read strategy consists of reading one or more sections of the flash to assist in the construction of soft data. There are many options here, both in terms of which section of the flash to read and in terms of how those reads are performed. What we are trying to do is maximize the mutual information between these reads and the original stored data, in order to generate the best soft data we possibly can. Some examples of a re-read strategy might include:
- Read the same section as the original hard data but use a different set of read threshold voltages inside the NAND.
- In MLC NAND, read the section that shares the same word-line as the original section.
- Read the section that corresponds to the dominant disturber. This is the section that, when programmed, has the strongest program disturb impact on the original hard-data section.
There are pros and cons to each of these re-read strategies and, in fact, the three can even be combined if desired. Just remember that each time you read from the flash you will incur more latency!
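The first strategy above can be sketched as a simple retry loop. This is a hypothetical illustration only: the nand_read callable and the offset table stand in for a real controller's NAND interface, and real retry tables are vendor-specific.

```python
# Illustrative sketch of re-reading the same page at shifted read
# thresholds. nand_read and RETRY_OFFSETS are hypothetical stand-ins
# for a real controller API and a vendor-calibrated retry table.
RETRY_OFFSETS = [-2, +2]  # threshold shifts, in vendor-specific DAC steps

def gather_reads(page_addr, nand_read):
    """Return the original hard-data read plus one extra read per
    retry offset. Each additional read costs another tR of latency."""
    reads = [nand_read(page_addr, vref_offset=0)]  # the original hard read
    for off in RETRY_OFFSETS:
        reads.append(nand_read(page_addr, vref_offset=off))
    return reads
```

The list of raw reads returned here is exactly the input the soft-data construction step consumes.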
The soft-data construction: Each of the reads in our re-read strategy returns the 0 and 1 data associated with that read. (More advanced NAND devices may support multi-bit reads, but we will ignore that possibility for the purposes of this article.) Therefore, each read of the flash gives us one more bit of physical information. However, we need to map these physical zeros and ones into soft information for the LDPC decode. This mapping requires an understanding of both the NAND flash and the LDPC codes that are being used.
Here’s a very simple example to illustrate: For a hard-data decode we can use a very simple mapping to convert the zeros and ones from the flash into information the LDPC decoder can consume. We call the output of this mapping Log-Likelihood Ratios (LLRs). This mapping is given in Table 2.
Table 2: The mapping from NAND Flash value to LLR for a simple hard-data decode.
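One common convention for such a mapping (an assumption here, since conventions vary by implementation) is to define the LLR as ln(P(bit = 0) / P(bit = 1)), so a read 0 maps to a positive value and a read 1 to a negative one, with a fixed magnitude derived from an assumed raw bit error rate:

```python
import math

RAW_BER = 0.01  # assumed raw bit error rate of the hard read (illustrative)

def hard_llr(bit, p=RAW_BER):
    """Map a hard-read bit to an LLR, defined here as
    ln(P(stored bit = 0) / P(stored bit = 1)). A read 0 gives a
    positive LLR and a read 1 a negative one; the magnitude reflects
    how much we trust a single hard read."""
    magnitude = math.log((1 - p) / p)
    return magnitude if bit == 0 else -magnitude

llrs = [hard_llr(b) for b in (0, 1, 1, 0)]
```

Because every bit gets the same magnitude, a hard-data decode tells the LDPC decoder nothing about which bits to distrust; that is precisely what the re-reads add.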
Now assume our re-read strategy consists of one additional read. Our soft-data construction might look like the one shown in Table 3.
Table 3: The soft-data construction table for a simple example involving two reads of the NAND Flash device.
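A two-read construction of this kind can be sketched as follows. The idea is that the two thresholds partition the cell-voltage axis into three regions: if both reads agree, the cell is far from the threshold and we assign a confident LLR; if they disagree, the cell lies between the two thresholds and the LLR magnitude should be small. The strong and weak values below are purely illustrative; in practice they would be calibrated from the characterized error behavior of the NAND device.

```python
def soft_llr(bit_a, bit_b, strong=4.5, weak=0.5):
    """Combine the original hard read (bit_a) with one re-read at a
    shifted threshold (bit_b) into a single LLR. Agreement -> high
    confidence; disagreement -> the cell sits between the two read
    thresholds, so confidence is low."""
    if bit_a == bit_b:
        return strong if bit_a == 0 else -strong
    # Reads disagree: low confidence, sign taken from the original read.
    return weak if bit_a == 0 else -weak

# Build per-bit LLRs from the original read and the re-read.
codeword_llrs = [soft_llr(a, b) for a, b in zip([0, 1, 0, 1], [0, 1, 1, 0])]
```

Adding further re-reads simply refines this partition into more regions, each with its own calibrated LLR value.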
The soft LDPC decode: The final step in the soft-data decode involves passing the LLRs for each of the bits of the codeword into the LDPC decoder logic. The hope is that this decode will be more successful than the original hard-data decode was and the SSD will now be able to return the user data and perhaps move that data to a safer region of the SSD so that it can be recovered more easily the next time it is requested.
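To make this final step concrete, here is a compact sketch of one widely used soft decoding algorithm, the min-sum approximation of belief propagation. The tiny parity-check matrix (for a (7,4) Hamming code, standing in for a real LDPC code) and the LLR values are illustrative only; production decoders are hardware implementations working on far larger matrices.

```python
def min_sum_decode(H, llrs, max_iters=20):
    """Min-sum approximation of belief-propagation decoding.
    H is a list of parity-check rows; llrs are per-bit input LLRs
    (positive means bit 0 is more likely). Returns the hard-decision
    bits after convergence or max_iters iterations."""
    m, n = len(H), len(llrs)
    edges = [(c, v) for c in range(m) for v in range(n) if H[c][v]]
    v2c = {e: llrs[e[1]] for e in edges}  # variable-to-check messages
    bits = [0 if l >= 0 else 1 for l in llrs]
    for _ in range(max_iters):
        # Check-node update: sign product and minimum magnitude of the others.
        c2v = {}
        for c, v in edges:
            others = [v2c[(c, u)] for u in range(n) if H[c][u] and u != v]
            sign = -1 if sum(o < 0 for o in others) % 2 else 1
            c2v[(c, v)] = sign * min(abs(o) for o in others)
        # Posterior LLRs and tentative hard decision.
        post = [llrs[v] + sum(c2v[(c, v)] for c in range(m) if H[c][v])
                for v in range(n)]
        bits = [0 if p >= 0 else 1 for p in post]
        # Stop once all parity checks are satisfied.
        if all(sum(H[c][v] * bits[v] for v in range(n)) % 2 == 0
               for c in range(m)):
            break
        # Variable-node update for the next iteration.
        for c, v in edges:
            v2c[(c, v)] = llrs[v] + sum(c2v[(c2, v)]
                                        for c2 in range(m)
                                        if H[c2][v] and c2 != c)
    return bits

# A (7,4) Hamming parity-check matrix standing in for a real LDPC code.
H = [[1, 1, 0, 1, 1, 0, 0],
     [1, 0, 1, 1, 0, 1, 0],
     [0, 1, 1, 1, 0, 0, 1]]
# All-zeros codeword read back with one weak, wrongly-sliced bit.
llrs = [4.5, 4.5, -0.5, 4.5, 4.5, 4.5, 4.5]
decoded = min_sum_decode(H, llrs)
```

Note how the weak (small-magnitude) LLR on the erroneous bit lets its neighbors outvote it: this is exactly why the soft-data construction matters, since a hard decode would have given that wrong bit the same weight as every other.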
Given the raw error rates of the latest generations of NAND technology, LDPC engines will be required in order to deliver an acceptable bit error rate to the user. As we have seen, LDPC comes with side-effects, but leading controllers that utilize the latest LDPC technologies and algorithms will be able to extract the most commercial value from the NAND devices they manage.
-- Stephen Bates (@stepbates) is a senior technical director in the Chief Strategy and Technology Office of PMC-Sierra. He works on issues related to NAND flash and other Non-Volatile Memory technologies and the implications of those technologies on storage architectures. He holds a PhD from the University of Edinburgh and is a Senior Member of the IEEE.