Onyx gets its NV memory
PCM devices from Micron were the stars of a paper presented by members of the Department of Computer Science and Engineering at the University of California, San Diego, at the recent Hot Storage conference.
They described the results of testing a PCM-based solid-state drive (SSD) called Onyx. This SSD uses what are described as Micron's “first generation” P8P (90nm) 16 MB PCM devices. Onyx has a capacity of 10GB, organized as 8 banks of 1.25GB, and connects to a host system over a PCIe bus. Of this capacity, 8GB is allocated to data storage and 2GB to error correction. Figure 1 is a schematic of the high-level architecture of Onyx.
Some concerns were raised that Onyx may not have been fully populated, since the system requires some 640 PCM devices, each with a capacity of 16MB. We have now received assurances from Adrian Caulfield, one of the authors of the paper, that the system was fully populated, with all 16 DIMMs carrying 40 PCM devices each. It represents the largest collection of PCM devices that has been put through the rigors of assembly and shown publicly in a working system; this is a significant PCM milestone.
As a prototype, Onyx is based on the design of Moneta, an SSD designed in anticipation that some type of non-volatile memory would eventually become available; Moneta uses DRAM to stand in for that memory. Onyx replaces the DRAM with PCM but retains Moneta's highly optimized software stack, which minimizes latency and maximizes concurrency.
In essence, the Onyx architecture employs eight memory controllers, each controlling 1GB of memory, linked by a 4GB/s ring to the “brain” of the system, which interfaces with the PCIe bus. The prototype uses four FPGAs connected in a ring, with four DIMMs attached to each FPGA. The system clock frequency is 250MHz. Each DIMM carries 40 of Micron's 16 MB P8P PCM devices, and the DIMMs fit into standard DIMM slots.
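The figures quoted so far can be cross-checked with a little arithmetic. The Python sketch below uses only numbers from the article, plus three labeled assumptions: 1GB is taken as 1024MB, the 2GB of error-correction storage is assumed to be spread evenly across the eight banks (which reconciles the 1.25GB banks with the 1GB per controller), and each controller is assumed to manage one DIMM pair.

```python
# Figures quoted in the article
FPGAS = 4
DIMMS_PER_FPGA = 4
DEVICES_PER_DIMM = 40
DEVICE_MB = 16            # Micron P8P device capacity
CONTROLLERS = 8
DATA_GB, ECC_GB = 8, 2

dimms = FPGAS * DIMMS_PER_FPGA                  # 16 DIMMs
devices = dimms * DEVICES_PER_DIMM              # 640 PCM devices
raw_gb = devices * DEVICE_MB / 1024             # 10.0GB (assumes 1GB = 1024MB)
bank_gb = raw_gb / CONTROLLERS                  # 1.25GB per bank
data_per_controller_gb = DATA_GB / CONTROLLERS  # 1.0GB, assuming ECC is spread evenly
dimms_per_controller = dimms // CONTROLLERS     # 2, i.e. one DIMM pair (assumption)

# Data plus error-correction storage should account for the full raw capacity
assert DATA_GB + ECC_GB == raw_gb
print(dimms, devices, raw_gb, bank_gb, data_per_controller_gb, dimms_per_controller)
```

On these assumptions, each 1.25GB bank holds 1GB of data and 0.25GB of error-correction information.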
Some of the techniques used to deal with PCM's design challenges, or “its own idiosyncrasies”, are worth commenting on. The first is the use of a “large capacitor” to ensure that the system does not breach the fundamental definition of NV memory, i.e., that it does not lose data in the event of a mains failure. The use of the large capacitor is not quite as bad as it might first appear. The PCM controller can signal the status of a write to PCM in two ways. One, called “late completion,” indicates that the write is complete. The other, called “early completion,” is signaled as soon as all the data is in the PCM buffers. Early completion allows Onyx to hide most of the write latency, but it is vulnerable to power failure; if the mains fails, the large capacitor holds enough energy to complete the write operation. The approach is defended on the grounds that flash SSDs do the same. It is claimed that, with early completion, each PCM DIMM pair provides a peak bandwidth of 156MB/s for reads and 47.1MB/s for writes.
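The difference between early and late completion, and why the capacitor matters, can be modeled in a few lines. This is a hypothetical Python sketch, not the Onyx controller logic: the class and method names are invented for illustration, a dictionary stands in for the PCM array and another for the write buffers, and the capacitor is assumed either to hold enough energy to drain every buffered write or to fail entirely.

```python
from dataclasses import dataclass, field


@dataclass
class PcmBufferedController:
    """Hypothetical model of a PCM controller with early/late write completion."""
    array: dict = field(default_factory=dict)   # data programmed into PCM cells
    buffer: dict = field(default_factory=dict)  # data accepted but not yet programmed
    events: list = field(default_factory=list)  # completion signals sent to the host

    def write(self, addr, data):
        # Data lands in the buffer; the host is told at once (early completion)
        self.buffer[addr] = data
        self.events.append(("early_completion", addr))

    def program_one(self):
        # Slow path: program the oldest buffered write, then signal late completion
        if not self.buffer:
            return
        addr = next(iter(self.buffer))
        self.array[addr] = self.buffer.pop(addr)
        self.events.append(("late_completion", addr))

    def power_fail(self, capacitor_ok: bool = True):
        if capacitor_ok:
            # Capacitor energy is assumed sufficient to drain every buffered write
            while self.buffer:
                self.program_one()
        else:
            # Without backup energy, early-completed writes are simply lost
            self.buffer.clear()
```

For example, after three calls to `write` and one to `program_one`, the host has seen three early completions but only one write is in the array; `power_fail()` then finishes the other two, whereas `power_fail(capacitor_ok=False)` loses them even though the host was told they had completed.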
The next PCM “idiosyncrasy” the UC San Diego team had to deal with is PCM wear-out. They cited discussions with the PCM manufacturer explaining how the lifetime of PCM and of flash are defined differently. Simply put, the quoted PCM lifetime of 1 million cycles is an estimate of the number of program operations per cell before the first bit error appears in a large population of devices (no population size was given), without error correction. For flash, by contrast, lifetime is the number of program/erase cycles before the error-correcting scheme can no longer cope with the errors.
To deal with limited write lifetime and wear-out, Onyx employs what is claimed to be the first real-system implementation of a “start-gap” wear-leveling scheme, which keeps heavily written rows from wearing out ahead of the rest. In operation, it slowly rotates the mapping between 4KB rows of PCM memory and their storage addresses: if row x is stored at address n, after some interval it moves to n+1, and so on. This means that memory content must periodically be rewritten. The start-gap interval used was 128. The paper also introduces a new term into the memory lexicon, the “line vulnerability factor,” defined as the number of writes to an address before it is rewritten by start-gap. In a real system, the trade-off is between this vulnerability and the extra access and write overhead: rotating more often lowers the vulnerability but adds more rewrite traffic.
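The rotating mapping can be sketched in Python. This follows the published Start-Gap formulation (Qureshi et al.), not Onyx's FPGA implementation, which the article does not describe. The class and method names are hypothetical, lines hold arbitrary Python values rather than 4KB rows, and the default `psi=128` assumes the article's interval of 128 counts writes between gap movements.

```python
class StartGap:
    """Sketch of Start-Gap wear leveling (hypothetical class, not Onyx's code).

    n_lines logical lines live in n_lines + 1 physical lines; the spare physical
    line is the "gap". Every `psi` writes the gap moves down by one line, which
    slowly rotates the logical-to-physical mapping.
    """

    def __init__(self, n_lines: int, psi: int = 128):
        self.n = n_lines
        self.psi = psi                          # writes between gap moves
        self.start = 0                          # rotation offset
        self.gap = n_lines                      # gap starts at the last line
        self.physical = [None] * (n_lines + 1)  # simulated PCM rows
        self.wear = [0] * (n_lines + 1)         # physical writes per row
        self.writes_since_move = 0

    def translate(self, la: int) -> int:
        """Map a logical line address to its current physical line."""
        pa = (la + self.start) % self.n
        if pa >= self.gap:
            pa += 1                             # skip over the gap
        return pa

    def read(self, la: int):
        return self.physical[self.translate(la)]

    def write(self, la: int, data) -> None:
        self._program(self.translate(la), data)
        self.writes_since_move += 1
        if self.writes_since_move == self.psi:
            self.writes_since_move = 0
            self._move_gap()

    def _program(self, pa: int, data) -> None:
        self.physical[pa] = data
        self.wear[pa] += 1

    def _move_gap(self) -> None:
        if self.gap == 0:
            # Wrap-around: the last physical line moves into line 0,
            # and the whole mapping has now shifted by one position.
            self._program(0, self.physical[self.n])
            self.physical[self.n] = None
            self.gap = self.n
            self.start = (self.start + 1) % self.n
        else:
            # Copy the line at gap - 1 into the gap; the gap moves down one line.
            self._program(self.gap, self.physical[self.gap - 1])
            self.physical[self.gap - 1] = None
            self.gap -= 1
```

Writing the same logical address over and over spreads the physical writes across every row, because the gap visits all N + 1 physical lines once per rotation. Each line is therefore relocated about once every (N + 1) × psi writes, so in the worst case, where every write hits one address, that address can absorb roughly that many writes before it is moved; this is approximately the quantity the line vulnerability factor describes. A smaller psi reduces that exposure at the cost of more copy traffic.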
Testing with standard benchmarks against two other systems, Fusion-io's ioDrive and Moneta, showed that Onyx with PCM could outperform the ioDrive for small reads and writes. With early write completion, Onyx write performance improves for both large and small requests. For small 512B requests, it is claimed that Onyx can sustain 478K IOPS, compared with the ioDrive's 90K IOPS. A 4KB random read on Onyx takes 38µs, while a 4KB write takes 179µs.
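Converting these figures into throughput puts them in perspective. The Python sketch below uses decimal megabytes (1MB = 10^6 bytes). The function names are illustrative, and the queue-depth-1 figures are a hypothetical lower bound that assumes requests are issued strictly one at a time; the article does not say under what conditions the latencies were measured.

```python
def iops_to_mb_per_s(iops: float, request_bytes: int) -> float:
    """Throughput in decimal MB/s for a given IOPS rate and request size."""
    return iops * request_bytes / 1e6


def serial_throughput_mb_per_s(request_bytes: int, latency_us: float) -> float:
    """Throughput if each request waits for the previous one (queue depth 1)."""
    return request_bytes / (latency_us * 1e-6) / 1e6


onyx_small = iops_to_mb_per_s(478_000, 512)       # about 244.7MB/s
iodrive_small = iops_to_mb_per_s(90_000, 512)     # about 46.1MB/s
small_request_advantage = 478_000 / 90_000        # about 5.3x

read_qd1 = serial_throughput_mb_per_s(4096, 38)   # about 107.8MB/s
write_qd1 = serial_throughput_mb_per_s(4096, 179) # about 22.9MB/s
```

The 512B result shows that Onyx's advantage lies in request rate rather than raw bandwidth: roughly 5.3 times as many small operations per second as the ioDrive.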
Beyond the several design challenges that remain for the future, the real problem is, again, PCM scaling. Acknowledging that scaling is key to the future, the design team states that, “assuming PCM scaling projections hold,” PCM storage arrays will be competitive with flash. They then let us in on a secret regarding Micron's PCM plans, stating that “next generation PCM devices will sustain up to 3.76MB/s,” up from 0.5MB/s. Perhaps this is Micron's promised 1Gbit or 2Gbit PCM?
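The size of the projected jump is easy to quantify. This Python sketch assumes that both the 0.5MB/s and the 3.76MB/s figures are per-device bandwidths. The aggregate values are a naive, hypothetical ceiling for a 640-device system like Onyx: they assume every device works in parallel and ignore error-correction, controller, ring and PCIe limits.

```python
CURRENT_MB_S = 0.5    # current device bandwidth quoted in the article
NEXT_GEN_MB_S = 3.76  # projected next-generation device bandwidth
DEVICES = 640         # PCM devices in a fully populated Onyx

improvement = NEXT_GEN_MB_S / CURRENT_MB_S  # about 7.5x per device
naive_now = DEVICES * CURRENT_MB_S          # 320MB/s naive ceiling
naive_next = DEVICES * NEXT_GEN_MB_S        # about 2406MB/s naive ceiling
```

Even as a rough ceiling, a per-device gain of about 7.5 times would matter most for writes, where Onyx currently trails its read performance by a wide margin.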