Increasing DRAM capacity in memory-hungry enterprise platforms with RDIMM modules is often the quickest, most efficient and least expensive way to boost server performance. If a server does not have enough installed memory to sustain the application, the system must compensate by moving data to hard disk drive (HDD) or solid-state drive (SSD) storage, which is orders of magnitude slower (see figure 3). This significantly extends computation time, even if only a small percentage of memory transactions is diverted to the storage drive.
Consider a system with the following parameters:
- SSD average transfer rate: 100 MB/s
- DDR3 memory: 1333 MT/s (1333 Mbps per data line), 64-bit data bus, transfer rate of about 10 GB/s (100 times faster than the drive)
- DDR3 memory installed on the server: 48 GB (12 ECC UDIMMs of 4 GB each)
- DDR3 memory required by the application: 64 GB
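The 10 GB/s figure follows from the bus width: a 64-bit bus moves 8 bytes per transfer, so at 1333 MT/s the peak is about 10.7 GB/s, which the list rounds to 10 GB/s. A minimal Python sketch of that arithmetic, assuming peak (not sustained) bandwidth and the 100 MB/s drive rate listed above:

```python
# Peak DDR3 bandwidth from transfer rate and bus width (peak, not sustained).
TRANSFERS_PER_SECOND = 1333e6   # DDR3-1333: 1333 million transfers per second
BUS_WIDTH_BITS = 64             # data bus width, excluding ECC check bits
SSD_RATE_BYTES = 100e6          # assumed average SSD transfer rate, 100 MB/s

def peak_bandwidth_bytes(transfers_per_s: float, bus_width_bits: int) -> float:
    """Bytes per second = transfers per second x bytes moved per transfer."""
    return transfers_per_s * bus_width_bits / 8

ddr3_peak = peak_bandwidth_bytes(TRANSFERS_PER_SECOND, BUS_WIDTH_BITS)
print(f"DDR3-1333 peak: {ddr3_peak / 1e9:.2f} GB/s")       # 10.66 GB/s
print(f"Ratio to SSD: {ddr3_peak / SSD_RATE_BYTES:.0f}x")  # 107x
```

The unrounded ratio is about 107, so the factor of 100 used in the example is a slightly conservative round number.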
As a result, the system has to use 16 GB of storage-drive space as virtual memory to meet the application's demand for 64 GB, and this virtual memory slows down the whole system. Because the storage drive is 100 times slower than DDR3 memory, moving 25% of the data (16 GB out of 64 GB) through the drive takes 100 times as long as moving the same 16 GB through DDR3 memory. Overall, the 64-GB memory system is slowed down roughly 25 times: the 75% of transfers served from DRAM take 0.75 units of time and the 25% served from the drive take 0.25 × 100 = 25 units, for about 25.75 units in total versus 1 unit for an all-DRAM system. Hence, diverting 25% of memory transactions to drive storage can make data transfers roughly 25 times slower.
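The slowdown estimate can be expressed as a simple weighted-time model. The sketch below assumes, as the example does, that every gigabyte costs the same transfer time on a given medium and that the drive is exactly 100 times slower than DRAM; it ignores latency, caching and any overlap of I/O with computation. The function name `relative_transfer_time` is hypothetical. Counting only the drive share gives the example's figure of 25; including the DRAM share gives 25.75.

```python
def relative_transfer_time(required_gb: float, installed_gb: float, speed_ratio: float) -> float:
    """Transfer time relative to an all-DRAM system (hypothetical helper).

    The part of the working set that fits in DRAM costs 1 unit per GB;
    the part that spills to the storage drive costs speed_ratio units per GB.
    """
    spilled_gb = max(required_gb - installed_gb, 0.0)
    in_dram_gb = required_gb - spilled_gb
    return (in_dram_gb + spilled_gb * speed_ratio) / required_gb

# Example values: 48 GB installed, 64 GB required, drive 100x slower than DRAM.
before = relative_transfer_time(64, 48, 100)  # (48 + 16 * 100) / 64 = 25.75
after = relative_transfer_time(64, 64, 100)   # all 64 GB fit in DRAM -> 1.0
print(before, after)                          # 25.75 1.0
```

The same model covers the upgraded configuration: once installed capacity meets the requirement, nothing spills and the relative time drops to 1.0.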
By upgrading the system to 64 GB of memory with eight 8-GB RDIMMs, which removes the capacity limitations of ECC UDIMMs, the system no longer needs to reroute DRAM data to the storage drive (see figure 4). This improves system performance by up to 25 times. Optimized DRAM capacity also helps save power, since storage-drive transfers, besides being much slower, consume much more energy per gigabyte transferred.
Figure 3: CPU to memory and storage drive transactions flow diagram
Improving reliability and system uptime with RDIMMs
In the enterprise equipment space, there is a strong focus on memory reliability, availability and serviceability (RAS) features, which can drastically reduce system downtime and repair costs. Selecting the right memory module can make a significant difference in the achievable RAS level. ECC UDIMMs provide limited reliability and are exposed to data corruption and system crashes caused by single-bit errors outside the ECC-protected data lines and by single-DRAM failures. RDIMM modules offer a more comprehensive RAS solution, including address/command parity and support for extended ECC, which minimize these issues. Table 3 compares the RAS features of ECC UDIMMs and RDIMMs.
Table 3: Comparison of RAS features in ECC UDIMMs and RDIMMs
Single-bit errors are a key failure type in DRAM communications. A two-and-a-half-year study of DIMMs in tens of thousands of Google servers found DIMM error rates to be hundreds to thousands of times higher than previously assumed: a mean of 3,751 single-bit errors per DIMM per year.
Single-bit errors can occur on the 64 data lines and the 26 address and command lines that interconnect the DRAMs and the memory controller. ECC UDIMMs use ECC to detect and correct single-bit errors only on the 64 data lines. If a single-bit error occurs on any of the 26 address or command lines, ECC UDIMMs can neither detect nor report it. This detection gap, which covers almost one-third of the DRAM interconnects (26 of 90 lines), can produce multiple corrupted memory operations per year, leading to corporate data loss, service interruptions, server crashes and repair costs.
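The ECC that protects the data lines of an ECC module is a single-error-correct, double-error-detect (SEC-DED) code: 8 check bits accompany every 64 data bits, giving a 72-bit word. As an illustration only, the Python sketch below implements a textbook extended Hamming (72,64) code; real memory controllers may use a different (72,64) SEC-DED code (for example a Hsiao code) and their own bit ordering, and the names here are hypothetical. Note that the code word covers only data: nothing in it protects the address and command lines, which is the gap described above.

```python
from typing import Optional, Tuple

CODE_BITS = 72                                                    # bit 0 + Hamming positions 1..71
PARITY_POSITIONS = [1, 2, 4, 8, 16, 32, 64]                       # Hamming check bits
DATA_POSITIONS = [p for p in range(1, CODE_BITS) if p & (p - 1)]  # the 64 non-power-of-two slots

def encode(data: int) -> int:
    """Encode a 64-bit integer into a 72-bit extended Hamming codeword."""
    word = 0
    for i, pos in enumerate(DATA_POSITIONS):
        if (data >> i) & 1:
            word |= 1 << pos
    # Check bit p makes the count of 1s among positions whose index has bit p set even.
    for p in PARITY_POSITIONS:
        covered = sum((word >> pos) & 1 for pos in range(1, CODE_BITS) if pos & p)
        if covered % 2:
            word |= 1 << p
    # Bit 0 is an overall parity bit that makes the whole 72-bit word even.
    if bin(word).count("1") % 2:
        word |= 1
    return word

def decode(word: int) -> Tuple[str, Optional[int]]:
    """Return ("ok" | "corrected" | "uncorrectable", data or None)."""
    syndrome = 0
    for pos in range(1, CODE_BITS):
        if (word >> pos) & 1:
            syndrome ^= pos  # XOR of set positions is 0 for a valid codeword
    odd_overall = bin(word).count("1") % 2 == 1
    if syndrome == 0 and not odd_overall:
        status = "ok"
    elif odd_overall and syndrome < CODE_BITS:
        word ^= 1 << syndrome  # single-bit error at position `syndrome` (0 = overall bit)
        status = "corrected"
    else:
        # Even overall parity with a nonzero syndrome: a double-bit error (detected, not fixed).
        return "uncorrectable", None
    data = 0
    for i, pos in enumerate(DATA_POSITIONS):
        if (word >> pos) & 1:
            data |= 1 << i
    return status, data

data = 0x0123456789ABCDEF
codeword = encode(data)
print(decode(codeword)[0])                          # ok
print(decode(codeword ^ (1 << 17))[0])              # corrected: one flipped bit
print(decode(codeword ^ (1 << 17) ^ (1 << 40))[0])  # uncorrectable: two flipped bits
```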
RDIMMs provide protection against single-bit errors on data, address, and command lines. Specifically, RDIMMs use ECC to correct data errors and parity to detect single-bit errors on address and command lines. If an address or command signal is corrupted, the RDIMM sends a parity error signal back to the memory controller, which can then log the event and initiate a corrective sequence such as resending the last command.
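Address/command parity can be sketched the same way. The example below assumes even parity computed over the 26 address and command lines counted above, with the memory controller sending the parity bit alongside the command and the RDIMM recomputing it on arrival; which signals are actually covered, the parity polarity and the error-signal timing are set by the DDR3 register specification and are not modeled here. The function names and the sample bit pattern are hypothetical. The sketch also shows the limit of a single parity bit: it flags any odd number of flipped lines but cannot say which line failed, and an even number of flips goes unnoticed.

```python
ADDR_CMD_LINES = 26  # address + command lines, as counted in the text

def parity_bit(lines: int) -> int:
    """Even-parity bit: 1 when `lines` has an odd number of 1s, so lines + parity is even."""
    return bin(lines).count("1") % 2

def register_flags_error(received_lines: int, received_parity: int) -> bool:
    """Return True if the RDIMM should signal a parity error back to the controller."""
    return parity_bit(received_lines) != received_parity

# Controller side: a hypothetical 26-bit address/command pattern and its parity bit.
sent = 0b10_1100_0011_1010_0101_1110_0110
assert sent.bit_length() <= ADDR_CMD_LINES
par = parity_bit(sent)

print(register_flags_error(sent, par))                         # False: clean transfer
print(register_flags_error(sent ^ (1 << 9), par))              # True: one line flipped
print(register_flags_error(sent ^ (1 << 9) ^ (1 << 20), par))  # False: two flips cancel out
```

Detection without correction is why the controller's response is to log the event and retry the command rather than repair the bit in place.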
Another advantage of RDIMMs is that they can support extended ECC, also referred to as ChipKill or Chipspare. Extended ECC keeps the system operating at full speed even if a single DRAM chip fails or multi-bit errors occur within any portion of a single memory chip. Together, parity checking and extended ECC minimize system downtime, reduce service times and make equipment using RDIMM modules much more reliable than equipment using ECC UDIMMs.