Some seven years after launching development of DDR4, JEDEC has officially released the new standard (JESD79-4). DDR4 features a per-pin data rate of 1.6 GT/s, with an initial maximum objective of 3.2 GT/s. With DDR3 exceeding its original targeted performance of 1.6 GT/s, look for higher-performance speed grades to be added in future releases. The DDR4 architecture consists of an 8n prefetch with two or four selectable bank groups, which enables simultaneous activation, read, write, or refresh operations to be conducted in each unique bank group. The standard was also designed to encompass stacked memory, with stacks of up to eight memory devices acting as a single signal load. With the announcement, we thought it was a good opportunity to sit down with Todd Farrell, chairman of the JC-42.3C Subcommittee for DRAM Timing and director of technical marketing at Micron, to learn more.
Kristin Lewotsky, editor: First, the assumption has been that DDR4 would offer higher speeds than DDR3, but the nominal speed of 1.6 GT/s has already been matched by DDR3. I assume that eventually DDR4 will offer higher data rates, but right now, today, why should designers choose DDR4 modules?
Todd Farrell: DDR4 offers lower power even at the same speed as DDR3. This comes from the lower voltage but also from other features we included for power savings. In addition, DDR4 will offer higher density.
K.L.: What does the new standard bring to the table and what are the sweet spot applications?
T.F.: Everything from thin tablet-type devices like notebooks in the personal systems space to networking and server space for the high end. There will be more of a server-oriented module similar to LRDIMM in DDR3. That enables very-high-capacity, high-density memory modules to be done with a buffer between the DDR memories and the host controller to help kick performance up.
K.L.: Even with additional features, DDR4 has the same ballout as DDR3: 78 balls for the x4/x8 configurations and 96 balls for the x16. How did you accomplish that?
T.F.: The data bus I/O is one enabler for the higher data rates. We also included additional training modes for the controller to get better timing margins and to calibrate it in the system. We added an on-die reference supply (VrefDQ) for the data bus. Another enabler is a data-bus inversion function that helps limit the number of 1s and 0s that you transition at any one time, which helps reduce the number of simultaneous switching outputs (SSOs) in the system.
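As an illustration of the data-bus inversion idea mentioned above, here is a minimal Python sketch. It assumes one common formulation: a byte is inverted whenever more than four of its eight bits would be driven low, and an extra flag bit tells the receiver to undo the inversion. The threshold and the function names dbi_encode and dbi_decode are assumptions made for this example; the interview does not spell out the exact rule.

```python
from typing import Tuple


def dbi_encode(byte: int) -> Tuple[int, bool]:
    """Return (value driven on the byte lane, inverted flag).

    Assumed rule for this sketch: invert when more than four of the
    eight bits would otherwise be driven low.
    """
    if not 0 <= byte <= 0xFF:
        raise ValueError("expected an 8-bit value")
    zeros = 8 - bin(byte).count("1")
    if zeros > 4:
        return byte ^ 0xFF, True   # flip all eight bits and flag it
    return byte, False


def dbi_decode(lane_value: int, inverted: bool) -> int:
    """Recover the original byte from the lane value and the flag."""
    return lane_value ^ 0xFF if inverted else lane_value


if __name__ == "__main__":
    for value in (0x00, 0x0F, 0x01, 0xFF):
        lane, flag = dbi_encode(value)
        print(f"data={value:#04x} lane={lane:#04x} inverted={flag}")
```

Under this assumed rule, no more than four of the eight data bits on a lane are ever low at once, which caps how many outputs pull low together.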
We also did a few things differently in the components. The one server module has more of a distributed buffer on it instead of a single buffer. Distributed buffers are much smaller. They’re located on each byte lane of a module as opposed to taking the whole 64-bit bus like the current DDR3 modules do. They still allow the system to scale in performance and have modules for upgradability and different density points.
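For a sense of scale, the per-pin data rates quoted in the introduction can be converted into peak module bandwidth over the 64-bit bus mentioned above. The short Python sketch below does that arithmetic and also counts the byte lanes such a bus contains. The function names are made up for this example, and the figures are theoretical peaks that ignore refresh, bus turnaround, and any ECC bits.

```python
def peak_bandwidth_gb_per_s(rate_gt_per_s: float, bus_width_bits: int = 64) -> float:
    """Peak bandwidth in GB/s: transfers per second times bytes per transfer."""
    return rate_gt_per_s * bus_width_bits / 8


def byte_lanes(bus_width_bits: int = 64) -> int:
    """Number of 8-bit lanes in the data bus (one small buffer per lane)."""
    return bus_width_bits // 8


if __name__ == "__main__":
    for rate in (1.6, 3.2):  # GT/s figures quoted in the introduction
        print(f"{rate} GT/s -> {peak_bandwidth_gb_per_s(rate):.1f} GB/s peak")
    print(f"byte lanes on a 64-bit bus: {byte_lanes()}")
```

At 1.6 GT/s this works out to 12.8 GB/s, and at 3.2 GT/s to 25.6 GB/s, for a 64-bit bus split into eight byte lanes.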
Image courtesy of Samsung
K.L.: How about the use of open-drain technology?
T.F.: It allows us to get better performance and it does save some power. DDR2 and DDR3 used a push-pull driver scheme, which has both a pull-up and a pull-down, and termination-wise has a pull-up and a pull-down. With the pseudo-open-drain approach, we still have a driver that goes up and down, obviously, to drive 0s and 1s, but termination-wise it only has a pull-up: it’s terminated to the positive power rail (the VddQ supply). The advantage is that it allows you to get better timing margins, and it also saves power because we’re really only burning power when we drive 0s, not when we drive 1s, because it’s terminated high already.
K.L.: What else did you do to reduce power consumption?
T.F.: We added some additional optimization for both the output drivers and the termination schemes. Some of that is inherent to the VddQ termination, but there are also additional modes that you can run without termination, especially in point-to-point embedded applications, such as tablet applications in which the memory is soldered down, as opposed to applications that use socketed modules (DIMMs). We also have further-reduced data bus output driver strengths for those applications, which translates to additional power savings.
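The power argument for VddQ termination can be put into rough numbers with a simple DC model: when a line is driven high, both ends sit at VddQ and no static current flows; when it is driven low, current flows from VddQ through the termination resistor and the pull-down driver to ground. The Python sketch below uses that model. The 1.2 V supply is DDR4's nominal VddQ, but the 34 ohm driver and 60 ohm termination values, along with the function names, are illustrative assumptions and are not taken from the interview.

```python
from typing import Iterable


def pod_static_power_watts(bit: int,
                           vddq_volts: float = 1.2,
                           r_driver_ohms: float = 34.0,   # assumed value
                           r_term_ohms: float = 60.0      # assumed value
                           ) -> float:
    """Static supply power for one line terminated to VddQ.

    Driving a 1 leaves no voltage across the termination, so no DC
    current flows. Driving a 0 pulls current through the termination
    and the driver in series.
    """
    if bit not in (0, 1):
        raise ValueError("bit must be 0 or 1")
    if bit == 1:
        return 0.0
    return vddq_volts ** 2 / (r_driver_ohms + r_term_ohms)


def average_static_power_watts(bits: Iterable[int]) -> float:
    """Average static power per line over a sequence of driven bits."""
    values = list(bits)
    if not values:
        return 0.0
    return sum(pod_static_power_watts(b) for b in values) / len(values)


if __name__ == "__main__":
    for pattern in ([1, 1, 1, 1], [0, 1, 0, 1], [0, 0, 0, 0]):
        milliwatts = average_static_power_watts(pattern) * 1e3
        print(f"{pattern}: {milliwatts:.2f} mW average per line")
```

In this simplified model the static power scales directly with the fraction of 0s driven, which is why fewer 0s on the bus means less termination power.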
Another power saving feature is a DLL-off mode so you can run at much slower frequencies and turn some of the clocking off. This allows an embedded or tablet application to throttle way down to low bandwidths when it’s not needed, to save power. When the application wakes up and needs to do something really fast, you can kick it all the way up to the peak data rate—you can throttle back and forth very easily.
There’s also a low-power auto-self-refresh mode that takes our standby self-refresh power way down, similar to what LPDDR does when you’re at lower temperatures. For applications like tablets, for instance, which have to inherently run cooler, that feature is enabled in DDR4.
K.L.: The DDR4 standard goes point-to-point instead of in parallel. Does the architecture cause you to sacrifice speed or scalability going forward?
T.F.: We tried to make it as backward compatible as possible but also to give it some legs going forward. It does have optimizations for point-to-point but it’s also designed to be able to do multiple modules in a channel at higher data rates than DDR3. The advantage of going point to point is that it makes the memory channel fairly simple. You can save a lot of space and still get good performance. It’s also less loaded so you can run at a faster data rate and you don’t have as many reflections as opposed to having sockets that may be populated or empty in a multi-channel or multi-drop system that uses modules.
K.L.: Enterprise-level server applications may require digital switches to minimize the number of direct memory channels. Is this likely to compromise speed by introducing latency?
Image courtesy of Micron
T.F.: [Switches are] mainly a requirement for really high-capacity LRDIMM-type modules where they want to keep the performance of the data bus and they have a lot of different DDR4 devices behind it. It’s to help with the channel skew and the loading. The latency is on the order of nanoseconds, or single clock cycles, an impact similar to any of the buffers or registered-type modules that servers have today.
K.L.: It’s an engineering truism that you don’t get something for nothing. What is the trade-off? How do you fit it all into one module?
T.F.: It’s just a timing factor. With DDR4, we are taking advantage of our standard process roadmap that allows us to fit more features into a device that’s still smaller in the end application.
K.L.: What are the plans for DDR5? Do you think there is room for another generation or will you run into limitations of the basic physics?
T.F.: We’ve been questioning that since DDR started and here we are several generations into it and it’s still viable technology. We’ve already started working on what’s after DDR4 and I think we all see the light at the end of that tunnel and expect another standard to follow on to that.