Design Article
Addressing the challenges of transition to DDR4
Fred Rastgar and Tom Rossi, InStryde LLC
1/22/2013 1:00 PM EST
DDR4 balanced system design
Early estimates project that DDR4 will consume approximately 30% to 40% less power than low-voltage DDR3 (DDR3L) at the same data transfer rate, independent of DIMM bus utilization. Furthermore, at the same power, DDR4 is expected to run approximately 1.7X faster than the equivalent DDR3 memory. Finally, these projections suggest that a 4-Gb DDR4 running at its maximum projected data transfer rate of 3200 MT/s can be expected to consume the same power as a DDR3 DIMM running at only 1600 MT/s.[1]
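To put the last projection in perspective, the arithmetic can be sketched as follows. This is a rough illustration only: it assumes a standard 64-bit DIMM data bus, which the projections do not state, and it takes the equal-power figure at face value. The function names are illustrative.

```python
# Back-of-the-envelope arithmetic for the projections above.
# Assumption (not from the article): a 64-bit (8-byte) DIMM data bus.

BUS_WIDTH_BYTES = 8  # assumed: standard non-ECC 64-bit DIMM data bus


def peak_bandwidth_gb_s(transfer_rate_mt_s: float) -> float:
    """Theoretical peak bandwidth in GB/s for a transfer rate in MT/s."""
    return transfer_rate_mt_s * BUS_WIDTH_BYTES / 1000.0


def relative_energy_per_bit(power_ratio: float, rate_ratio: float) -> float:
    """Energy per bit relative to a baseline: (P / P0) / (R / R0)."""
    return power_ratio / rate_ratio


ddr3_bw = peak_bandwidth_gb_s(1600)
ddr4_bw = peak_bandwidth_gb_s(3200)

# Projection: DDR4 at 3200 MT/s draws the same power as DDR3 at 1600 MT/s.
energy_ratio = relative_energy_per_bit(power_ratio=1.0, rate_ratio=3200 / 1600)

print(f"DDR3-1600 peak: {ddr3_bw:.1f} GB/s")
print(f"DDR4-3200 peak: {ddr4_bw:.1f} GB/s")
print(f"DDR4 energy per bit relative to DDR3: {energy_ratio:.2f}")
```

Under these assumptions, equal power at twice the transfer rate corresponds to roughly half the energy per transferred bit, provided the platform can actually keep the bus busy.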
These projections promise a significant power and performance advantage for DDR4 over DDR3. In the interim, however, while the transfer rates of the two generations are likely to overlap, it is quite conceivable for a DDR4 design to provide incremental power savings while delivering lower performance than a comparable DDR3 platform. Given the anticipated near-term cost disadvantage of DDR4 memory subsystems, early adopters should pay special attention to tuning their platforms in order to deliver a highly optimized, balanced system.
The DDR4 architecture introduces the concept of two or four selectable bank groups. Each bank group can be activated, accessed, and refreshed separately, which improves overall memory efficiency and bandwidth. It is a feature unique to DDR4 that can boost platform performance. The highest memory throughput in DDR4 can only be achieved when consecutive reads and writes target different bank groups, because such accesses are subject to the shorter command-to-command timing (tCCD_S) rather than the longer timing that applies within a single bank group (tCCD_L), which allows faster burst access. Faster access to memory in short bursts can yield better overall memory utilization and power efficiency. Selectable bank groups therefore give developers an opportunity to better optimize their platforms, increasing performance while saving power. Memory controller algorithms must be thoroughly validated when interleaving data between bank groups, however, as the risk of collision rises as DDR4 systems pivot between tCCD_S and tCCD_L.
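The effect of bank group interleaving on command spacing can be illustrated with a toy model. The sketch below is not a memory controller algorithm; it models only the command-to-command spacing and ignores activation, precharge, refresh, and bus turnaround. The timing values of 4 and 6 clock cycles are assumptions chosen for illustration, since actual values depend on the speed bin and the device datasheet.

```python
# Toy model of column-command spacing with bank groups.
# Assumed timings in clock cycles (illustrative only):
T_CCD_S = 4  # spacing between commands to different bank groups
T_CCD_L = 6  # spacing between commands to the same bank group


def issue_cycles(bank_groups):
    """Return the earliest cycle at which each column command can issue.

    bank_groups is the sequence of bank-group indices targeted by
    consecutive READ/WRITE commands. Checking only the previous command
    is sufficient here because T_CCD_L <= 2 * T_CCD_S.
    """
    cycles = []
    for i, group in enumerate(bank_groups):
        if i == 0:
            cycles.append(0)
            continue
        gap = T_CCD_L if group == bank_groups[i - 1] else T_CCD_S
        cycles.append(cycles[-1] + gap)
    return cycles


same_group = [0] * 8                      # eight reads to one bank group
interleaved = [i % 4 for i in range(8)]   # round-robin over four groups

print(issue_cycles(same_group)[-1])    # 42
print(issue_cycles(interleaved)[-1])   # 28
```

With a burst length of 8 occupying four clock cycles on the data bus, a spacing of 4 keeps the bus continuously busy, while the longer same-group spacing leaves idle cycles between bursts. In this model, the interleaved sequence issues its last command at cycle 28 instead of cycle 42.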
Consider a DDR4 platform without bank group tuning (see Figure 4). Memory accesses are sparsely distributed, and the banks are left active through long periods of inactivity. In this case, we found instrumentation to play a key role in helping designers visualize I/O distribution across banks and identify inefficient memory accesses in order to better tune the platform.

Figure 4: Bank state view from protocol analyzer provides high-level abstraction of command density to help identify less-efficient DDR4 memory access where excessive latency between ACTIVATE and READ/WRITE operations can affect performance.
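The kind of measurement that such a view summarizes can be approximated offline from a command trace. The sketch below assumes a hypothetical record format of (cycle, command, bank group, bank); this is an illustration only and not the export format of any particular protocol analyzer. The sample trace is invented for the example.

```python
# Hypothetical trace records: (cycle, command, bank_group, bank).
# This format is an assumption for illustration, not an analyzer export.
from collections import defaultdict


def activate_to_access_latencies(trace):
    """Cycles from each ACTIVATE to the first READ/WRITE on that bank."""
    pending = {}      # (bank_group, bank) -> cycle of unanswered ACTIVATE
    latencies = []
    for cycle, command, bank_group, bank in trace:
        key = (bank_group, bank)
        if command == "ACT":
            pending[key] = cycle
        elif command in ("RD", "WR") and key in pending:
            latencies.append(cycle - pending.pop(key))
    return latencies


def column_commands_per_group(trace):
    """Count READ/WRITE commands issued to each bank group."""
    counts = defaultdict(int)
    for _, command, bank_group, _ in trace:
        if command in ("RD", "WR"):
            counts[bank_group] += 1
    return dict(counts)


# Invented sample trace, for illustration only.
trace = [
    (0, "ACT", 0, 0),
    (4, "ACT", 1, 0),
    (15, "RD", 0, 0),
    (19, "RD", 1, 0),
    (23, "RD", 0, 0),
    (60, "WR", 1, 0),
]

latencies = activate_to_access_latencies(trace)
print(latencies)                          # [15, 15]
print(column_commands_per_group(trace))   # {0: 2, 1: 2}
```

Large ACTIVATE-to-access latencies, or a heavily skewed count across bank groups, would point to the kind of inefficient access pattern described above.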
In contrast, we tested the same platform with memory accesses tuned to exploit bank groups (see Figure 5). Leveraging this architectural enhancement of DDR4 delivered demonstrably quicker memory access.

Figure 5: Optimized design showing lower latency and higher command density, with more READ/WRITE operations across bank groups.
As DDR3 memory reaches its market peak in terms of design adoption and system performance, DDR4 is poised to offer a transition toward even more powerful applications. Working in close cooperation with a wide range of expected adopters and supporters, the JEDEC committee took numerous factors into consideration in crafting the DDR4 specification. DDR4 offers numerous “knobs” that can be adjusted to tailor it to a broad range of designs. Determining and confirming those settings for a specific application is best managed with an appropriate set of development and support tools, in order to deliver the best balance between memory bandwidth and application power consumption.
Winning platforms will be designed to deliver maximum performance while optimizing power consumption. Taking full advantage of the new architectural enhancements of DDR4 allows developers to deliver highly balanced systems for the new generation of computing and embedded products.
References
1. Intel Corp., JEDEC DDR4 Workshop, October 2012.
About the Authors
Fred Rastgar and Tom Rossi are founding partners at InStryde LLC, an advanced technology consulting company located in the heart of Silicon Valley. Collectively they have over 50 years of technology experience, including more than 40 years of combined service at Intel Corp., where they worked on numerous product generations including advanced mobile processors, as well as memory and graphics controllers.
The authors would like to thank Teledyne-LeCroy for access to their Kibra 480 DDR3/DDR4 protocol analyzer and its corresponding DDR4 interposer, used to support the development of this article.