As FPGA designers strive for higher performance while meeting critical timing margins, one consistently vexing bottleneck is the memory interface. Today's more advanced FPGAs provide embedded blocks in every I/O that make interface design easier and more reliable. These I/O elements are building blocks that, when combined with surrounding logic, can provide the designer with a complete memory interface controller. Nonetheless, the designer must still configure, verify, and implement these I/O blocks and the extra logic around them, and connect everything properly to the rest of the FPGA in the source RTL code.
But what if these difficult tasks were taken care of by the FPGA vendor? What if a designer could simply use a GUI to enter the memory system parameters and generate RTL code without writing it from scratch? And what if the physical layer interface were based on hardware-verified designs? All this is now possible using the Memory Interface Generator (MIG) from Xilinx. This "How To" article discusses the various memory interface controller design challenges and shows how to use the MIG to build a complete memory interface solution for your own application on a Virtex-4 FPGA.
Memory trends and design challenges
In the late 1990s, memory interfaces evolved from single data rate (SDR) SDRAMs to double data rate (DDR) SDRAMs, the fastest of which is currently the DDR2 SDRAM running at 667 Mbps per pin (where "Mbps" stands for "megabits per second"). Present trends indicate that these rates are likely to double every four years, potentially reaching 1.6 Gbps/pin by 2010.
This trend presents a serious problem for designers: the data valid window, the portion of each data period during which Read Data can be reliably captured, is shrinking faster than the data period itself. This is because the various uncertainties associated with system and device performance parameters, which eat into the data valid window, do not scale down at the same rate as the data period.
This trend is readily apparent when comparing the data valid window of earlier-generation DDR SDRAMs running at 400 Mbps with that of current DDR2 memory running at 667 Mbps. The DDR device, with a 2.5 ns data period, has a data valid window of 0.7 ns, while the DDR2 device, with a 1.5 ns period, has a mere 0.14 ns. Clearly, this accelerated erosion of the data valid window introduces a new set of design challenges for the FPGA designer, calling for a more effective means of establishing and maintaining reliable memory interface performance.
The challenge of keeping pace with rising data rates is compounded by the wider data buses these high-performance memories employ. Wider buses running at these speeds move far more aggregate data between chips, with every data pin subject to the same shrinking window, making chip-to-chip interfaces all the more challenging. The designer must therefore resolve a new and more problematic set of signal integrity, I/O placement, and board routing issues.
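The arithmetic behind this comparison is simple enough to check directly. The short Python sketch below derives the data period from the per-pin rate (period = 1 / rate) and, as a simplifying assumption, lumps everything in the period that is not valid window into a single "uncertainty" figure. With the quoted numbers, the period shrinks from 2.5 ns to about 1.5 ns while the implied uncertainty only falls from about 1.8 ns to about 1.36 ns, which is why the window collapses from roughly 28% of the period to roughly 9%.

```python
# Data period (unit interval) for a DDR interface and the share of it that
# remains as a data valid window. Window figures are the ones quoted above;
# "uncertainty" is a simplifying lump of setup/hold, skew, jitter, etc.

def data_period_ns(rate_mbps: float) -> float:
    """Return the bit period in ns for a per-pin data rate in Mbps."""
    return 1000.0 / rate_mbps


def window_summary(rate_mbps: float, window_ns: float) -> dict:
    period = data_period_ns(rate_mbps)
    return {
        "period_ns": period,
        "window_ns": window_ns,
        # Everything in the period that is not usable window.
        "uncertainty_ns": period - window_ns,
        "window_fraction": window_ns / period,
    }


ddr = window_summary(400, 0.7)
ddr2 = window_summary(667, 0.14)

for name, s in (("DDR  @ 400 Mbps", ddr), ("DDR2 @ 667 Mbps", ddr2)):
    print(f"{name}: period {s['period_ns']:.2f} ns, "
          f"window {s['window_ns']:.2f} ns ({s['window_fraction']:.0%}), "
          f"uncertainty {s['uncertainty_ns']:.2f} ns")
```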
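To put bus width in perspective, raw peak bandwidth is simply bus width multiplied by the per-pin rate. The bus widths in this Python sketch are illustrative assumptions rather than figures from the text, and the result ignores refresh, bus turnaround, and other protocol overhead. Each of those data pins must individually meet the same sub-nanosecond timing window.

```python
# Raw (peak) bandwidth of a parallel DDR data bus: width x per-pin rate.
# Bus widths below are illustrative assumptions, not figures from the text.

def aggregate_gbps(width_bits: int, rate_mbps: float) -> float:
    """Peak data bandwidth in Gbps, ignoring refresh and bus turnaround."""
    return width_bits * rate_mbps / 1000.0


RATE_MBPS = 667  # DDR2 per-pin data rate quoted in the text

for width in (16, 32, 64, 72):
    print(f"{width:>2}-bit bus @ {RATE_MBPS} Mbps/pin: "
          f"{aggregate_gbps(width, RATE_MBPS):6.1f} Gbps, "
          f"{width} data pins each switching every {1000 / RATE_MBPS:.2f} ns")
```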
Along with the performance issues that attend the new breed of high-performance memories, the designer faces a new set of memory controller design issues as well. The complexities and intricacies of creating memory controllers for these devices pose a wide assortment of challenges which, for the FPGA designer, suggest the need for a new level of integration support from the tools that accompany the FPGA.
Memory interface and controller design
There are three fundamental building blocks that make up a memory interface and controller for an FPGA-based design: the physical layer interface, the memory controller state machine, and the user interface that bridges the memory interface design to the rest of the FPGA design. Customer surveys reveal a consensus that the physical layer interface (comprising the Read and Write interface logic and I/O blocks) is one of the most challenging parts of the overall design.
Memory interface clocking requirements are typically more difficult to meet when reading from memory, as compared with writing to memory. This is because the DDR2 SDRAM devices send the data edge-aligned with a non-continuous strobe signal instead of a continuous clock. For low-frequency interfaces up to 100 MHz, Digital Clock Manager (DCM) phase-shifted outputs can be used to capture Read Data.
Capturing Read Data becomes more challenging at higher frequencies. Read Data can be captured into Configurable Logic Blocks (CLBs) using the memory Read Strobe, but the strobe must first be delayed so that its edge coincides with the center of the data valid window. Finding the correct phase-shift value is further complicated by process, voltage, and temperature (PVT) variations. The delayed strobe must also be routed onto low-skew FPGA clock resources to preserve the accuracy of the delay.
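As a rough guide to the size of that delay, consider an idealized, skew-free interface. Because the strobe arrives edge-aligned with the data, the center of each bit sits half a bit period, or a quarter of the clock period (90 degrees), after the strobe edge. The Python sketch below computes that nominal delay and its equivalent clock phase. It deliberately ignores board skew and PVT effects, so it gives only the starting point a designer would adjust from, not a working calibration value.

```python
# Nominal delay that moves an edge-aligned DDR strobe to the center of the
# data eye. Assumption: ideal, skew-free signals and no PVT variation.

def ideal_center_delay_ns(clock_mhz: float) -> float:
    clock_period_ns = 1000.0 / clock_mhz
    bit_period_ns = clock_period_ns / 2   # DDR: two bits per clock cycle
    return bit_period_ns / 2              # middle of the bit


def delay_as_phase_deg(delay_ns: float, clock_mhz: float) -> float:
    """Express a delay as the equivalent clock phase shift in degrees."""
    return 360.0 * delay_ns * clock_mhz / 1000.0


for f_mhz in (100, 200, 333):
    d = ideal_center_delay_ns(f_mhz)
    print(f"{f_mhz} MHz: delay strobe by {d:.2f} ns "
          f"({delay_as_phase_deg(d, f_mhz):.0f} deg of clock phase)")
```

Note how the absolute delay shrinks from 2.5 ns at 100 MHz to about 0.75 ns at 333 MHz while the phase stays at 90 degrees, so any fixed error in the delay consumes a growing share of the bit as frequency rises.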
Traditional method for Read Data capture
The traditional method used by FPGA, ASIC, and ASSP controller-based designs employs a phase-locked loop (PLL) or delay-locked loop (DLL) circuit that guarantees a fixed phase shift or delay between the source clock and the clock used for capturing data. The designer chooses this phase shift to accommodate estimated process, voltage, and temperature variations. The obvious drawback of this method is that it fixes the delay at a single value predetermined during the design phase. Hard-to-predict variations within the system itself, caused by different routing to different memory devices, variations between FPGA or ASIC devices, differences between data strobe (DQS) signals, and changing ambient conditions (e.g., voltage and temperature), can therefore easily introduce enough skew to render the predetermined phase shift ineffective.
1. The traditional fixed delay Read Data capture method is prone to errors.
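The failure mode of the fixed-delay method can be illustrated with a toy model. In the Python sketch below, the sample point is fixed at design time while the data valid window shifts as PVT conditions and routing differ. All numbers (the 0.75 ns delay, the 0.2 ns window, and the drift values) are hypothetical and chosen only for illustration; they are not measured or vendor figures.

```python
# Toy model of fixed-delay Read Data capture: the sample point never moves,
# but the data valid window drifts. All values are hypothetical.

from dataclasses import dataclass


@dataclass
class Window:
    start_ns: float  # when read data becomes valid, relative to the strobe edge
    width_ns: float  # width of the data valid window

    def contains(self, t_ns: float) -> bool:
        return self.start_ns <= t_ns <= self.start_ns + self.width_ns


FIXED_DELAY_NS = 0.75                    # chosen at design time for the nominal case
NOMINAL = Window(start_ns=0.65, width_ns=0.2)

# Hypothetical window shifts caused by PVT changes and routing differences.
drifts_ns = [-0.15, -0.08, 0.0, 0.08, 0.15]

for drift in drifts_ns:
    w = Window(NOMINAL.start_ns + drift, NOMINAL.width_ns)
    status = "captured" if w.contains(FIXED_DELAY_NS) else "CAPTURE ERROR"
    print(f"drift {drift:+.2f} ns -> window {w.start_ns:.2f}-"
          f"{w.start_ns + w.width_ns:.2f} ns: {status}")
```

In this model, a drift of only 0.15 ns in either direction is enough to move the window past the fixed sample point, even though the design was correct at nominal conditions.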
Traditional techniques have allowed FPGA designers to implement DDR SDRAM memory interfaces. But very high-speed 333-MHz DDR2 SDRAM and 300-MHz QDR II SRAM interfaces demand much tighter control over the clock or strobe delay.
System timing issues associated with setup (leading-edge) and hold (trailing-edge) uncertainties further shrink the window available for reliable Read Data capture. For example, the 333-MHz (667 Mbps) DDR2 Read interface timings require FPGA clock alignment within a 0.2 ns window.
Other issues also demand the designer's attention, including chip-to-chip signal integrity, simultaneous switching constraints, and board layout constraints. Pulse-width distortion and jitter on the clock and/or data strobe signals also cause data and address timing problems at the inputs of the RAM and of the flip-flops in the FPGA's Input/Output Blocks (IOBs). Furthermore, because the data strobe is bidirectional and not free-running, it carries a larger jitter component than the clock signal.
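To see how tight a 0.2 ns alignment window is, it helps to convert it into a tolerance on the capture clock's phase. The Python sketch below assumes the capture edge is ideally centered, so the allowed error is half the window on either side; the 333 MHz clock and 0.2 ns window are the figures quoted above.

```python
# Convert the 0.2 ns alignment window for a 333-MHz (667 Mbps) DDR2 read
# interface into a phase tolerance. Assumption: the ideal capture point is
# centered, so the allowed error is half the window on either side.

CLOCK_MHZ = 333.0
ALIGN_WINDOW_NS = 0.2

clock_period_ns = 1000.0 / CLOCK_MHZ            # about 3.0 ns
bit_period_ns = clock_period_ns / 2             # about 1.5 ns per DDR bit
tolerance_ns = ALIGN_WINDOW_NS / 2              # +/- 0.1 ns around the ideal point
tolerance_deg = 360.0 * tolerance_ns / clock_period_ns
window_fraction = ALIGN_WINDOW_NS / bit_period_ns

print(f"clock period  : {clock_period_ns:.3f} ns")
print(f"allowed error : +/-{tolerance_ns:.2f} ns "
      f"(+/-{tolerance_deg:.1f} deg of clock phase)")
print(f"window share  : {window_fraction:.0%} of each bit period")
```

Under these assumptions the capture clock must sit within roughly plus or minus 12 degrees of the ideal phase, and the alignment window is only about 13% of each bit period.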