By the year 2005, applications such as games, high-end graphics, and routers will require chip-to-chip speeds of 10 to 100 GBytes/s, while IC I/O-related cost and power consumption must remain roughly at the levels we see today. Unfortunately, incremental development of today's technologies cannot achieve that performance/cost ratio. Today we can achieve data rates of 4.25 GBytes/s using a 128-bit wide, 266-MHz SSTL bus with DDR-SDRAM. However, increasing that to about 34.1 GBytes/s would require a 512-bit bus operating at 533 MHz. Clearly, a new interface is needed - one that offers a quantum leap in performance while staying within low-cost, high-volume manufacturing constraints.
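The bus figures above follow from multiplying bus width by the per-pin transfer rate. The short sketch below shows that arithmetic. It assumes the quoted MHz figures are per-pin transfer rates in millions of transfers per second and that 1 GByte is 1e9 bytes; the function name is hypothetical.

```python
def peak_bandwidth_gbytes(width_bits: int, mega_transfers_per_s: float) -> float:
    """Peak bus bandwidth in GBytes/s, with 1 GByte taken as 1e9 bytes.

    Assumption: mega_transfers_per_s is the per-pin transfer rate,
    in millions of transfers per second.
    """
    bytes_per_transfer = width_bits / 8
    return bytes_per_transfer * mega_transfers_per_s / 1000.0


# The two bus configurations mentioned in the text.
ddr_today = peak_bandwidth_gbytes(128, 266)
ddr_scaled = peak_bandwidth_gbytes(512, 533)

print(f"128-bit bus at 266 MT/s: {ddr_today:.2f} GBytes/s")
print(f"512-bit bus at 533 MT/s: {ddr_scaled:.2f} GBytes/s")
```

The same function makes the scaling problem visible: an eightfold bandwidth increase by this route needs four times the traces and twice the transfer rate at once.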
Next generation memory signaling technology for graphics, consumer, and networking applications
High performance at low cost
3.2GHz data rate with roadmap to 6.4GHz
Enables 10 to 100 GBytes/s memory system bandwidth
Figure 1. DDR Trace Length Matching Difficult
128 traces to match across 4+ devices means the total board area is 4.5 times greater than the DRAM footprint
Some alternative approaches being considered for chip-to-chip communication - such as Motorola/Mercury's RapidIO and AMD's HyperTransport - use a source-synchronous handshake in which all I/O drivers and receivers transmit and receive data with a timing reference, such as a strobe or clock, that is transmitted with the data. The combined total of the absolute values of driver data-valid time, receiver setup/hold time, clock jitter, data jitter, and data-to-strobe timing error cannot exceed the clock cycle time. Unfortunately, it is difficult to reduce these timing parameters in proportion to the clock cycle, especially across interfaces wider than 16 bits. Packetizing the data also increases latency.
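The timing constraint described above can be sketched as a simple budget check. All picosecond values below are made-up placeholders chosen only to illustrate the effect, not figures from RapidIO, HyperTransport, or any other specification, and the names are hypothetical.

```python
def timing_margin_ps(cycle_ps: float, budget_ps: dict) -> float:
    """Margin left after summing the absolute values of all timing terms.

    A negative result means the budget no longer fits in one clock cycle.
    """
    return cycle_ps - sum(abs(term) for term in budget_ps.values())


# Hypothetical budget (illustrative values only, in picoseconds).
budget = {
    "driver_data_valid": 300.0,
    "receiver_setup_hold": 300.0,
    "clock_jitter": 100.0,
    "data_jitter": 100.0,
    "data_to_strobe_error": 150.0,
}

# The same fixed budget checked against two assumed clock rates.
margin_800 = timing_margin_ps(1e6 / 800, budget)    # 800 MHz -> 1250 ps cycle
margin_1600 = timing_margin_ps(1e6 / 1600, budget)  # 1600 MHz -> 625 ps cycle

print(f"margin at 800 MHz:  {margin_800:+.0f} ps")
print(f"margin at 1600 MHz: {margin_1600:+.0f} ps")
```

With these placeholder numbers the budget closes at the lower rate and fails at the higher one, which is the situation the paragraph describes when the timing terms do not shrink with the cycle.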
PCI Express and InfiniBand represent another approach: multiplexing clock and data signals. Multiplexing can achieve data frequencies above 3 GHz and is useful for longer-distance backplane applications, but it has limitations over short distances. For example, the 8b/10b data/clock encoding involved in multiplexing increases power consumption as well as the silicon area necessary for I/O. Multiplexing data and clock also precludes bi-directional communication on the same line, and can result in additional latency.
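One way to see part of the cost of 8b/10b encoding is that every 8 payload bits travel as 10 bits on the wire, so a fifth of the line rate carries no payload. The sketch below computes that overhead; the 3.2 Gbit/s line rate is only an example value, and the function name is hypothetical.

```python
def payload_rate_8b10b(line_rate_gbps: float) -> float:
    """Payload rate of an 8b/10b-coded lane: 8 data bits per 10 line bits."""
    return line_rate_gbps * 8 / 10


line_rate = 3.2  # example line rate in Gbit/s (assumption for illustration)
payload = payload_rate_8b10b(line_rate)
overhead_fraction = 1 - payload / line_rate

print(f"line rate {line_rate} Gbit/s -> payload {payload:.2f} Gbit/s")
print(f"fraction of line bits spent on encoding: {overhead_fraction:.0%}")
```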
For their next generation parallel interface technology, Yellowstone, Rambus engineers developed Differential RSL (DRSL), a 200mV differential signaling solution; that is, plus-minus 100mV centered around a 1.1V reference level. To allow for high data rates, bi-directional bit pairs are connected point-to-point and are terminated on-chip with a 50-ohm impedance. Unbalanced (single-ended) transmission was rejected because it inevitably entails high simultaneous switching noise, excessive ground bounce, common-mode noise between pins and between routes, crosstalk, low noise margins and high EMI.
Differential signaling solves those problems, but doubles the number of pins per bit. Interestingly, however, pin-counts on high-density, high-performance chips are about the same, regardless of whether one uses single-ended or differential I/Os. In a high speed IC that uses single-ended signaling, a ratio approaching 1:1 active pins to power and grounding pins is necessary to resolve the noise issues associated with unbalanced transmission. However, because return current is always confined to the bit pair in a differential interface, the number of power/ground pins can be reduced. Overall, the total number of pins remains roughly the same.
Figure 2. Differential Rambus signaling level (DRSL)
Future memory buses will use a 200-mV voltage swing (lower means faster)
Another advantage of Yellowstone is that data transfer occurs at eight times the speed of an external clock, resulting in Octal Data Rate (ODR) operation. The I/O circuit of each IC that receives the external clock uses a PLL to generate a 4X internal clock, and data is keyed to both rising and falling edges, so that eight bits per pin-pair are sent or received for each cycle of the clock. Address and control signals are sent from controller to memory chip synchronously.
Figure 3. Octal Data Rate (ODR) is eight bits per clock
The I/O circuit of each IC uses a PLL to generate a 4X internal clock; data is keyed to both rising and falling edges, so that eight bits per pin-pair are sent or received for each cycle of the clock.
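The ODR arithmetic can be written out as a small sketch. The 400 MHz external clock is an assumption, chosen because it yields the 3.2 Gbit/s per-pair rate quoted for the first products; the function names are hypothetical.

```python
def bits_per_external_clock(pll_multiplier: int = 4, edges_per_cycle: int = 2) -> int:
    """Bits transferred per pin-pair in one external clock cycle.

    The PLL multiplies the external clock (4X in the text), and data is
    keyed to both edges of the resulting internal clock.
    """
    return pll_multiplier * edges_per_cycle


def data_rate_mbps(external_clock_mhz: int) -> int:
    """Per-pin-pair data rate in Mbit/s for a given external clock."""
    return external_clock_mhz * bits_per_external_clock()


external_clock_mhz = 400  # assumption: 3200 Mbit/s divided by 8 bits per clock
print(bits_per_external_clock(), "bits per external clock cycle")
print(data_rate_mbps(external_clock_mhz), "Mbit/s per pin-pair")
```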
Rambus engineers developed several proprietary technologies to deal with ever-decreasing timing margin. Chief among them is a handshake approach called "FlexPhase" in which different ICs may be synchronized to the edges of clocks having different timings by using a per-pin phase adjustment. In fact, even different pins within the same IC need not be synchronized to the edges of the external clock. This maximizes Yellowstone's controller timing margins, making high-speed signals easier to capture.
Using FlexPhase technology, all of the I/Os on the master-side IC are equipped with phase adjusters for transmission and reception. Data send/receive timing may be freely adjusted in increments of approximately 1.4 degrees over a 360-degree range relative to the internal clock edge received by each pin. To reduce chip cost on the slave side, these ICs (generally DRAMs) may opt not to have such phase-adjustment circuits.
Figure 4. FlexPhase provides flexible phase relationships
Send/receive timing may be freely adjusted in increments of approximately 1.4 degrees over a 360-degree range relative to the internal clock edge received by each pin.
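To give a feel for the adjustment granularity, the sketch below converts the phase step into time. It assumes 256 phase steps per 360 degrees, a value chosen only because 360/256 is approximately 1.4 degrees (the text does not state the register width), and a 1.6 GHz internal clock, which is what a 3.2 Gbit/s rate with two bits per internal clock cycle implies.

```python
PHASE_STEPS = 256  # assumption: not stated in the text, inferred from ~1.4 degrees


def phase_step_degrees(steps: int = PHASE_STEPS) -> float:
    """Angular size of one phase-adjustment step."""
    return 360.0 / steps


def phase_step_ps(internal_clock_mhz: float, steps: int = PHASE_STEPS) -> float:
    """Time represented by one phase step, in picoseconds."""
    cycle_ps = 1e6 / internal_clock_mhz
    return cycle_ps / steps


step_deg = phase_step_degrees()
step_ps = phase_step_ps(1600)  # assumed 1.6 GHz internal clock

print(f"one step = {step_deg:.3f} degrees = {step_ps:.2f} ps")
```

Under these assumptions each step is a few picoseconds, small compared with the 312.5 ps bit time at 3.2 Gbit/s.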
During power-up, all of the pins on the master-side IC perform a phase scan by carrying out dummy send/receive operations. They then determine the phase relative to the clock edge that will maximize the timing margin for each bit, and set the transmission timing adjustment and receive timing adjustment registers to that phase. As a result, all I/Os can transmit and receive at the time that produces the maximum timing margin. In effect, during initialization, the data valid window of each pin in the actual mass-production system is determined, and then data is transmitted and received at the optimum timing.
Figure 5. FlexPhase Makes Memory Systems Fast
Each data pin has a unique transmit and receive phase (which compensates for PCB and package delays). The phase values are determined at power-up. The transmit phase is set to deliver writes in quadrature with the sample clock at the receiver. The receive phase is set to sample read data at the center of the data eye at the time of arrival.
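The power-up phase scan can be illustrated with a minimal sketch: try every phase setting, record which ones transfer a dummy pattern correctly, and pick the middle of the widest passing window. This is not Rambus's actual calibration procedure. The 256-step range, the `passes` callback, and the simulated eye are all assumptions made for illustration.

```python
from typing import Callable, Optional


def best_phase(passes: Callable[[int], bool], steps: int = 256) -> Optional[int]:
    """Return the phase code at the center of the widest passing window.

    passes(code) is a hypothetical callback that performs a dummy
    send/receive at the given phase code and reports whether it succeeded.
    The phase range is circular, so a window may wrap from the last code to 0.
    """
    results = [passes(code) for code in range(steps)]
    if not any(results):
        return None  # no working phase found
    if all(results):
        return 0     # every phase works; any choice is acceptable

    # Begin just after a failing code so a wrapped window is seen as one run.
    start = results.index(False)
    best_len = best_start = run_len = run_start = 0
    for offset in range(1, steps + 1):
        code = (start + offset) % steps
        if results[code]:
            if run_len == 0:
                run_start = code
            run_len += 1
            if run_len > best_len:
                best_len, best_start = run_len, run_start
        else:
            run_len = 0
    return (best_start + best_len // 2) % steps


def simulated_eye(code: int) -> bool:
    """Hypothetical pin whose data-valid window spans codes 240..255 and 0..20."""
    return code >= 240 or code <= 20


chosen = best_phase(simulated_eye)
print("chosen phase code:", chosen)
```

In a real system this scan would run once per pin and per direction, with the results written to the transmit and receive timing adjustment registers described above.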
This has some remarkable benefits. Skew virtually ceases to be an issue. The engineer who designs the system boards can almost forget about equivalent-delay routing or trace length matching - the lower speed clock and control signals still need to be delay matched, but not the data signals. Moreover, the absolute value of the propagation delay between IC chips can exceed the data cycle time. Another significant benefit of FlexPhase is that it eliminates the need for precise on-chip clock matching, making things easier for the chip designer.
Yellowstone also includes technology to maximize voltage margin. Already, in some of today's Rambus devices, Rambus Signaling Level (RSL) logic calibrates the voltage amplitude and dv/dt once every 100ms during system operation. To maximize the voltage margin in Yellowstone devices, during system operation the IC dynamically controls the on-chip termination resistance as well as the voltage amplitude and slew rate.
Figure 6. The move from single-ended signaling to differential
Limitations of single-ended signaling include simultaneous switching noise (SSN) due to di/dt, common-mode noise, and crosstalk between traces. There is only a narrow voltage margin between the signal current loop and the switching noise.
Figure 7. Interface Technology Map
In summary, Yellowstone is a low-cost, parallel, chip-to-chip interface for data transfer between CMOS logic and/or memory chips on a PC board. It employs DRSL, offering a significant speed and power enhancement over other low-amplitude differential interfaces. Two of the most important characteristics of Yellowstone are FlexPhase circuits for precise on-chip skew control and dynamic on-chip voltage and termination control.
Initial Yellowstone-enabled products, available in the 2004/5 timeframe, will operate at 3.2GHz data rates. Over time, the data rate is expected to evolve to 6.4GHz. These data rates correspond to 12.8 to 25.6 GBytes/s for 32-bit bus widths, and 51.2 to 102.4 GBytes/s with 128-bit buses.
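Turning the bandwidth relation around gives the bus width needed to reach a target bandwidth at a given per-bit data rate. The sketch below does this for the 10 and 100 GBytes/s targets mentioned at the start of the article. It uses integer units (MBytes/s and Mbit/s) to avoid rounding surprises, and the function name is hypothetical.

```python
def min_bus_width_bits(target_mbytes_per_s: int, rate_mbps_per_bit: int) -> int:
    """Smallest bus width (in data bits) that meets the target bandwidth.

    Each data bit is one differential pin-pair in this interface.
    """
    target_mbits = target_mbytes_per_s * 8
    # Ceiling division on integers.
    return -(-target_mbits // rate_mbps_per_bit)


for target in (10_000, 100_000):   # 10 and 100 GBytes/s, in MBytes/s
    for rate in (3_200, 6_400):    # 3.2 and 6.4 Gbit/s per bit, in Mbit/s
        width = min_bus_width_bits(target, rate)
        print(f"{target // 1000:>3} GBytes/s at {rate / 1000} Gbit/s: {width} bits")
```

These are minimum widths; a practical design would round up to a conventional width such as 32 or 128 bits.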
Figure 8. The FlexPhase Test Chip:
Large data eye observed.