Design Article
Introducing MagnaPHY high-speed chip-to-chip serial interconnect
Clive Maxfield
2/1/2008 4:42 PM EST
Preamble
Do you recall that, some time ago, I penned an article titled How to turn every FPGA LVDS pair into a complete SERDES solution? It described an interesting technology called Align Locked Loops (ALLs), invented by a company called Align Engineering.
Well, shortly after that I received an email from Mitch Anderson from a hot new company called MagnaLynx saying:
Hi Max, I just read your article on ALLs, and I wondered if you might be interested in looking at our MagnaPHY serial interconnect technology. This is a very similar value-proposition (high-speed, low pin-count, etc.) but more oriented at high-speed embedded applications.
Now, I'm always interested in hearing about new, "hot-off-the-shelf" technologies, so I gave Mitch a call and we set up a meeting. Mitch and his colleagues flew down to see me (they are also amateur pilots), and what they told me was very exciting indeed . . .
Gigabits-per-second-per-differential pair
Let's start with a brief overview, and then plunge down into some nitty-gritty details. First of all, we all know that high-speed serial interconnect is the connection mechanism of choice for today's state-of-the-art designs. There are numerous reasons for this, but the main ones as far as we're concerned here are high-speed, high-bandwidth, and – very importantly – low pin-count. As a simple example, consider a memory chip connected to an FPGA (Fig 1).

1. Using high-speed serial interconnect to link an FPGA and a RAM.
As we see, this ×1 (one-lane) version requires two pins on each device to form the differential pair that implements the transmit path from the FPGA to the RAM; similarly for the receive path to the FPGA from the RAM.
Now, there are a variety of well-known serial interconnect standards and protocols roaming wild and free in the world. For example, when you see the diagram above, your knee-jerk reaction might be: "Ah, we're talking about PCI Express, or RocketIO, or . . ." Not so!
Of course PCIe and RocketIO – along with other high-speed techniques such as 10 gigabit Ethernet (10GbE) – are very powerful, but they were originally conceived almost as full networking protocols for use in system-to-system and board-to-board scenarios. To put this another way, none of these standards was originally envisaged with chip-to-chip communications in mind.
The problem is that protocols like PCIe tend to try to be "all things to all men", with the result that there are substantial overheads associated with these little rascals. Now, a few hundred nanoseconds of latency is typically not critical when we're talking about system-to-system or board-to-board communications, but it can be something of a "knee-in-the-groin" when one is focused on chip-to-chip communications.
Something had to be done, which is where the folks at MagnaLynx leap onto the center of the stage with a fanfare of trumpets. Now, being engineers, they know that no one wants to re-invent the wheel (although, in my opinion, we might go for a more interesting shape next time). In the case of FPGAs, for example, they use the existing hard macros that are employed to implement any of the conventional high-speed SERDES protocols.
For example, the MagnaPHY serial interconnect technology from MagnaLynx uses 10-bit symbols similar to PCIe, but it doesn't use conventional 8b/10b encoding per se.
As an aside. . . One might think of the MagnaPHY encoding as 9b/10b, but this is something of a simplification. Another way to look at things is that if you consider 8 bytes of raw data, plus an additional 8 bits to implement ECC (Error Correcting Code), then you have eight 9-bit fields that would actually be transmitted as eight 10-bit fields. So when the folks at MagnaLynx talk about "Gigabyte Bandwidths", they are talking about working with "nine-bit-bytes" (if you see what I mean). But we digress . . .
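For anyone who likes to see the bookkeeping, here is a minimal Python sketch of the arithmetic in that aside. The constant and function names are mine, invented purely for illustration; nothing here is taken from the MagnaPHY specification, and it models only the bit counts quoted above, not the actual encoding.

```python
from fractions import Fraction

# Figures quoted in the aside above; the names are illustrative only.
DATA_BITS = 8 * 8    # eight bytes of raw data
ECC_BITS = 8         # additional ECC bits
FIELD_BITS = 9       # one "nine-bit-byte"
SYMBOL_BITS = 10     # each field goes out as a 10-bit symbol


def frame_summary():
    """Count fields and wire bits for one 8-byte-plus-ECC group."""
    payload_bits = DATA_BITS + ECC_BITS
    fields, remainder = divmod(payload_bits, FIELD_BITS)
    if remainder:
        raise ValueError("payload does not divide into whole 9-bit fields")
    wire_bits = fields * SYMBOL_BITS
    return {
        "fields": fields,
        "wire_bits": wire_bits,
        # Share of the wire bits that carries data plus ECC.
        "payload_efficiency": Fraction(payload_bits, wire_bits),
        # Share of the wire bits that carries raw data only.
        "raw_data_efficiency": Fraction(DATA_BITS, wire_bits),
    }


if __name__ == "__main__":
    for name, value in frame_summary().items():
        print(f"{name}: {value}")
```

In other words, nine of every ten bits on the wire are data-plus-ECC, and eight of every ten are raw data, which is why "9b/10b" is a fair shorthand but not the whole story.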
Now, where was I? Oh yes. . . One crucial point about MagnaPHY is that it's easy to understand and to implement in your FPGA designs. For example, as illustrated in Fig 2, let's compare the sheer size of the specifications for PCI Express (left) and MagnaPHY (right).

2. Comparing the size of the specifications for PCI Express (left) and MagnaPHY (right).
Let's take a quick survey. Which of these specifications would you like to take home and learn this weekend? Put your hand up if you think PCI Express is the way to go. . . now let's see who thinks MagnaPHY might be just a tad easier to learn . . . well, I think the results speak for themselves.
All joking aside, the fact that the MagnaPHY specification is so much smaller actually does provide an indication as to the efficiency of this protocol. Once again, let's remind ourselves that MagnaPHY is focused only on chip-to-chip communications, which means it doesn't need all of the overheads required to implement a full board-to-board or system-to-system networking protocol.
The result is an extremely efficient, high-bandwidth, low (sub 10 nanosecond) latency protocol that can either enable high-end systems or provide for cost reductions in lower-end products. Furthermore, MagnaPHY provides a realistic path for the Terabit throughputs that we're all going to be demanding in the not-so-distant future.
In my conversations with Mitch and the other guys, they inundated me with technical details, such as the fact that MagnaPHY significantly reduces Bit Error Rates (BER). How? Well, Total Jitter = Deterministic Jitter + Random Jitter. It seems that conventional high-speed interconnects operate with BERs on the order of 10^-12 (that's ten to the minus 12) based on +/– 7-sigma values on the distribution curve. Apparently MagnaPHY provides for significantly reduced deterministic jitter values, which allows it to tolerate higher random jitter values, which results in BERs on the order of 10^-20 to 10^-21 (that's ten to the minus 20 to 21) based on +/– 10-sigma values on the distribution curve. I didn't understand a word of this, but it certainly sounded good (and I know – to my cost – that they would be delighted to talk to you about it in excruciating detail)!
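To get a feel for how sigma values relate to bit error rates, here is a rough Python sketch. It assumes the simplest possible model, in which random jitter is purely Gaussian and the BER is just the one-sided tail probability beyond a given number of standard deviations. That model is my assumption, not something MagnaLynx described to me; it ignores deterministic jitter, transition density, and everything else a real jitter budget has to account for. The function names are hypothetical.

```python
import math


def ber_from_sigma(q):
    """One-sided Gaussian tail probability beyond q standard deviations."""
    return 0.5 * math.erfc(q / math.sqrt(2.0))


def sigma_from_ber(ber, lo=0.0, hi=40.0, iterations=200):
    """Invert ber_from_sigma by bisection (the tail shrinks as q grows)."""
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        if ber_from_sigma(mid) > ber:
            lo = mid   # tail still too large, so we need more sigma
        else:
            hi = mid
    return (lo + hi) / 2.0


if __name__ == "__main__":
    for ber in (1e-12, 1e-20, 1e-21):
        print(f"BER {ber:.0e} -> about +/- {sigma_from_ber(ber):.2f} sigma")
```

Under this simplified model, a BER of 10^-12 lands very close to the 7-sigma point, while 10^-20 to 10^-21 falls between 9 and 10 sigma, which is in the same ballpark as the figures quoted above.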
Now, one consideration is that FPGAs contain programmable fabric, which allows the guys and gals at MagnaLynx to implement the "Secret Sauce" required to augment the FPGA's hard SERDES macros and implement MagnaPHY. In the case of other devices – such as RAMs – it's necessary to embed some small hard IP.
In the fullness of time, the folks at MagnaLynx would like to see MagnaPHY IP embedded in every chip under the sun. In the shorter term, however, they have created the MagnaLynx ML1S family of high-speed, single-port Static RAMs. In addition to providing a killer "proof-of-concept", these chips are ideal for Communications, Storage, Computing, Test-and-Measurement, and other applications requiring high-performance data buffering with maximum density and minimum power.
Now, I don't want to scare you, but I do want you to know that this is all real (besides, who among us can resist the lure of a real-world test-bench environment). Thus, Fig 3 shows one of these MagnaPHY-enabled memory devices hooked up to an FPGA development board.

3. A test-bench environment showing a MagnaPHY-enabled memory device hooked up to an FPGA development board.
Ah, how this takes me back ... to this morning in my workshop as fate would have it, but that's another story...
Some nitty-gritty details
Sad to say, I'm running out of time (as I pen these words) because I will shortly have to run out of the door to attend an impromptu training class on programming in the Python scripting language (also, the folks giving this training are providing Chinese food for lunch), so let me briefly summarize things as follows:
A brief overview of MagnaLynx
MagnaLynx is a fabless semiconductor company focused on extending the benefits of high-speed serial communication to chip-level interfaces. The folks at MagnaLynx currently provide high-speed mixed-signal design services and select IP blocks. Also, they are licensing and developing memory products based on their patented MagnaPHY interconnect technology and associated memory controller.
A brief overview of MagnaPHY technology
MagnaPHY is a low-latency memory interface based on industry-standard high-speed serial technology – clock and data recovery (CDR) timing with binary non-return to zero (NRZ) signaling and utilizing a 10-bit symbol.
By utilizing a 10-bit symbol, MagnaPHY is able to leverage existing "×10" SERDES implementations and IP. The MagnaPHY protocol, when implemented in hard logic, adds the equivalent of only two additional gate delays, enabling data access from a fast memory (SRAM) in under 10 ns. Bit error rates of less than 10^-20 are achieved through careful management of the jitter budget to ensure a nearly error-free channel.
The capabilities this technology enables
When operating at 10 Gbps, MagnaPHY delivers 9 gigabits per second of useable payload over a single differential pair (2 pins). When compared to existing parallel memory interfaces, MagnaPHY enables much higher memory bandwidth using the same number of pins or the same bandwidth using significantly fewer pins, thereby lowering package size, power consumption, and cost of the host device.
For example, a MagnaPHY 4-4-4 device (4 lanes each of address, read data, and write data) will deliver 72 Gbps of R/W data throughput on 24 active pins (this does not include power and ground). MagnaPHY can scale to deliver a 1 terabit-per-second memory subsystem utilizing fewer than 350 host pins.
On the smaller and cheaper side of the equation, a MagnaPHY 0-1-1 device (no dedicated address lane, one lane of read data, and one lane of multiplexed address/write data) delivers 9 Gbps useable R/W payload on only 4 pins!
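The pin and payload figures above follow from a couple of multiplications, which the following Python sketch spells out. It assumes a 10 Gbps line rate, 9 useable bits in every 10-bit symbol, and two pins per lane, all as quoted in this section. The function names are hypothetical, and treating the address-read-write triple as three plain lane counts is my simplification.

```python
LINE_RATE_GBPS = 10   # per differential pair, as quoted above
PAYLOAD_BITS = 9      # useable bits carried by each symbol
SYMBOL_BITS = 10      # bits actually sent per symbol
PINS_PER_LANE = 2     # one differential pair


def lane_payload_gbps():
    """Useable payload carried by a single lane."""
    return LINE_RATE_GBPS * PAYLOAD_BITS / SYMBOL_BITS


def active_pins(address, read, write):
    """Signal pins for an address-read-write lane configuration."""
    return (address + read + write) * PINS_PER_LANE


def dedicated_data_payload_gbps(read, write):
    """Combined R/W payload when read and write lanes carry only data."""
    return (read + write) * lane_payload_gbps()


if __name__ == "__main__":
    print("4-4-4 pins:", active_pins(4, 4, 4))
    print("4-4-4 R/W payload (Gbps):", dedicated_data_payload_gbps(4, 4))
    # In the 0-1-1 part the write lane is shared with address traffic,
    # so only the pin count and the single-lane payload are shown here.
    print("0-1-1 pins:", active_pins(0, 1, 1))
    print("payload per lane (Gbps):", lane_payload_gbps())
```

Note that the combined-payload function only makes sense for configurations with dedicated data lanes, such as 4-4-4; for the 0-1-1 device, the quoted 9 Gbps corresponds to one lane's worth of payload.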
How does this differ from what's already available?
Comparing the performance of a serial MagnaPHY SRAM with a parallel QDR II device clearly demonstrates the advantage of differential signaling. Using the same 4-4-4 device as described above, the MagnaPHY SRAM would deliver 72 Gbps on 24 pins, or 3 gigabits-per-second-per-pin. By comparison, a QDR II device at 400 MHz (800 Mbps) delivers 57.6 Gbps on 100 pins, or 0.58 Gbps-per-pin.
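As a quick sanity check on those per-pin numbers, here is a small Python sketch. It uses only the throughput and pin counts quoted in the paragraph above; the helper names are mine, and the "implied data signals" line is simply the quoted QDR II throughput divided by the quoted 800 Mbps per-signal rate.

```python
def gbps_per_pin(throughput_gbps, pins):
    """Average throughput carried by each pin."""
    return throughput_gbps / pins


# Figures quoted in the comparison above.
MAGNAPHY_GBPS, MAGNAPHY_PINS = 72.0, 24
QDR2_GBPS, QDR2_PINS = 57.6, 100
QDR2_SIGNAL_RATE_GBPS = 0.8   # 400 MHz, double data rate

if __name__ == "__main__":
    serial = gbps_per_pin(MAGNAPHY_GBPS, MAGNAPHY_PINS)
    parallel = gbps_per_pin(QDR2_GBPS, QDR2_PINS)
    print(f"MagnaPHY 4-4-4: {serial:.2f} Gbps per pin")
    print(f"QDR II:         {parallel:.2f} Gbps per pin")
    print(f"Ratio:          {serial / parallel:.1f}x")
    # How many 800 Mbps data signals the QDR II figure implies.
    print("Implied QDR II data signals:",
          round(QDR2_GBPS / QDR2_SIGNAL_RATE_GBPS))
```

The per-pin advantage works out to a little more than five to one, even though each individual QDR II data signal is only about an order of magnitude slower than a MagnaPHY lane, because a differential lane costs two pins and the 100-pin QDR II count includes more than just data signals.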
Fully-buffered DIMM (FB-DIMM) is an existing serial memory alternative that has been developed specifically for the Intel server market. FB-DIMM provides for an increase in the density of memory subsystems and results in fewer signal pins on the host processor, but since standard DDR memories are used on the DIMM, the bandwidth of the FB-DIMM channel is equal to the peak bandwidth of a DDR memory channel. In addition, the Advanced Memory Buffer (AMB) increases the latency and power requirements of the FB-DIMM solution, making it unsuitable for most embedded applications. FB-DIMM also lacks scalability down to the embedded solution (a few devices hooked directly to a controller).
Proposals have also been made to utilize an existing serial protocol, like PCI Express, for memory access. While this has the advantage of utilizing existing host-side interfaces, the PCI Express protocol specification, which was developed to meet the needs of board-level communications, adds far too much latency to be able to meet the demands of high-speed memory access. MagnaPHY strips out all the overhead associated with traditional serial protocols and retains only that which is necessary for serial chip-to-chip communication. As a result, MagnaLynx is able to demonstrate sub-10 ns memory read-access times (measured at the host).
Rambus XDR is a high-speed DRAM solution optimized for consumer electronics applications. XDR utilizes a proprietary IO cell (XIO) which is not supported in common FPGA architectures. XDR utilizes differential signaling on a bidirectional data bus, while the address and command information is carried on a 12-bit source synchronous parallel bus. An XDR2 interface cell can deliver 16 GB/s sustained read or write bandwidth to a x16 XDR2 DRAM and requires 51 (16×2 + 12 + 7) pins. By comparison, a MagnaPHY 4-8-8 interface (4 lanes of address/command plus 8 dedicated read lanes and 8 dedicated write lanes) will deliver 16 GB/s sustained with no switching penalty on 40 pins (20 differential pairs). (As an aside, the folks at MagnaLynx say that in 2006 they were the first to demonstrate full differential signaling for the entire interface, including address and control.)
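The pin counts in that comparison are easy to reproduce, and so is the 16 GB/s figure if we make one assumption: that it counts "nine-bit-bytes" across the read and write lanes together, with each 10-bit symbol delivering one such byte at a 10 Gbps line rate. That reading is my own inference rather than something MagnaLynx spelled out, and the function names below are hypothetical.

```python
def xdr2_pins(data_width=16, addr_cmd_pins=12, other_pins=7):
    """Pin sum quoted above: differential data, address/command, and
    7 further pins whose purpose the quoted sum does not state."""
    return data_width * 2 + addr_cmd_pins + other_pins


def magnaphy_pins(address, read, write):
    """Two pins (one differential pair) per lane."""
    return 2 * (address + read + write)


def nine_bit_gbytes_per_second(data_lanes, line_rate_gbps=10, symbol_bits=10):
    """Assumed reading: one nine-bit byte per 10-bit symbol per lane."""
    return data_lanes * line_rate_gbps / symbol_bits


if __name__ == "__main__":
    print("XDR2 x16 pins:", xdr2_pins())
    print("MagnaPHY 4-8-8 pins:", magnaphy_pins(4, 8, 8))
    print("MagnaPHY 4-8-8 GB/s (read + write lanes):",
          nine_bit_gbytes_per_second(8 + 8))
```

If that reading is right, the two 16 GB/s figures are not quite like for like: the XDR2 number is read or write over a shared bidirectional bus, while the MagnaPHY number is spread across separate read and write lanes that can run simultaneously.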
Summary
Industry experts predict that a full shift away from single-ended signaling will occur in the 2012 – 2013 timeframe. At that time (perhaps sooner in the graphics market), the need for higher memory bandwidth can be satisfied only with differential signaling technologies.
Thus, over the course of the next few years, we can expect the electronics industry to begin to transition to serial memory solutions in high-volume as well as embedded applications. In the shorter-term (1-to-2 years), very high speed embedded applications – such as core routers and switches – will begin to implement serial memory solutions in order to realize the memory access bandwidth required to support line rates of 100 Gbps and beyond.
The adoption rate of new technology and applications in the consumer market continues to accelerate. The use of serial interconnects in consumer products will most likely occur very quickly on the heels of implementation in high-performance systems. The constant demand for smaller consumer electronics with higher capacity and performance will fuel this rapid adoption of this clearly superior and more efficient technology.
So, the question we are left with is as follows... If someone is looking at migrating to high-speed serial interconnect technology to satisfy the chip-to-chip interconnection cost, bandwidth, and pin-count demands of their next project, would they rather use a technology like PCI Express (incredibly powerful and sophisticated, but perhaps "over-kill" when it comes to chip-to-chip communications)... or ... might they perhaps be interested in considering a technology that is easier to understand and implement and offers dramatically lower (sub 10 ns) latencies and Bit Error Rates? Hmmm... let me think...
Clive "Max" Maxfield is president of TechBites Interactive, a marketing consultancy firm specializing in high technology. Max is the author and co-author of a number of books, including Bebop to the Boolean Boogie (An Unconventional Guide to Electronics), The Design Warrior's Guide to FPGAs (Devices, Tools, and Flows), and How Computers Do Math featuring the pedagogical and phantasmagorical virtual DIY Calculator.



