Design Article
Programmable logic innovation is overdue
Jack Ogawa, Cswitch Corp
1/27/2009 12:11 PM EST
In his book The Innovator's Dilemma, Clayton Christensen presents a behavioral model in which large, successful companies struggle to discover and nurture "disruptive" product technologies that initially look unattractive but eventually prove superior.
Looking at Altera and Xilinx, you can see that their products have evolved largely due to Moore's Law, which is a "sustaining" technology in Christensen's model. In other words, the incumbent vendors have relied on process technology advancement, rather than true architectural innovation, to sustain the marketability of their products over time.
Now, with programmable logic markets such as carrier Ethernet, data centers, and wireless infrastructure being awakened from their post-bubble slumber by the YouTube generation, the time has come for innovation. Packet-based equipment is moving to the next level of throughput (generically referred to as bandwidth), with increasing touches per packet due to security and quality of service requirements.
Unfortunately, equipment companies that relied on programmable logic to provide flexibility in their hardware during the heady bubble growth of the late 1990s are now finding that the 20-year-old FPGA cannot meet the new challenges, even with the help of Moore's Law. Programmable logic applications are evolving, and today's FPGAs cannot service them.
How wide, how fast?
How dramatic is this problem? This sentiment received some attention from Clive Maxfield(1) when Altera announced their Stratix IV family. As noted in Maxfield's article, Altera's 40nm Stratix IV family supports a typical system clock frequency of 350 MHz "across the fabric". While this number is somewhat optimistic (100 to 200 MHz is the number most often cited by designers), it nonetheless is effective in highlighting the problem that many high-bandwidth designers face today:
"So... we have 8.5 Gbps coming in (for a single serial I/O lane). After we strip out the 8b/10b coding we're left with 8.5 / 10 * 8 = 6.8 gigabits per second. If the receiver converts this into byte-wide chunks, we now have 6.8 / 8 = 0.85 gigabytes per second."
Maxfield goes on to note:
"If all we wanted to do was load these values in to an 8-bit register we'd still need to be clocking our register at 850 MHz."
Clearly, 350 MHz is far short of 850 MHz. But you can always make your data processing logic more parallel to meet the throughput requirements, right? In fact, in Altera's case, the interface from the SERDES to the logic fabric is limited to 250 MHz(2), so it is allowed to be as wide as 40 bits.
So, for a 40G application, you would need six channels (6 * 6.8 = 40.8 gigabits per second), presenting a total of 240 bits of data (6 * 40 bits) that must be kept aligned at 250 MHz as they are routed through the device. So, yes, you can spread things out, but this is a daunting timing closure challenge, to say the least.
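The arithmetic in this section can be checked with a short Python sketch. The function names are illustrative and do not come from any vendor tool, and the calculation counts payload bits only, ignoring framing and lane-alignment overhead. The 40-bit width is the interface limit quoted above.

```python
import math

def payload_gbps(line_rate_gbps, payload_bits=8, coded_bits=10):
    """Payload rate left after stripping 8b/10b coding from a serial lane."""
    return line_rate_gbps * payload_bits / coded_bits

def fabric_clock_mhz(payload_rate_gbps, width_bits):
    """Clock needed to move one lane's payload through a datapath of this width."""
    return payload_rate_gbps * 1000 / width_bits

def lanes_needed(target_gbps, payload_rate_gbps):
    """Smallest whole number of lanes whose combined payload meets the target."""
    return math.ceil(target_gbps / payload_rate_gbps)

lane = payload_gbps(8.5)                # payload per 8.5 Gbps lane
clock_8 = fabric_clock_mhz(lane, 8)     # byte-wide register, as in Maxfield's example
clock_40 = fabric_clock_mhz(lane, 40)   # widest fabric interface cited in the text
lanes = lanes_needed(40, lane)          # lanes for a 40G application
bus_bits = lanes * 40                   # total bits to keep aligned across lanes

print(f"payload per lane: {lane:.1f} Gbps")
print(f"clock at 8 bits:  {clock_8:.0f} MHz")
print(f"clock at 40 bits: {clock_40:.0f} MHz")
print(f"lanes for 40G:    {lanes} ({bus_bits} bits wide)")
```

With these inputs, the 8-bit case reproduces Maxfield's 850 MHz figure, and six lanes at 40 bits each give the 240-bit bus described above. The payload-only clock at 40 bits comes out at 170 MHz, under the 250 MHz interface limit, so widening does solve the clock-rate problem, but only by trading it for a much wider bus that must be aligned.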
Gate efficiency is key
Logic density is another challenge as bandwidth requirements increase. Today's programmable logic is notoriously area-inefficient, making the additional logic required to process more gigabits per second extremely expensive from a power and cost perspective. For example:
10G Ethernet MAC = 10,370 logic elements (4,148 ALUTs(3) x 2.5(4))
40G Ethernet MAC = 41,600 LEs (26,000 LUTs(5) x 1.6(6))
100G Ethernet MAC = 107,200 LEs (67,000 LUTs(5) x 1.6(6))
This implies that every 10 Gbps of Ethernet data terminated requires roughly 10,000 logic elements. A protocol conversion (e.g. 100G Ethernet to Interlaken), which is a common FPGA application, doubles that requirement to 20,000 logic elements per 10 Gbps. Therefore, you must consume about 80,000 logic elements simply to support the termination of protocols for a 40G application. This is an expensive proposition, especially when you consider the commonly held belief that programmable gates are 20x less area-efficient than ASIC gates.
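As a rough check on these figures, the sketch below recomputes the logic-element counts from the LUT counts and conversion factors listed above. The table layout and function names are illustrative assumptions. The 10,000-LE-per-10-Gbps rule of thumb and the doubling for protocol conversion are the ones stated in the text.

```python
# (LUT or ALUT count, LUT-to-LE conversion factor) for each Ethernet MAC,
# keyed by line rate in Gbps. Values are the ones quoted in the article.
MAC_CORES = {
    10: (4_148, 2.5),
    40: (26_000, 1.6),
    100: (67_000, 1.6),
}

def logic_elements(luts, factor):
    """Convert a LUT/ALUT count to equivalent logic elements (LEs)."""
    return round(luts * factor)

def les_per_10g(rate_gbps):
    """LEs consumed per 10 Gbps of terminated Ethernet for a given MAC."""
    luts, factor = MAC_CORES[rate_gbps]
    return logic_elements(luts, factor) / (rate_gbps / 10)

def conversion_les(rate_gbps, les_per_10g_terminated=10_000, protocols=2):
    """Estimate LEs for protocol conversion: two protocols terminated per 10 Gbps."""
    return round(rate_gbps / 10 * les_per_10g_terminated * protocols)

for rate, (luts, factor) in MAC_CORES.items():
    total = logic_elements(luts, factor)
    print(f"{rate}G MAC: {total} LEs, {les_per_10g(rate):.0f} LEs per 10 Gbps")

print(f"40G protocol conversion: {conversion_les(40)} LEs")
```

The per-10-Gbps figures land between 10,370 and 10,720 LEs across the three cores, which supports the "roughly 10,000" rule of thumb, and the 40G conversion estimate matches the 80,000 LEs cited above.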
For an FPGA architect, the ostensible objective of FPGA fabric elegance is gate efficiency. Putting aside exotic process technologies in development, this basically implies embedding more "hard" gates in their architectures. Embedding increases gate density, performance, and power efficiency, which are all desirable effects.
However, the trick to embedding in any programmable device is to make the gates configurable so that they are not locked into a single function. With some creativity and proper scope, this is entirely possible. But, therein lies the "Dilemma": with every R&D dollar at the incumbent vendors being held against a return-on-investment (ROI) metric, it will take a brave soul indeed to argue for innovation that has less breadth than their current products. So, where does that leave today's logic designers?
Specialized, heterogeneous logic architectures can offer designers the cost and power efficiency that they seek while managing development costs and keeping their time-to-market edge. By using embedded application-specific elements that are configurable, these architectures can lower development costs relative to both ASICs and FPGAs by eliminating the timing closure problem of a generic logic fabric and providing ASIC-like performance.
Furthermore, bandwidth bottlenecks can be eliminated by using an interconnect structure that is designed to support the datapath topologies common to a given application. Imagine a densely populated city such as Tokyo with only one choice of travel: surface streets. Yes, they are the most flexible, serving all destinations, but they are inefficient for traveling any significant distance or for moving large volumes of people. Fortunately, Tokyo has tailored resources, such as freeways and a train system, each with its own merits. Like Tokyo, new programmable logic architectures will offer density with flexibility at a local level, and high performance and efficiency for traveling from function to function.



