As networking designs move past the OC-48 range to OC-192 levels and beyond, memory architecture is becoming a key bottleneck in the system design process. Until now, networking designers have been forced to use PC-optimized memory devices in networking designs. While sufficient for lower-speed boxes, these memory architectures begin to break down as speeds reach the 10-Gbit range.
The fast-cycle RAM (FCRAM) architecture has been engineered with high-speed networking in mind. In Part 1 of this series, we took a detailed look at the FCRAM architecture. Now, in Part 2, we will explore how FCRAM can solve problems encountered in networking applications, particularly OC-192/10-Gbit line cards.
Line Card Memory Requirements
Figure 1 shows a typical high-end Sonet line card design. The various functions of the random access memory typically used in these cards include: CPU data memory, receive/transmit buffer memory, routing (look-up) table memory, and packet memory.
Figure 1: Typical OC-192 line card architecture.
The CPU data memory used on the line card needs to quickly buffer and process the data presented to it. However, compared with other memory used in networking applications, the density required for this function is small and the speed is relatively slow. As a result, non-leading edge, high-speed static RAM (HSSRAM) is typically used. One of the more common HSSRAMs is the synchronous pipeline-burst (PB) SRAM, which typically delivers a density of 2 to 4 Mb and a clock speed of 100 to 133 MHz.
Designers, however, are working hard to eliminate standalone CPU data memory from line card designs. Because the required density is low, CPU designers are integrating this memory on-chip, eliminating the need for a standalone memory IC.
While CPU memory is going away, buffer memory continues to be a key concern for designers of high-speed line cards. Buffer memory in networking applications must support a somewhat balanced number of read and write cycles (whatever is written must eventually be read and vice versa), relatively long bursts (four-word, eight-word and even longer), limited data randomness, and relatively high density. Hence, burst speed, as measured in peak bandwidth, is important, while random cycle latency and bus turnaround time are not as critical. Based on these requirements, SDRAM or HSSRAM are the memory architectures of choice for today's networking equipment, such as OC-12 and OC-48 designs.
Buffer memory for OC-192 has the same functional characteristics, except that the density is typically higher to support more transactions and deeper packets, and significantly higher clock speeds and data rates must be supported. The 10-Gbps optical data rate translates to about 156 Mbps per pin across a 64-bit memory bus.
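The per-pin rate above follows directly from the line rate and bus width; a quick sketch of the arithmetic (the 10-Gbps figure is the approximate OC-192 payload rate used in the text):

```python
# Back-of-the-envelope check of the per-pin data rate implied by the line rate.
LINE_RATE_GBPS = 10      # OC-192 optical data rate, approximate
BUS_WIDTH_BITS = 64      # assumed memory data bus width

# Each pin of the bus must carry its share of the full line rate.
per_pin_mbps = LINE_RATE_GBPS * 1000 / BUS_WIDTH_BITS
print(f"{per_pin_mbps:.2f} Mbps per pin")   # 156.25 Mbps per pin
```

For a single-data-rate bus this corresponds to roughly a 156-MHz clock; a double-data-rate bus could run the clock at half that.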
An examination of today's memory solutions reveals that not many products can provide an effective bandwidth equal to or exceeding 156 Mbps per pin. Clearly, traditional SDRAMs and SRAMs, even with clock frequencies above 156 MHz, will not meet this requirement due to factors limiting their effective bandwidth, as described in Part 1 of this series. Some of the newer memory solutions that appear capable of meeting the requirements of high-performance buffer memory include DDR, QDR, and SigmaRAM in the SRAM camp, and FCRAM, DDR SDRAM (clock frequency above 133 MHz required), and RDRAM in the DRAM camp.
Of the above varieties, the DRAM solutions, specifically FCRAM, offer lower cost per bit, but cannot match the performance of SRAM in terms of initial access latency. However, since initial latency is less critical than peak bandwidth, and the FCRAM solutions can theoretically match SRAM in terms of peak bandwidth, the key factor in determining which solution to use is density. Since the lowest common FCRAM density is 256 Mb in a x16 configuration, DRAM is the best alternative for buffer memory requirements of 128 MB and higher, assuming a 64-bit memory data bus.
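The 128-MB threshold follows from filling a 64-bit bus with the smallest common FCRAM parts; a short sketch of that sizing math (device parameters taken from the text, bus width assumed):

```python
# Minimum buffer-memory density implied by filling a 64-bit bus with
# the lowest common FCRAM density (256 Mb in a x16 configuration).
BUS_WIDTH_BITS = 64
DEVICE_WIDTH_BITS = 16        # x16 FCRAM configuration
DEVICE_DENSITY_MBITS = 256    # lowest common FCRAM density

devices_needed = BUS_WIDTH_BITS // DEVICE_WIDTH_BITS         # 4 devices
total_mbytes = devices_needed * DEVICE_DENSITY_MBITS // 8    # 128 MB minimum
print(f"{devices_needed} devices -> {total_mbytes} MB minimum")
```

Below that density, some of the capacity is stranded, which is why DRAM is the better fit only at 128 MB and above.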
It should be noted that some of the separate I/O SRAM products, such as QDR, do offer higher peak bandwidth potential than DRAM solutions, such as FCRAM, since DRAMs are only offered in common I/O configurations today. However, the selection of DRAM vs. SRAM will ultimately come down to cost/performance tradeoffs. And with buffer memory density requirements continuing to increase, DRAMs may ultimately prevail in high-end systems.
Routing Table Lookup Memory
As opposed to buffer memory in networking applications, look-up table memory needs to support an unbalanced ratio of read and write cycles (sometimes exceeding 10 to 1), relatively short data bursts, a high degree of data randomness, and relatively low density. Therefore, initial cycle latency and bus turnaround time are more important criteria than peak bandwidth or burst speed.
Today's common memory solutions for look-up table memory primarily include traditional synchronous SRAMs, which are characterized by low initial cycle latency compared with DRAMs. Furthermore, many synchronous SRAMs have adopted new features to improve the bus turnaround time.
For OC-192 implementations, one would expect the density of the look-up table memory to increase for many of the same reasons buffer memory is increasing--primarily in support of more traffic and users per line card. Demands for faster bus turnaround capability should also be expected, although this is quite difficult to achieve in DRAM or SRAM products without facing major architecture changes. In fact, many of the new products, such as separate I/O SRAM, can actually increase bus turnaround times, which drastically hurts performance in unbalanced read/write applications.
One memory architecture that does provide very low initial latency combined with fast bus turnaround capability is called content-addressable memory (CAM). CAMs, while having the best performance potential for look-up table memory, are also the most expensive type of memory in terms of cost-per-bit. Therefore, CAMs are relegated to the highest end of the performance spectrum, where density requirements are low and cost is secondary.
Fundamentally, the OC-192 line card designer will have to choose between using a CAM approach, fast bus turnaround SRAM, or FCRAM for look-up memory. The main area where these memory solutions differ is in initial cycle latency, also called random cycle time (tRC). Today's CAMs and SRAMs provide tRC of less than 10 ns, even below 5 ns for some very high-end (and expensive) products, compared with conventional DRAMs, including DDR, in the 60- to 70-ns range. FCRAM bridges this gap with tRC in the 20- to 30-ns range, making it a candidate for higher-density, lower-cost look-up table memory.
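The impact of tRC on random look-up traffic can be illustrated with a simple model: when accesses are fully random, a new access cannot start sooner than tRC, so tRC caps effective bandwidth. The tRC figures come from the ranges quoted above; the bus width, burst length, and clock are assumptions for illustration only:

```python
# Rough model of effective bandwidth under fully random short accesses,
# where each new access must wait out the random cycle time (tRC).
BUS_BYTES = 8        # 64-bit bus, assumed
BURST_WORDS = 2      # short burst typical of look-up accesses, assumed
CLOCK_NS = 5.0       # 200-MHz bus clock, assumed

def effective_bw_gbps(trc_ns):
    burst_bytes = BUS_BYTES * BURST_WORDS
    transfer_ns = BURST_WORDS * CLOCK_NS
    # A new random access cannot begin sooner than tRC.
    cycle_ns = max(trc_ns, transfer_ns)
    return burst_bytes * 8 / cycle_ns    # Gbps

for name, trc_ns in [("SRAM/CAM", 10), ("FCRAM", 25), ("conventional DRAM", 65)]:
    print(f"{name:>17}: tRC={trc_ns} ns -> {effective_bw_gbps(trc_ns):.1f} Gbps")
```

Under these assumptions, FCRAM's 20- to 30-ns tRC recovers a large fraction of the bandwidth that a 60- to 70-ns conventional DRAM leaves on the table, which is exactly the gap-bridging role described above.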
Additionally, the cost per bit of SRAM is typically four times that of DRAM, making the density consideration less clear-cut. For example, if the required look-up table density is 32 MB, two 256-Mb FCRAM devices would provide 64 MB total. This is overkill, but it would still cost considerably less than a 32-MB implementation using HSSRAM.
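The cost tradeoff can be made concrete with the ~4x cost-per-bit ratio cited above (the cost unit is normalized and illustrative):

```python
# Illustrative cost comparison: oversized FCRAM vs. exact-fit HSSRAM,
# using the ~4x SRAM-to-DRAM cost-per-bit ratio cited in the text.
DRAM_COST_PER_MB = 1.0       # normalized unit cost
SRAM_COST_PER_MB = 4.0       # ~4x DRAM per bit

fcram_mb = 2 * 256 // 8      # two 256-Mb FCRAM devices -> 64 MB
sram_mb = 32                 # exact 32-MB HSSRAM implementation

fcram_cost = fcram_mb * DRAM_COST_PER_MB   # 64 cost units
sram_cost = sram_mb * SRAM_COST_PER_MB     # 128 cost units
print(f"FCRAM: {fcram_cost:.0f} units, HSSRAM: {sram_cost:.0f} units")
```

Even buying twice the needed density, the FCRAM implementation comes in at roughly half the cost of the exact-fit SRAM solution under these assumptions.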
Packet Memory
Packet memory in networking applications is used to store and buffer packets that are being processed by the line-card CPU or ASIC, as opposed to transmit/receive buffer memory, which queues incoming/outgoing packets before they are sent to/from the CPU. The characteristics of packet memory are similar to transmit/receive buffer memory with the notable exception of typically being lower in density and possibly lower in performance, depending on the CPU/ASIC data rate. Hence, SDRAM is typically the memory of choice today, with FCRAM increasingly being considered for tomorrow's applications, including OC-192.
Figure 2 provides a qualitative summary of the main performance characteristics for the major RAM networking solutions, as well as the suitability of each device for the various networking applications. Note that the viable OC-192 solutions are FCRAM, DDR SRAM, and separate I/O SRAM, and that, of these, FCRAM is the only DRAM solution. The advantages of FCRAM over the other DRAM solutions are clear: Higher performance (latency, bus turnaround time and effective bandwidth) with minimal cost increase, yet compatible with DDR DRAM.
Figure 2: Comparison of the main performance characteristics of memory components used in networking designs.
It is also fairly clear that the effective bandwidth of SRAM will have to significantly improve to justify the 4X or higher increase in cost compared with DRAM. The most effective way to increase the bandwidth is to increase the data rate per clock, i.e., DDR, and/or to have separate ports for writing and reading. Hence, the best SRAM solutions for addressing OC-192 are clearly DDR and separate I/O products.
Separate I/O products are the best fit in buffer memory applications, as the peak bandwidth capability of this architecture can be approached in systems with a balanced number of read and write cycles. On the other hand, the fast initial latency and bus turnaround time of common I/O DDR (usually referred to simply as "DDR") compared with separate I/O make DDR the ideal architecture in look-up table applications, which are characterized by a typically unbalanced number of read/write cycles. Therefore, it is expected that common I/O and separate I/O DDR SRAM products will coexist with FCRAM, with their specific usage depending on the application. Oftentimes, all three solutions may be used within the same application.
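The read/write balance argument can be sketched with a toy model of bus-turnaround overhead: a common I/O bus loses dead cycles every time the data direction reverses, while separate read and write ports never turn around. All cycle counts below are assumptions for illustration, not data-sheet figures:

```python
# Toy model of bus-turnaround overhead on a common I/O memory bus.
TURNAROUND_CYCLES = 2    # dead cycles per read<->write direction switch, assumed
BURST_CYCLES = 4         # data cycles per access, assumed

def common_io_utilization(switch_fraction):
    """Fraction of bus cycles carrying data when switch_fraction of
    accesses reverse the bus direction (common I/O)."""
    overhead = switch_fraction * TURNAROUND_CYCLES
    return BURST_CYCLES / (BURST_CYCLES + overhead)

# Balanced traffic (buffer memory): the bus direction flips frequently.
print(f"common I/O, balanced traffic:   {common_io_utilization(0.5):.0%}")
# Read-heavy traffic (look-up tables): the direction rarely flips.
print(f"common I/O, read-heavy traffic: {common_io_utilization(0.1):.0%}")
# Separate I/O ports never turn around, so utilization stays near 100%.
```

Under these assumptions, common I/O loses little in read-heavy look-up traffic but pays a noticeable penalty under balanced buffer traffic, which is where separate I/O earns its keep.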
More on Latency
As previously mentioned, the primary advantage of SRAM over DRAM is initial latency. Because latency matters less in applications with minimal randomness, such as buffer memory in networking, DRAM can provide roughly the same effective bandwidth there. Comparing common I/O with separate I/O in these same applications (regardless of SRAM or DRAM) shows a clear performance advantage for the separate I/O architecture. Therefore, as buffer memory continues to increase in density with increasing network traffic per line card, the choice between FCRAM and separate I/O SRAM for buffer applications will ultimately come down to cost/performance, rather than simply selecting SRAM because the density requirements are low.
If performance is the primary consideration and cost is secondary, separate I/O SRAM is the correct choice for buffer memory in OC-192 applications. On the other hand, if the performance of FCRAM is adequate, considerable cost savings (in terms of component cost and board real estate) will be realized.
For look-up table memory, the issues are somewhat reversed. Due to the random nature of this application, the initial latency is an important factor, and SRAM has a clear advantage over DRAM in this regard. In fact, CAMs are increasingly being considered and designed into look-up table applications, primarily due to their best-of-class performance in terms of latency.
Another important consideration is that look-up table densities are typically small compared with buffer memory. However, densities are increasing, and FCRAM comes much closer to SRAM in terms of initial latency than conventional DRAM does, making FCRAM a contender for OC-192 look-up table memory. Assuming a 64-bit memory bus, the cost crossover point is around 32 MB. Below 32 MB, SRAM provides the best cost/performance ratio; at 32 MB and above, designers should consider whether the performance of FCRAM is adequate.
One last point to consider is the feasibility of combining certain memory functions into one device, which effectively means eliminating a memory interface/bus. This is a trend that will affect networking applications, just as it has other high-volume applications, such as computers. For example, can the transmit/receive buffers be combined with each other, or with the packet buffer memory? Or can the relatively small look-up table and packet buffer memories be combined? If the answer to these questions is "yes" and the performance using FCRAM is acceptable, a considerable reduction in system cost can be realized.
In summary, there are a variety of DRAM/SRAM solutions available for networking applications today, and even more being considered or designed in for next-generation systems. In particular, OC-192 line cards have several unique memory requirements within the same application.
FCRAM bridges the performance gap between DRAM and SRAM and, with its DRAM-like cost structure, will increasingly be considered for OC-192 and beyond, especially as memory sizes increase and memory functions are consolidated. Looking ahead, one can expect FCRAM to continue adopting SRAM-like features as it narrows the performance gap further and encroaches on applications once owned by SRAM.
About the Author
Kevin Kilbuck is the director of memory engineering for Toshiba America Electronic Components. He holds a BSEE from California State University, Chico and an MBA from Pepperdine University. Kevin can be reached at firstname.lastname@example.org.