The need for large amounts of external memory in the ingress and egress sections of a datapath has become a serious headache for engineers. As line rates increase, this external memory adds significant cost and latency to a system's data path.
Fortunately, designers have a new option. New switch fabric architectures have emerged that can better handle traffic shaping, scheduling, reassembly, and more. In doing so, these fabrics eliminate the need for memory in the egress path of a networking box design. In this article, we'll show how advanced scheduling and other factors help eliminate egress memory in networking architectures.
Understanding the Traditional Switch
The most common switch architecture in the market is a combined input-output queuing (CIOQ) fabric. In CIOQ switches, buffering occurs both at the ingress and at the egress.
Figure 1 shows a block diagram of the common functional decomposition of a CIOQ switching system. These blocks exist in most systems, although the way they are partitioned varies greatly between designs. In the discussions that follow, we'll focus on the traffic managers and fabric access modules in both the ingress and egress directions.
Figure 1: Block diagram of a CIOQ switching system.
In a typical networking box architecture, the ingress traffic manager performs three main functions:
- Buffering: Storing data that is destined to oversubscribed ports. This memory needs to hold about one round-trip network delay (approx. 100 ms) worth of ingress traffic. For example, an ingress rate of 40 Gbit/s and a round-trip delay of 100 ms require the storage of 512 Mbyte of data.
- Queue management: Typically, the memory is organized as multiple (1000s) of queues, based on parameters like source, destination, and service level. The traffic manager may also implement congestion avoidance methods such as early-drop policies (e.g., RED/WRED). In addition, statistics gathering per queue may also be performed.
- Segmentation: A traffic manager that works with a cell-based fabric needs to segment the packets before sending them into the fabric. This traffic manager may need to sequence cells in the context of several packets; that is, cells belonging to different packets may be interleaved.
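The buffer sizing rule above is simple enough to sanity-check in a few lines. This sketch uses decimal units, so it lands slightly below the article's 512-Mbyte figure, which rounds with binary units:

```python
def ingress_buffer_bytes(line_rate_bps: float, round_trip_s: float) -> float:
    """Ingress buffer must hold roughly one round-trip delay's worth of traffic."""
    return line_rate_bps * round_trip_s / 8  # bits -> bytes

# Article's example: 40 Gbit/s ingress rate, 100 ms round-trip network delay.
size = ingress_buffer_bytes(40e9, 100e-3)
print(f"{size / 1e6:.0f} Mbyte")  # 500 Mbyte, in line with the article's ~512 Mbyte
```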
The ingress fabric access modules dispatch packets/cells from the traffic manager memory into the fabric, based on fabric scheduler decisions. The fabric scheduler takes into consideration port availability, fabric resource availability, and service class.
In current fabric designs, the service classification is very coarse, involving a few priorities (e.g., four) on a logarithmic scale. The location of the fabric scheduler is specific to the fabric implementation. (For a detailed discussion of fabric scheduler strategies, see Evolving Switch Fabric Scheduling.)
The fabric contains the data paths that allow packets to travel between every pair of ports. While data is written to the ingress memory at line-rate, it is read from the ingress memory at a higher fabric-rate. The ratio between these rates is called the speed-up factor and is a dominant factor in determining the fabric characteristics. (Note: For more information on calculating speedup factor, see Calculating Speedup in Switch Fabric Designs).
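The speedup factor is just the ratio of the fabric read rate to the line-rate write rate. A minimal illustration, using rates consistent with the article's examples (40 Gbit/s line rate, 64 Gbit/s fabric rate); the pairing of these two figures is an assumption for illustration:

```python
def speedup_factor(fabric_rate_bps: float, line_rate_bps: float) -> float:
    """Ratio between the rate data is read out of ingress memory (fabric-rate)
    and the rate it is written in (line-rate)."""
    return fabric_rate_bps / line_rate_bps

# Illustrative: a 40-Gbit/s port read into the fabric at 64 Gbit/s.
print(speedup_factor(64e9, 40e9))  # 1.6
```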
Looking at the Egress Path
Above, we detailed the tasks handled on the ingress side. Now let's look at the egress path.
The egress fabric-access module extracts packets/cells from the fabric and places them at the egress traffic manager memories. The egress traffic manager then takes the data from the egress memories and sends it downstream at the speed of the network port. Note that data is placed in the egress memory at a rate higher than the line-rate (as determined by the speed-up factor).
Three types of considerations govern the size of the egress memory: rate adaptation, traffic shaping, and reassembly. Let's look at each in more detail.
1. Rate Adaptation: The egress memory performs rate adaptation between the fabric-rate and the slower line-rate. This is typically done via a flow-control mechanism towards the fabric, which in turn causes the input ports to slow their traffic rate toward the output. The egress memory needs to absorb the traffic in transit. The amount of in-transit data that must be absorbed depends on the fabric-rate and the round-trip fabric delay. For example, an egress fabric rate of 64 Gbit/s and a round-trip delay of 5 μs require 320 Kbit of memory.
2. Traffic Shaping: Network protocols may prescribe a very fine-grained service model. However, current fabric designs provide a coarse service-level granularity, involving a few (e.g., four) priority classes. The egress traffic manager often includes an additional traffic scheduler that, based on per-flow traffic shaping rules, decides which packet/cell to dispatch downstream. To be effective, the egress traffic manager should stockpile enough packets/cells (several Mbytes of memory) to allow the traffic scheduler a meaningful selection.
3. Reassembly: With a cell-based fabric, egress memory is required in order to reassemble the cells back into their original form (typically, packets). A full reassembly process at the output requires the memory to hold, at any instant, all the partially assembled packets in active reassembly contexts. While dependent on the specific fabric implementation, the number of active reassembly contexts is typically bounded by the number of ports times the number of priority classes. Thus, a fabric with 128 ports and 4 service levels requires memory for 512 x 1500 bytes or 512 x 10 Kbytes (depending on the maximum packet size supported). This amounts to a memory size of 6 or 40 Mbit, respectively.
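The two dominant egress requirements above can be reproduced with the article's own figures:

```python
def rate_adaptation_bits(fabric_rate_bps: float, rtt_s: float) -> float:
    """In-transit data to absorb during one fabric flow-control round trip."""
    return fabric_rate_bps * rtt_s

def reassembly_bits(ports: int, classes: int, max_packet_bytes: int) -> float:
    """One partially assembled packet per active reassembly context,
    bounded by ports x priority classes."""
    return ports * classes * max_packet_bytes * 8

# Article's examples:
print(rate_adaptation_bits(64e9, 5e-6) / 1e3)   # 320 Kbit
print(reassembly_bits(128, 4, 1500) / 1e6)      # ~6 Mbit (1500-byte packets)
print(reassembly_bits(128, 4, 10_000) / 1e6)    # ~40 Mbit (10-Kbyte packets)
```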
The amount of memory needed for buffering at the ingress (512 Mbyte) dictates a bank of external DRAM chips. At the egress, the overriding memory requirements are for traffic shaping and reassembly, which indicate a need for an external bank of DRAM/SRAM chips.
Removing the Memory
The amount of memory required to build today's switch architectures creates big headaches for design engineers. Memory is an expensive component, a real problem in today's price-conscious communications sector. Memory also adds latency to the datapath, a growing concern as boxes push into the OC-192 range.
Fortunately, new fabric options are emerging that will relieve some of this memory burden. Specifically, by implementing five key features, new switch fabric architectures can eliminate the need for memory on the egress ports. Let's look at the five features and then examine how egress memory can be eliminated.
1. Scheduled Fabrics
The most significant factor in removing external egress memory is the shift to scheduled fabrics. In these pull-type fabrics, as opposed to push-type fabrics, data waits at the input to be explicitly summoned into the fabric by the output. The fabric provides the means to construct, at the output port itself, a complete image of the input queues that compete for the bandwidth of that output. The fabric further provides the means for an output to summon data from any input and flow.
This one-hop scheduling scheme is quite different from the traditional schemes employed in CIOQ switches. In CIOQ systems, the switch piles two heaps of data, one at the input and another at the output. Scheduling is done in two hops: One scheduler schedules data from the input pile to the output pile, pushing the data into the fabric, in a scheme that may include a small number of priority classes. A second scheduler schedules data from the output pile to the network port, providing the more refined QoS guarantees.
In two-hop scheduling, scheduling over the fabric is coarse and may hide some information from the second scheduler (at the output). This can leave the traffic scheduler without the right packets to choose from, even though it has the right packet-selection algorithm.
By implementing the scheduled fabric scheme, the egress traffic scheduler maintains an image of the relevant queues competing for its bandwidth. Data waits at the inputs to be explicitly summoned into the fabric. The output traffic scheduler summons specific data from any input and flow. Thus external egress memory is not needed to handle output-scheduling tasks.
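The pull scheme above can be sketched as a toy model in which the output scheduler maintains an image of per-input, per-flow queues and summons exactly the cell it wants. All names and structure here are illustrative, not a real fabric API:

```python
from collections import deque

class OutputScheduler:
    """Toy pull-type scheduler: the output keeps an image of the input
    queues competing for its bandwidth and explicitly summons data, so
    no second pile of data accumulates at the egress."""

    def __init__(self):
        self.queue_image = {}  # (input_port, flow) -> deque of pending cells

    def advertise(self, input_port, flow, cell_id):
        # The input advertises a new cell; only its image crosses the fabric.
        self.queue_image.setdefault((input_port, flow), deque()).append(cell_id)

    def summon(self, input_port, flow):
        # The output pulls the specific cell it selected across the fabric.
        q = self.queue_image.get((input_port, flow))
        return q.popleft() if q else None

sched = OutputScheduler()
sched.advertise(input_port=3, flow="voice", cell_id="cell-0")
sched.advertise(input_port=7, flow="video", cell_id="cell-1")
print(sched.summon(3, "voice"))  # cell-0
```

The key property is that selection happens before any data moves: the fine-grained QoS decision is made against the queue image, not against a buffered heap at the output.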
2. Packet-Based Interface and Reassembly of Internal Cells
Several next-generation fabric designs offer a packet-based interface. That is, even though the fabric works internally with cells, complete packets are sent to the ingress interface and complete packets emerge from the egress interface. Maintaining a packet-based interface while working internally with cells requires innovations in reassembly schemes that let fabrics do so without the use of external memory.
3. Packet Ordering
Another attractive feature of contemporary fabric designs is preserving global first-come, first-served (FCFS) ordering. Namely, packets/cells arrive at the egress in the order they are admitted to the fabric (even when they are sent from different inputs).
As mentioned above, traffic shaping is one of the most important tasks performed by the external egress memory. By offering packet-ordering capabilities on chip, new switch fabric architectures can move traffic-shaping capabilities inside the fabric and thus eliminate the external memory requirements.
For example, the FCFS capabilities provided by new fabric architectures guarantee that read requests to the ingress memory are served in the order they are issued by the output. Moreover, packets/cells admitted into the fabric are received at the output in the order they were admitted.
4. Bounded Fabric Delay and Low Jitter
Performance characteristics of new switching fabric designs guarantee bounded fabric delay and low jitter. Fabric delay measures the difference between the time a packet/cell first enters the fabric and the time it leaves the fabric. Jitter is the difference between the delays of packets/cells as they cross the fabric.
The jitter and delay guarantees delivered by new fabrics ensure that data read from the ingress memory emerges at the output within a bounded, low delay after it is admitted into the fabric. This removes the need for external memory to perform rate adaptation in the egress path.
The only remaining factor that can hurt the response time to a read request from the ingress memory is the contention of different outputs reading data simultaneously from the same input memory. This contention is known as input blocking (discussed further below).
5. High-Speed Operation
New designs provide high-speed fabric access at 10 Gbit/s (OC-192), 40 Gbit/s (OC-768), and even higher rates. This means that transferring the same amount of traffic through a higher-speed fabric typically takes less time than through a slower one. As the delays in the fabric scale down accordingly, the impact of the input-blocking delay in particular becomes negligible.
The Specter of Input Blocking
As mentioned above, the only remaining factor that can hurt response time in a switch fabric is the contention of different outputs reading data simultaneously from the same input memory. This contention is known as input blocking.
Input blocking has a major impact on the quality of the traffic shaping in system architectures that do not have an external egress memory. Input blocking introduces additional variable delay (in responding to read requests from the ingress) and increases jitter, which can hurt the shaping, scheduling, and reassembly of data, causing slowdowns in the datapath.
Fortunately, there are ways for designers to account for input blocking. A jitter buffer can be employed at the egress to eliminate the additional jitter caused by input blocking. This buffer increases the delay of packets that arrive early in order to lower the variance of the delays introduced by the network. A jitter buffer increases delay proportionally to the jitter level: the higher the jitter introduced by the network, the more delay is introduced to compensate for it.
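The jitter-buffer idea above can be sketched in a few lines: hold each early packet until a fixed target delay after it was sent, so arrival spacing is restored. The function name and the microsecond figures are illustrative:

```python
def jitter_buffer_release(arrivals_us, send_times_us, target_delay_us):
    """Release each packet target_delay_us after it was sent, holding
    early arrivals so jitter up to target_delay_us is absorbed."""
    return [max(arrived, sent + target_delay_us)
            for arrived, sent in zip(arrivals_us, send_times_us)]

# Packets sent every 10 us, but fabric delays of 1, 3, and 2 us add jitter.
sent = [0, 10, 20]
arrived = [1, 13, 22]
# Releasing each packet 3 us after it was sent restores even 10-us spacing.
print(jitter_buffer_release(arrived, sent, 3))  # [3, 13, 23]
```

Note the trade-off the article describes: the buffer converts jitter into a small fixed delay (here 3 μs), which must be sized to the worst-case jitter it needs to absorb.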
Input blocking is a statistical phenomenon: for a given number of output ports, there is some probability that different outputs simultaneously want to read data out of a specific input memory. This, in turn, causes queues to build up in which specific packets/cells stall until the next-in-line packets/cells are admitted to the fabric. While the size of the queues in terms of bytes (number of packets/cells) is the same regardless of fabric speed, the time it takes for a packet/cell to be admitted to the fabric shrinks as the fabric and memory speed increase. Specifically, 10-/40-Gbit/s fabric designs operate 10X to 40X faster than previous-generation designs. Thus, jitter is reduced by 10X to 40X compared to 1-Gbit/s fabrics.
Statistical simulations on 40-Gbit fabrics support this argument. For example, consider a 40-Gbit fabric with 128 ports and an IMIX traffic pattern at 100% of the fabric rate. A simulation of this fabric was performed to track the input delay suffered by a specific flow between a pair of input-output ports, in the presence of random background traffic. In this simulation, the average delay due to input blocking is in the range of 0.1 μs to 0.2 μs.
Figure 2 shows the cumulative probabilities for input blocking exceeding certain values. As we see, the probability that the input blocking exceeds 2 μs is negligible.
Figure 2: Simulation showing the impact of input blocking on a 40-Gbit/s fabric.
In general, network delay is measured in tens of milliseconds (e.g., 50 ms). Therefore, the few microseconds of delay added by the jitter buffer are negligible. Thus, input blocking has essentially no impact on current applications, which already need to cope with 50-ms network delays.
As shown above, using innovations in cell reassembly, fabrics can provide packet-based interfaces while working internally with cells, avoiding external egress memory. In addition, innovative scheduled-fabric schemes eliminate the need to buffer data a second time at the output in order to provide fine-grained shaping. Thus, designers using these fabrics can build systems with precise one-hop scheduling without the need for costly external egress memory.
We conclude by stating that the new architecture does not preclude the addition of an egress traffic manager where absolutely necessary, for example, for specialized applications that support very elaborate traffic shaping schemes.
About the Author
Gabriel Bracha is the verification manager at Dune Networks. He holds a B.Sc. in Computer Science from Tel Aviv University and a Ph.D. in Computer Science from Cornell University. Gabriel can be reached at firstname.lastname@example.org.