United Business Media EE Times



Advanced Switch Fabrics Unlock Processors
Vendors have worked long and hard to realize the ideal of a non-blocking high-speed switch fabric that eliminates the scourge of dropped packets. While the challenges have only escalated in sync with increasing data rates, some of the vendors have made impressive progress using advanced algorithms that reflect wildly varying packet-routing philosophies.







CommsDesign



Two years ago, the network processor grabbed all the attention; now the switch fabric is forming fertile ground for new ideas. Though it's been only a few years since the idea of a switched backplane became the norm for routers and switches, some believe the idea already is hitting roadblocks in its ability to handle a combination of high port counts, higher line speeds and quality-of-service (QoS) pressures.

Granted, the telecom slump has doused the market for grandiose switches. But telecom continues to grow, and many industry observers consider it inevitable that routers and switches will grow in size. When that day arrives, the question is whether existing switch fabrics can cope. Many believe that fresh architectures as well as radical algorithmic approaches will be needed. It could be argued that some of the new techniques simply twist existing architectures for the sake of product differentiation, but they nonetheless highlight some of the questions that need to be asked when trying to build a bigger, faster switch fabric.

And the answers can get complex, because any number of problems could crop up. You could concentrate on dropping as few packets as possible, but that might lead to high-priority traffic getting blocked. You could concentrate on satisfying high-priority traffic, but that could starve out best-effort traffic. Either way, you have to prioritize an ungodly number of queues and properly schedule the criss-crossing matchups between input and output ports.

In the ideal output-queued switch fabric, traffic arrives at the input queues, is routed through the switching element and then exits through the egress queues without any delays (Figure 1). In practice, traffic jams arise at the egress ports. Most switch fabrics break traffic into fixed-size cells, so variable-sized traffic such as Internet Protocol packets is segmented into multiple cells. This simplifies scheduling by ensuring that all ports transmit or receive in sync.

But those cells then have to fight for attention. Every input port can send a cell to the switch fabric at once, but on the receiving end, each output port can send only one cell out of the system per cell time. Any other cells have to be buffered. If the number of waiting cells exceeds the buffer size, then any further cells probably get dropped by the system.

Traffic-manager wariness
A traffic manager addresses some of the problems, but some vendors are wary of putting too much reliance on this device, according to Kirby Nell, senior systems architect for startup Zagros Networks Inc. (Rockville, Md.).

Existing high-end switch fabrics have plenty of tricks for trying to avoid dropped packets or at least making sure that high-priority packets aren't the ones dropped. For starters, large switch fabrics need queues in more than one place. Ideally, you'd have queues only on the output ports, but that leaves the switch susceptible to backups if one output port gets too popular. For instance, if every input port sends a cell to Output Port 1 on every clock cycle, that output port will overflow quickly unless the queuing buffer is infinite.
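
The overflow scenario is easy to put numbers on. The following Python sketch is a toy model, not any vendor's design: it assumes every input port sends one cell per cell time to the same output port, which drains one cell per cell time from a finite buffer. The function name and parameters are illustrative.

```python
def simulate_hotspot(num_inputs, buffer_cells, cycles):
    """Toy model of one overloaded output port.

    Assumptions (not from any real fabric): every input sends one cell
    per cell time to the same output, and the output transmits one cell
    per cell time from a buffer holding at most buffer_cells cells.
    Returns (cells still queued, cells dropped).
    """
    queued = 0
    dropped = 0
    for _ in range(cycles):
        # Arrivals: one cell from each input port in this cell time.
        for _ in range(num_inputs):
            if queued < buffer_cells:
                queued += 1
            else:
                dropped += 1  # buffer full, the cell is lost
        # Departure: the output port sends at most one cell.
        if queued > 0:
            queued -= 1
    return queued, dropped


if __name__ == "__main__":
    queued, dropped = simulate_hotspot(num_inputs=16, buffer_cells=64, cycles=10)
    print(f"queued={queued} dropped={dropped}")
```

With 16 inputs and a 64-cell buffer, the buffer fills and the first drops occur in the fifth cell time. A deeper buffer only postpones that moment.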

So, queues are needed on the input ports or in the central switching fabric itself. An example is Vitesse Semiconductor Corp.'s TeraStream fabric. Designed for aggregate throughput of up to 160 Gbits/second, it uses multiple levels of scheduling with a store-and-forward architecture in which a set of buffers is on the switch fabric in addition to buffers at the output ports.

"Every crossbar location has a set of priority queues. So, from the ingress side, even if the output port is busy, you can send the frame, and it'll be stored on the switch fabric," said Gary Lee, lead architect on TeraStream.

But if traffic can queue up before getting to the output, then that opens the dread possibility of head-of-line blocking. Suppose a low-priority packet arrives at Input Port 1, followed by a high-priority packet. Port 1 now bogs down, since the high-priority packet can't move until the low-priority packet does, and the low-priority packet might not be going anywhere for a while.
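
A few lines of Python make the blocking concrete. This sketch assumes a single first-in, first-out queue per input port and a hypothetical `fifo_step` helper. Cells are represented as (label, output port) pairs.

```python
from collections import deque


def fifo_step(queue, free_outputs):
    """Try to forward the head cell of a single FIFO input queue.

    queue holds (label, output_port) tuples in arrival order, and
    free_outputs is the set of output ports that can accept a cell in
    this cell time. Only the head cell is eligible. Returns the label
    of the cell sent, or None if nothing could be sent.
    """
    if not queue:
        return None
    label, out = queue[0]
    if out not in free_outputs:
        return None  # head is stuck, so every cell behind it waits too
    queue.popleft()
    return label


if __name__ == "__main__":
    port1 = deque([("low-priority", 3), ("high-priority", 5)])
    # Output 5 is free, but the head cell wants busy output 3.
    print(fifo_step(port1, free_outputs={5}))
```

The call returns None even though the high-priority cell's output is idle, which is exactly the stall described above.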

For that reason, most modern switch fabrics use virtual output queues, where each input port has a separate queue for each output port. That prevents head-of-line blocking, but it creates issues in scheduling: For an N-port router, it means the system has N x N queues to juggle. To schedule them, most high-end fabrics use a system of requests and grants, the classic example being the iSlip algorithm devised by Nick McKeown, now an assistant professor at Stanford University. Algorithms such as iSlip run multiple iterations of the request-grant process to find the best permutation of input-port/output-port matchups.
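
The request-grant cycle can be sketched in Python as a single pass of round-robin matching, loosely modeled on the request, grant and accept steps of iSlip. This is a simplified illustration rather than McKeown's full algorithm. The function names and the occupancy-matrix representation are assumptions, and a real scheduler would repeat the pass over the ports left unmatched.

```python
def rr_pick(candidates, pointer, n):
    """Return the first candidate at or after pointer, wrapping modulo n."""
    for offset in range(n):
        idx = (pointer + offset) % n
        if idx in candidates:
            return idx
    return None


def match_once(voq_occupancy, grant_ptr, accept_ptr):
    """One request-grant-accept pass over an N x N set of virtual output queues.

    voq_occupancy[i][j] is the number of cells at input i waiting for
    output j. grant_ptr[j] and accept_ptr[i] are round-robin pointers,
    advanced in place only when a grant is accepted.
    Returns a dict mapping each matched input to its output.
    """
    n = len(voq_occupancy)
    # Request: each input asks every output for which it holds a cell.
    requests = {
        j: {i for i in range(n) if voq_occupancy[i][j] > 0} for j in range(n)
    }
    # Grant: each output picks one requesting input in round-robin order.
    grants = {}  # input -> set of outputs that granted it
    for j in range(n):
        i = rr_pick(requests[j], grant_ptr[j], n)
        if i is not None:
            grants.setdefault(i, set()).add(j)
    # Accept: each input picks one granting output in round-robin order.
    matches = {}
    for i, outs in grants.items():
        j = rr_pick(outs, accept_ptr[i], n)
        matches[i] = j
        grant_ptr[j] = (i + 1) % n   # move past the input just served
        accept_ptr[i] = (j + 1) % n  # move past the output just used
    return matches


if __name__ == "__main__":
    occupancy = [[1, 1], [1, 0]]  # input 0 wants both outputs, input 1 wants output 0
    g, a = [0, 0], [0, 0]
    print(match_once(occupancy, g, a))  # first cell time
    print(match_once(occupancy, g, a))  # pointers have moved on
```

Advancing the pointers only on an accepted grant is what spreads the outputs' choices across different inputs over time. In the example, the first pass matches only one pair, while the second pass matches both inputs.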

QoS tactics
Recently, some companies have begun to claim that request-and-grant schemes aren't adequate for providing QoS guarantees and that the schemes can't provide guaranteed bandwidth to meet service-level agreements. QoS is considered vital these days because it's a potential money-maker for service providers. It opens the possibility of service providers' charging more for favorable service, such as giving priority to voice and video streams over plain data feeds. Some variations of request-and-grant algorithms try to include QoS accommodations, but they tend to be too complicated to be useful, said Victor Firoiu, principal scientist for Nortel Networks.

The problem, Firoiu contends, is that most architectures are fixated on moving traffic as frequently as possible, keeping up a semblance of perpetual motion until, suddenly, a cell has to be dropped. At this year's Hot Interconnects conference, held at Stanford University in August, Firoiu presented a switch-fabric architecture that uses a feedback loop to avoid congestion in the first place. Firoiu's feedback output queuing (FOQ) architecture notifies the ingress ports of congestion ahead so that the switch fabric doesn't waste resources forwarding packets or cells that will probably get dropped. Separately, the FOQ scheme is engineered so that the request-grant cycle isn't activated every time a new datagram arrives.

"Our idea is to have a less-frequent packet treatment," Firoiu said.

A different feedback approach, from Internet Machines, has the output ports telling the input ports to throttle back to prevent dropped cells (Figure 2). This is handled through the company's own request-grant algorithm, called dynamic distributed weighted fair queuing. DDWFQ has its roots in the available-bit-rate scheme used in asynchronous transfer mode, but it adds some significant improvements. "No one uses it [ABR] in ATM, because in a real network the latencies are so high that it never converges," said Chris Hoogenboom, Internet Machines' CEO.

Under DDWFQ, when an input port starts to receive a data stream, it sends a request to the output port. If the output port has enough capacity to handle the entire traffic stream, it gives the OK, and packets are forwarded as normal.

The interesting part happens when the output port gets overloaded. The egress port will direct the relevant ingress ports to throttle down, sending only a fraction of their bandwidth. And this applies not just to the new data stream but to the streams already in progress.

"We're effectively shaping the rate through the switch fabric so it always matches the capacity of the port," Hoogenboom said.

DDWFQ also prevents starvation, because "we don't offer more traffic to the output port than the [crossbar] can deliver," Hoogenboom said.
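
The article does not spell out DDWFQ's arithmetic, so the following Python sketch should be read only as an illustration of the general idea: an egress port dividing its capacity among competing ingress ports once demand exceeds supply. It uses textbook weighted max-min fair sharing as a stand-in, and the function name, weights and units are assumptions.

```python
def grant_rates(demands, weights, capacity):
    """Split an egress port's capacity among ingress ports.

    demands[p] is the rate ingress port p wants to send, weights[p] its
    relative share, and capacity the egress port's line rate, all in the
    same units. Uses weighted max-min fairness (an assumed stand-in, not
    the vendor's formula): ports asking for less than their share get
    their full demand, and the remainder is re-divided among the rest.
    """
    grants = {}
    active = set(demands)
    remaining = capacity
    while active:
        total_weight = sum(weights[p] for p in active)
        share = {p: remaining * weights[p] / total_weight for p in active}
        satisfied = {p for p in active if demands[p] <= share[p]}
        if not satisfied:
            # Everyone left wants more than a fair share: throttle them all.
            grants.update(share)
            break
        for p in satisfied:
            grants[p] = demands[p]
            remaining -= demands[p]
        active -= satisfied
    return grants


if __name__ == "__main__":
    demands = {"card_a": 2.0, "card_b": 8.0, "card_c": 8.0}  # Gbits/second
    weights = {"card_a": 1, "card_b": 1, "card_c": 1}
    print(grant_rates(demands, weights, capacity=10.0))
```

In this example the light sender keeps its full 2 Gbits/second, while the two heavy senders are each throttled to 4 Gbits/second, so the offered load never exceeds the 10-Gbit/second port.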

All of the scheduling is distributed among the line cards. That way, as line cards are added, the amount of scheduling intelligence is likewise increased, Hoogenboom said. Internet Machines officials are confident that the architecture can scale to very high data rates, because no part of the scheduling is directly affected by the incoming packet rate.

That means line cards are transmitting data but are also sending messages back and forth. The switching fabric knows to give the messages priority. The criss-crossing of messages is treated by the switching fabric entirely separately from data switching, a "fabric within a fabric," Hoogenboom said. Messages are event-driven; that is, the ingress and egress ports don't communicate until some change occurs in the system, such as the beginning or ending of a data flow.

Internet Machines avoids using an off-the-shelf crossbar, which would require that a parallel algorithm be run to schedule it. Instead, the company developed a self-routing buffered crossbar. The buffering means the switch fabric doesn't have to schedule every piece of traffic. "We don't have to worry about the micromanaging of every single [data unit] that goes through it," Hoogenboom said.

Avici Systems Inc.'s switch fabric is a slightly different case because it consists of multiple elements that could reside in different equipment racks entirely. But most of the concepts are the same, and Avici has added its own twist to the game, in the form of bandwidth sharing.

Put simply, Avici divides traffic into two levels: one for strict priority, given to inflexible streams such as real-time video, and a second echelon of traffic that shares bandwidth across the switch fabric, ensuring that all lower-priority streams are attended to.

Such division works well not only because it gives priority to real-time traffic but also because those real-time streams tend to be deterministic. That is, a video stream is likely to consume a consistent amount of bandwidth for its entire duration, simplifying the router's decision about whether to allot the requested bandwidth for that stream. If the system can satisfy a real-time stream without starving out any lower-priority sessions, then it can safely lock up the requested bandwidth for the lifetime of that stream.
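
That admission decision amounts to simple bookkeeping. The Python sketch below is a hypothetical model, not Avici's implementation: it assumes each output port keeps back a fixed floor of bandwidth for the shared, lower-priority class and admits a real-time stream only if the reservation fits in what is left.

```python
class OutputPort:
    """Toy admission control for strict-priority streams (illustrative only)."""

    def __init__(self, capacity, shared_floor):
        self.capacity = capacity          # total port bandwidth, e.g. in Mbits/second
        self.shared_floor = shared_floor  # assumed minimum kept for lower-priority traffic
        self.reserved = {}                # stream id -> reserved rate

    def admit(self, stream_id, rate):
        """Reserve rate for the stream's lifetime if lower-priority traffic won't starve."""
        in_use = sum(self.reserved.values())
        if in_use + rate > self.capacity - self.shared_floor:
            return False
        self.reserved[stream_id] = rate
        return True

    def release(self, stream_id):
        """Return the stream's bandwidth to the pool when it ends."""
        self.reserved.pop(stream_id, None)


if __name__ == "__main__":
    port = OutputPort(capacity=10_000, shared_floor=4_000)
    print(port.admit("video-1", 4_000))  # fits within the 6,000 available
    print(port.admit("video-2", 3_000))  # would cut into the shared floor
```

Because a real-time stream's rate is assumed constant, the check only has to be made once, when the stream starts.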

Avici's TSR router doesn't have the kind of feedback mechanisms Internet Machines and Nortel have, but then it doesn't need that kind of sophistication, said Chris Gunner, senior vice president of R&D at the company. The service provider can program the depth of queues in the router's ingress path. If congestion occurs at an output port, the related ingress queues will fill, soon dropping packets if they run out of space.

The mechanism works just fine because it's the real-time traffic that's most likely to run into congestion. And in the case of video or even voice, it's preferable to skip a few frames rather than to allow the stream to keep falling further behind. That is, a queue that's too deep doesn't do any extra good. "You're better off not sending packets after they reach a certain age," said Gunner.
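
Gunner's point about packet age can be illustrated with a small Python sketch. It is a hypothetical helper, not a description of the TSR's queues: it assumes each queued packet carries an arrival timestamp and that stale packets are discarded when the queue is served.

```python
from collections import deque


def dequeue_fresh(queue, now, max_age):
    """Pop packets until one young enough to be worth sending is found.

    queue holds (arrival_time, payload) tuples in arrival order; now and
    max_age use the same time units. Packets older than max_age are
    discarded. Returns (payload or None, number of packets discarded).
    """
    discarded = 0
    while queue:
        arrival, payload = queue.popleft()
        if now - arrival <= max_age:
            return payload, discarded
        discarded += 1  # too old to be useful to a real-time stream
    return None, discarded


if __name__ == "__main__":
    q = deque([(0, "frame-1"), (5, "frame-2"), (9, "frame-3")])
    print(dequeue_fresh(q, now=10, max_age=4))
```

Here the two oldest frames are skipped and the stream resumes with the newest one rather than falling further behind.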

Also, packet-dropping doesn't have to be crude and arbitrary. Algorithms such as weighted random early detection (WRED), which uses probabilities to anticipate any queue overflows, help make the router "more graceful about dealing with overflow," Gunner said.
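
The probabilistic part of WRED can be sketched in a few lines of Python. This follows the commonly described form of random early detection: the drop probability is zero below a minimum average queue depth, rises linearly to a ceiling at a maximum depth and becomes certain beyond it. The "weighted" part is modeled as a separate threshold profile per traffic class. The class names and threshold values are made up for illustration, and refinements found in real implementations are omitted.

```python
import random


def update_average(avg, current_depth, weight=0.002):
    """Exponentially weighted moving average of the queue depth."""
    return (1.0 - weight) * avg + weight * current_depth


def drop_probability(avg_depth, min_th, max_th, max_p):
    """Drop probability as a function of the average queue depth."""
    if avg_depth < min_th:
        return 0.0
    if avg_depth >= max_th:
        return 1.0
    return max_p * (avg_depth - min_th) / (max_th - min_th)


# Hypothetical per-class profiles: (min threshold, max threshold, max probability).
PROFILES = {
    "best_effort": (20, 40, 0.10),  # starts dropping early
    "premium": (35, 40, 0.02),      # tolerated until the queue is nearly full
}


def should_drop(traffic_class, avg_depth, rng=random):
    """Randomly decide whether to drop an arriving packet of the given class."""
    min_th, max_th, max_p = PROFILES[traffic_class]
    return rng.random() < drop_probability(avg_depth, min_th, max_th, max_p)


if __name__ == "__main__":
    for depth in (10, 30, 38, 45):
        p_be = drop_probability(depth, *PROFILES["best_effort"])
        p_pr = drop_probability(depth, *PROFILES["premium"])
        print(f"avg depth {depth}: best_effort={p_be:.3f} premium={p_pr:.3f}")
```

Because lower-priority classes begin dropping at a shallower average depth, they back off first, which is what lets the router shed load gradually instead of all at once.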

Related Articles
For more on related topics, see:

  1. "The Switch Fabric Multiservice Dilemma"; www.commsdesign.com/story/OEG20020702S0035.
  2. "Reinventing the Switch Fabric"; www.commsdesign.com/story/OEG20010521S0113.

Craig Matsumoto (cmatsumoto@cmp.com) is a contributing editor at Communication Systems Design magazine.











All materials on this site Copyright © 2009 TechInsights, a Division of United Business Media LLC All rights reserved.