United Business Media EE Times




NPUs and fabrics together improve QoS

EE Times


Technology that runs at 10 Gbits per second isn't just for network cores or backbones anymore. As streaming video takes hold and Gigabit Ethernet reaches the desktop, applications such as metro edge routers and third-generation gateways must provide full-service, multiprotocol network processing at rates up to 10 Gbits/second. At these speeds, fabrics can no longer merely cross-switch traffic; they must play an active role in enabling key quality-of-service (QoS) functions such as highly granular flow control, back pressure and efficient transfer of both packet and cell traffic. Simply throwing bandwidth at the problem is not cost-effective for network equipment at the metro core or edge.

Intelligent fabrics can offload processing from a network processor (NPU)-based line card in many ways. When multicasting, such as when streaming video from one port to many ports, an NPU working alone must recreate cells for each instance of the multicast. Alternatively, a fabric using a shared-memory architecture can store a single copy of the source and duplicate it by generating multiple pointers, conserving NPU processing resources and ingress fabric bandwidth (see Fig. 1).
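The pointer-based duplication described above can be illustrated with a minimal Python sketch. It is a toy model, not any vendor's implementation: the class name `SharedMemoryFabric`, its methods and the reference-counting scheme are assumptions made for illustration.

```python
from collections import deque


class SharedMemoryFabric:
    """Toy shared-memory fabric: one stored copy per cell, one pointer per egress port."""

    def __init__(self, num_ports):
        self.buffers = {}  # handle -> [payload, remaining reference count]
        self.queues = [deque() for _ in range(num_ports)]
        self.next_handle = 0

    def enqueue(self, payload, egress_ports):
        """Store the payload once and queue a pointer on every destination port."""
        if not egress_ports:
            raise ValueError("at least one egress port is required")
        handle = self.next_handle
        self.next_handle += 1
        self.buffers[handle] = [payload, len(egress_ports)]
        for port in egress_ports:
            self.queues[port].append(handle)
        return handle

    def dequeue(self, port):
        """Send the next cell on a port; free the buffer after the last copy leaves."""
        handle = self.queues[port].popleft()
        entry = self.buffers[handle]
        entry[1] -= 1
        if entry[1] == 0:
            del self.buffers[handle]
        return entry[0]


fabric = SharedMemoryFabric(num_ports=4)
fabric.enqueue(b"video-cell", egress_ports=[1, 2, 3])
stored_copies = len(fabric.buffers)  # a single stored copy serves three ports
delivered = [fabric.dequeue(port) for port in (1, 2, 3)]
```

The ingress side sends the cell across the fabric once; only the small pointers are replicated, and the buffer is released when the last egress port has transmitted its copy.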

Fabrics can also manage fairness for designs using multiple network processors. Using a single processor to manage fairness usually makes that NPU a bottleneck in the system, or forces the design to ignore fairness between flows on different processors altogether. An intelligent fabric, by contrast, can arbitrate fairness globally.
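As a rough illustration of global arbitration, the sketch below applies max-min fairness, one common fairness definition, to flows arriving from two NPUs at a shared egress port. The article does not say which algorithm a given fabric uses, so the function name, the flow labels and the bandwidth figures are all assumptions.

```python
def max_min_fair(demands, capacity):
    """Return a max-min fair share of `capacity` for each flow in `demands`."""
    allocation = {flow: 0.0 for flow in demands}
    unsatisfied = dict(demands)
    remaining = float(capacity)
    while unsatisfied and remaining > 1e-9:
        share = remaining / len(unsatisfied)
        # Flows that need no more than an equal share are fully satisfied.
        satisfied = {f: d for f, d in unsatisfied.items() if d <= share}
        if not satisfied:
            # Every remaining flow wants more than the equal share: split evenly.
            for flow in unsatisfied:
                allocation[flow] = share
            break
        for flow, demand in satisfied.items():
            allocation[flow] = float(demand)
            remaining -= demand
            del unsatisfied[flow]
    return allocation


# Hypothetical demands in Gbits/s, keyed by (NPU, flow), for one 10 Gbits/s egress port.
demands = {("npu0", "a"): 2.0, ("npu0", "b"): 6.0, ("npu1", "c"): 8.0}
shares = max_min_fair(demands, capacity=10.0)
```

Because the arbiter sees flows from both NPUs at once, flow "b" on npu0 and flow "c" on npu1 end up with equal shares, which neither NPU could guarantee from its local view.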

NPU/fabric coupling is also critical for efficient congestion control and for managing oversubscribed ports, where there is more raw ingress than egress bandwidth. When it becomes necessary to exert back pressure, the more specifically you can identify the source, the better the overall latency you can achieve. Unless the fabric can identify the particular class of service within a specific flow, it must fall back on standard flow-control mechanisms that might identify only the congested egress port. This creates the risk of a low-priority flow blocking higher-priority flows on the same port. Additionally, the more granular the flow control, the better you can avoid undesirable packet discard. In-band messaging needs the highest priority to achieve low latency and should ideally add zero overhead.
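The difference between port-level and class-level back pressure can be shown with a small sketch. The function `eligible_cells`, the tuple layout and the class names are hypothetical; real fabrics signal congestion in hardware rather than with Python sets.

```python
def eligible_cells(ingress_queue, paused, granularity):
    """Return the cells an ingress NPU may still send under back pressure.

    ingress_queue: list of (egress_port, service_class) tuples awaiting transfer
    paused: set of (egress_port, service_class) pairs reported as congested
    granularity: "port" pauses a whole egress port, "class" pauses only the pair
    """
    if granularity == "port":
        paused_ports = {port for port, _ in paused}
        return [cell for cell in ingress_queue if cell[0] not in paused_ports]
    if granularity == "class":
        return [cell for cell in ingress_queue if cell not in paused]
    raise ValueError("granularity must be 'port' or 'class'")


queue = [(5, "best-effort"), (5, "voice"), (7, "best-effort")]
congested = {(5, "best-effort")}

port_level = eligible_cells(queue, congested, "port")
class_level = eligible_cells(queue, congested, "class")
```

With port-level signaling, the voice cell bound for port 5 is held back along with the congested best-effort traffic. With class-level signaling it still goes through, which is the blocking problem the paragraph above describes.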

Finally, to achieve optimal end-to-end performance, the NPU and fabric need to exchange statistical information, such as the frequency of errored cells or signal loss, so they can react to the actual network environment rather than a merely theoretical one. Depending upon the application, an intelligent fabric and NPU working together can eliminate the need for a separate traffic-manager device.

Compliance vs. interoperability

The fact that an NPU and a switch fabric share a standard interface such as Common Switch Interface (CSIX), CSIX-over-LVDS (low-voltage differential signaling), or NPSI/SPI 4.2 does not mean they will interoperate with full functionality. For example, the CSIX standard does not provide sufficient inter-NPU communication to convey NPU buffer/memory load status, which can help the ingress NPU make intelligent decisions when scheduling ingress traffic. It's important that either the NPU or the fabric you select has a configurable interface so you can incorporate any workarounds that prove necessary when you test at wire speed.

Additional issues arise for "pizza box" designs, where a single board acts as line card, service card and fabric. When the fabric and NPU are on the same board, the fabric interface chip is an extra component you can eliminate by having the fabric talk directly to the NPU or traffic manager. The interface chip, however, may have provided queuing and flow control, as well as arbitrated and synchronized traffic to the strict timing constraints of the fabric. Without the interface chip, either the NPU or fabric will need to provide these functions.

Addressing the direct NPU/fabric interface is the new Advanced Switching (AS) standard from the PCI Industrial Computer Manufacturers Group (PICMG). The AS standard avoids requiring a central crossbar for arbitration by using a shared-memory architecture. Data coming into the fabric is stored in a single memory pool and referenced by pointers in dynamically managed output queues allocated by a memory manager, just as in any store-and-forward packet switch (see Fig. 2). Queues can have various levels of priority and classes of service, and can be serviced using sophisticated fairness algorithms.

Given the need for multiprotocol support, the NPU must be able to pass both packets and cells through the fabric. The CSIX interface, for example, is a cell-based bus, and assumes that the NPU or traffic manager will handle segmentation and reassembly (SAR): splitting packets into cells on ingress and rebuilding packets from cells on egress. Fabrics that support variable-sized cells can pass segmented traffic optimally, but only if the NPU communicates the cell size to the fabric.
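A simplified sketch of SAR shows why variable-sized cells help. It ignores cell headers, and the 64-byte payload size and 65-byte packet are illustrative assumptions rather than values taken from the CSIX specification.

```python
def segment(packet: bytes, cell_payload: int) -> list:
    """Split a packet into cell payloads of at most `cell_payload` bytes."""
    if cell_payload <= 0:
        raise ValueError("cell_payload must be positive")
    return [packet[i:i + cell_payload] for i in range(0, len(packet), cell_payload)]


def reassemble(cells) -> bytes:
    """Rebuild the original packet from its in-order cell payloads."""
    return b"".join(cells)


def fixed_cell_padding(packet_len: int, cell_payload: int) -> int:
    """Bytes of padding needed if every cell must carry exactly `cell_payload` bytes."""
    num_cells = -(-packet_len // cell_payload)  # ceiling division
    return num_cells * cell_payload - packet_len


packet = bytes(65)  # hypothetical 65-byte packet
cells = segment(packet, cell_payload=64)
cell_lengths = [len(c) for c in cells]
padding = fixed_cell_padding(len(packet), cell_payload=64)
```

A fabric limited to fixed 64-byte payloads would carry 63 bytes of padding in the second cell, while a fabric that accepts variable-sized cells can carry a 1-byte final cell, provided the NPU tells it the cell size.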

End-to-end interoperability

Fabric and NPU vendors may claim interoperability, but it's worth taking a closer look at these claims before committing to a design. 10G buses require clean design, good timing management and signal integrity. Many fabric specs are measured under light loads; a fabric's robustness, however, shows only when the fabric is oversubscribed with data moving from any port to any port.

Simulation testing using RTL or timing models is only a first step toward proving interoperability; models are abstractions of designs that speed simulation runs at the expense of design coverage. While pre-silicon FPGAs can demonstrate the capabilities of a device, such interoperability testing is not comprehensive because an FPGA prototype often represents only part of the device.

For example, a partial 64-bit CSIX interface bus might have fewer than 64 bits active or run at a slower rate than the specified 250 MHz. When you bring the design up to full speed, interference between signals might prevent you from increasing the clock to full rate, capping performance below wire speed. Skew in timing between multiple lines might exceed system tolerances when you move to higher frequencies. Additional issues arise when you try to combine multiple FPGAs and ASICs, or if you have to simulate a multichip fabric.
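The arithmetic behind this concern is straightforward. The sketch below assumes one transfer per clock cycle and ignores framing and header overhead; the 32-bit, 125 MHz prototype figures are hypothetical, chosen only to show how a partial implementation can fall short of a 10 Gbits/s line rate.

```python
def raw_bus_gbps(width_bits: int, clock_mhz: float) -> float:
    """Raw bus bandwidth in Gbits/s, assuming one transfer per clock cycle."""
    return width_bits * clock_mhz / 1000.0


full_rate = raw_bus_gbps(width_bits=64, clock_mhz=250)  # full-width, full-speed bus
prototype = raw_bus_gbps(width_bits=32, clock_mhz=125)  # hypothetical partial FPGA bus
meets_10g = {"full": full_rate >= 10.0, "prototype": prototype >= 10.0}
```

Under these assumptions the full bus offers 16 Gbits/s of raw bandwidth, leaving headroom above 10 Gbits/s for overhead, while the partial prototype reaches only 4 Gbits/s and so cannot exercise wire-speed behavior.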

Failure to verify interoperability early in the design process, and to test all logical functionality, can require an expensive and time-consuming re-spin or board redesign. Reference designs running at full rate play an important role in verifying true end-to-end interoperability between devices in the early design stage.

An important cornerstone of interoperability is the upcoming Advanced Telecom Computing Architecture (ATCA) standard. ATCA defines a chassis electrical interface and form factor, effectively a plug-and-play platform that serves as the base design and glue for various chassis components. Verifying interoperability between an ATCA line card and an ATCA switch fabric becomes a straightforward process that does not require designing an entire system. ATCA places the focus on interoperability, not merely compliance, and raises the bar for silicon vendors, allowing developers to select best-of-breed components.

Fabrics used to be the first component engineers specified in a new design. The increased intelligence in fabrics, however, has made selection of NPU/coprocessors and fabric a parallel process so capabilities can be optimally matched. While you can blindly throw data at a fabric, a closely coupled NPU/fabric interface enables a shorter development cycle and faster time-to-market. It produces designs that handle traffic with better QoS and flow granularity, use fabric bandwidth more efficiently, and provide better overall data throughput, while offloading processing from the NPU and improving the cost/performance of the overall system.

Micha Zeiger is founder and chief executive officer of TeraChip (Palo Alto, Calif.).

See related chart

See related chart






All materials on this site Copyright © 2009 TechInsights, a Division of United Business Media LLC All rights reserved.