As network-equipment developers move away from internally developed ASICs and toward commercially available chips for managing traffic on the network, merchant-market developers have taken different approaches to incorporating traffic-management functions in silicon.
Software-based methods include writing traffic-management algorithms into a network processor (NPU). Hardware-based methods include using a dedicated, standalone traffic manager; a traffic manager integrated with a switch fabric; or a field-programmable gate array (FPGA)-based traffic manager. Although the best approach depends on the application, three compelling reasons have emerged for using a specialized, hardware-based traffic manager and leaving the NPU to do what it does best: packet processing.
Even with continued changes in networking protocols, basic traffic-manager functionality remains well defined. As a result, the programmability of an NPU is not required. Whether performing packet processing alone or also incorporating traffic-management functions, NPUs cannot guarantee line-rate performance except in less demanding applications. In high-bandwidth applications, or in applications with extensive quality-of-service (QoS) requirements, NPUs can and do fall short of line rate. Dedicated traffic managers are purpose-built to offer a large number of high-speed queues, optimized queue depth and the sophisticated scheduling mechanisms needed to meet the QoS requirements of the application. NPUs are not designed with QoS in mind and would require excessive processing power and software optimization to achieve what a dedicated traffic manager already accomplishes by design.
No compromises allowed
Implementing a high-performance, feature-rich traffic manager at high speeds is a non-trivial challenge. A traffic manager's function in the packet-processing datapath is to enforce service-level agreements (SLAs), and it employs a variety of sophisticated functions to achieve this goal. These include shaping, scheduling, queuing, congestion control and buffer management.
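The article names scheduling among these functions without specifying an algorithm; deficit round-robin (DRR) is one discipline commonly implemented in hardware traffic managers, chosen here purely as an illustration. It shares bandwidth among queues in proportion to a per-queue byte quantum, regardless of the packet-size mix. A minimal sketch (the function and variable names are illustrative, not from any vendor API):

```python
from collections import deque

def drr_round(queues, quanta, deficits):
    """One deficit-round-robin round.

    queues:   list of deques of packet lengths (bytes)
    quanta:   byte credit granted to each queue per round (its weight)
    deficits: per-queue leftover credit, updated in place
    Returns the (queue_id, pkt_len) pairs served this round.
    """
    served = []
    for i, q in enumerate(queues):
        if not q:
            deficits[i] = 0          # an idle queue does not bank credit
            continue
        deficits[i] += quanta[i]     # grant this round's credit
        while q and q[0] <= deficits[i]:
            pkt = q.popleft()        # head packet fits within the credit
            deficits[i] -= pkt
            served.append((i, pkt))
    return served

# Two queues of 500-byte packets with a 3:1 quantum ratio get served
# in a 3:1 byte ratio each round:
queues = [deque([500] * 4), deque([500] * 4)]
deficits = [0, 0]
print(drr_round(queues, [1500, 500], deficits))
# → [(0, 500), (0, 500), (0, 500), (1, 500)]
```

A hardware implementation performs the equivalent of this loop across thousands of queues every packet time, which is exactly the workload the article argues an NPU cannot sustain at line rate.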
These functions all need to be accomplished at full line rate under all conditions, while the device runs complex scheduling algorithms. The chip may also have to deal with a large number of weighted random early detection (WRED) drop-probability curves for congestion control. These curves require a significant processing effort for which a hardware-based traffic manager is optimally designed.
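A single WRED drop-probability curve is simple to state: no drops below a minimum average queue length, a linear ramp up to a maximum drop probability at a maximum threshold, and forced drops beyond it. The processing cost comes from evaluating many such curves, and the averaged queue lengths behind them, at line rate. A sketch of one curve (names and parameters are illustrative):

```python
def wred_drop_prob(avg_qlen, min_th, max_th, max_p):
    """Piecewise-linear WRED curve over the *averaged* queue length:
    0 below min_th, linear ramp to max_p approaching max_th,
    1 at or above max_th."""
    if avg_qlen < min_th:
        return 0.0
    if avg_qlen >= max_th:
        return 1.0
    return max_p * (avg_qlen - min_th) / (max_th - min_th)

# Halfway between the thresholds, the drop probability is half of max_p:
print(wred_drop_prob(150, 100, 200, 0.1))  # → 0.05
```

In a real device a separate (min_th, max_th, max_p) triple typically exists per traffic class per queue, which is what multiplies the per-packet work.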
Figure 1: When it comes to traffic management, an NPU-based solution (left) can be more expensive and more complex than one based on a traffic manager chip (right).
Source: Teradiant Networks
In addition, traffic management requires a large packet buffer with high memory bandwidth, for example, 512 Mbytes for 400 ms worth of buffering at OC-192 rates, with a usable memory bandwidth of approximately 60 Gbits/second. The hardware in most general-purpose NPUs is not optimized to handle such requirements, meaning that the NPU will force compromises in either speed or functionality if used to implement traffic management. Compromises in speed may involve reducing the line rate of an OC-192-rated NPU to OC-48, or placing restrictions on packet sizes and packet size mix. Compromises in functionality may involve forgoing computing-intensive scheduling and queuing algorithms in favor of simpler ones, or reducing the number of queues.
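The 512-Mbyte figure follows directly from the line rate: 400 ms of traffic at roughly 10 Gbits/s (OC-192) is about 500 Mbytes, which rounds up to 512 Mbytes of physical memory. The arithmetic:

```python
# Approximate OC-192 rate; the exact SONET line rate is 9.953 Gbits/s.
LINE_RATE_BPS = 10e9
BUFFER_TIME_S = 0.4        # 400 ms of buffering

buffer_bytes = LINE_RATE_BPS * BUFFER_TIME_S / 8
print(buffer_bytes / 1e6)  # → 500.0 (Mbytes), i.e. ~512 Mbytes of DRAM

# One way to account for the quoted memory bandwidth: a full-duplex
# store-and-forward buffer writes and reads every packet in each
# direction, i.e. at least 4 x line rate (40 Gbits/s) of raw bandwidth;
# access overhead pushes the usable figure toward ~60 Gbits/s.
```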
Other side effects of using an NPU to handle traffic management can also occur. For example, indeterminate response time to fabric flow control may dictate larger buffering in the fabric, increasing system cost.
The viability of dedicated, hardware-based traffic managers is all but guaranteed, because traffic management functions are already well defined and do not change with new networking protocols. As a result, the programmability of an NPU would be superfluous, making the above-mentioned compromises of an NPU-based implementation unnecessary.
Evidence that hardware-based traffic managers can achieve higher performance than software-based solutions was presented at the Network Processors Conference (NPC West) in October 2003 in San Jose, Calif. Silicon & Software Systems, a San Jose, Calif., electronics design company, implemented a software-based solution on Intel's IXP2400 and IXP2800 NPUs. The solution required three of the eight available microengines in an IXP2400 to provide OC-12 (622 Mbits/s) traffic management in each direction (ingress and egress), and six of the 16 microengines in an IXP2800 to provide OC-48 (2.5 Gbits/s) traffic management in each direction. Thus, approximately 75 percent of the NPU's processing power was consumed by full-duplex traffic management at one-fourth of the NPU's rated bandwidth.
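The 75 percent figure is straightforward arithmetic on the reported microengine counts, and it comes out the same for both parts (the helper name is ours, for illustration):

```python
def fraction_used(engines_per_direction, total_engines):
    """Fraction of microengines consumed by full-duplex
    (ingress + egress) traffic management."""
    return 2 * engines_per_direction / total_engines

print(fraction_used(3, 8))    # IXP2400 at OC-12 → 0.75
print(fraction_used(6, 16))   # IXP2800 at OC-48 → 0.75
```

In both cases the traffic-managed rate (OC-12 or OC-48) is one-fourth of the part's rated bandwidth, which is the basis of the article's claim.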
Intel concurs with the empirical evidence demonstrated by Silicon & Software Systems, suggesting in the Intel Technology Journal that, for "high touch" applications, one should use two IXP2800 NPUs in cascaded fashion to achieve deterministic 10 Gbits/s simplex operation.
An NPU-based traffic-management solution can also be more expensive than a hardware-based full-duplex traffic manager, not only in initial cost but also in the less tangible costs of additional power and board real estate. While an exact analysis of each variable would depend on the application, Figure 1 illustrates the difference in the complexity of the two approaches: an NPU implementation using the Intel IXP2800 for traffic management, and an implementation using a configurable, hardware-based traffic manager, the TeraPacket TN8450.
The two approaches were tasked with the same requirements: to provide traffic management at full line-rate performance without compromises in the execution of any traffic-management features. The NPU implementation requires four IXP2800s, 12 channels of Direct Rambus DRAM, and 16 channels of Quad Data Rate SRAM. By contrast, the TeraPacket implementation requires one TN8450, 10 channels of RDRAM and one channel of QDR RAM.
Performance and cost considerations will typically dictate whether a software-based NPU approach to traffic management or a dedicated, hardware-based traffic manager is the right choice for your application. Keep in mind that the ideal solution performs at line rate under all conditions while executing all required functions, and offers a great deal of flexibility in the QoS features the chip can implement.
As for QoS, the goal is a router or switch equipped to provide highly differentiated, guaranteed levels of service, enabling the service provider to offer, and profit from, sophisticated SLAs. And keep in mind that, for optimal QoS, the number of high-speed queues, the depth of those queues and the sophistication of the available scheduling mechanisms are all key.
Jayesh Joshi is director of engineering at Teradiant Networks Inc. (San Jose, Calif.).