Design Article


Multi-Core Microprocessor Architecture for Network Services and Applications

Amer Haider, Cavium Networks

1/31/2005 1:00 AM EST

Adding network services in a secure fashion to today's network infrastructure requires deploying a large number of separate devices. These can include a Layer 4+ switch, an anti-spam gateway, a firewall, a virtual private network (VPN) gateway, a secure sockets layer (SSL) VPN, an application firewall, an intrusion-detection gateway, an intrusion-protection gateway, an anti-virus scanner, a wireless switch, a storage switch, and more. Clearly the presence of these multiple devices poses a significant management, interoperability and deployment challenge for network operators.

Going forward, networking OEMs are collapsing the functionality of multiple network-service devices into a single system to solve network administrator management and total-cost-of-ownership problems, and gain competitive advantage.

The convergence of multiple functions and diverse services from different systems into a single system gives rise to many challenges for networking OEMs. First, network traffic needs to be handled at up to multi-gigabit rates. Second, deep packet inspection for all packets is required. Third, wire-speed security needs to be applied at each network layer, from Layer 3 to Layer 7. Fourth, services need to be performed on the processed network packets.

Many companies currently approach the above network-service processing challenge by dividing the tasks into different performance segments. For systems with performance averaging less than 50 Mbits/s, popular architectures are based on an embedded communication processor with one or two coprocessors for content and security processing. For systems with performance in the hundreds of megabits per second but less than 1 Gbit/s, the popular architecture is an x86-based system with a network interface card and two to three coprocessors. For systems with performance exceeding 1 Gbit/s, the design today requires an elaborate system with a backplane, line cards, management blades, service blades using ASICs, and multiple general-purpose processors. Clearly these multiple architectures require duplicated, specialized hardware and software engineering teams and a completely different software architecture for each corresponding hardware platform. Moreover, for the higher-end performance systems, complex, expensive and proprietary microcode development is necessary.

Multiple hardware designs for different network-service performance ranges described above stem from today's processor landscape. On the one hand, network services require the ability to leverage existing code, programming models and popular development tool-chains and environments found on general-purpose architectures such as the x86, MIPS and PowerPC. On the other hand, network services require wire-speed network packet processing, scheduling and buffering necessitating specialized hardware. Clearly there is a gap today between the network-service processor requirements and the existing processor landscape.

To cover this gap, processor companies have added networking interfaces to general-purpose processors with integrated memory subsystems. These are called communication processors. This bolting together of functions alleviates some interconnect bottlenecks at best, and hence communication processors cannot provide the network-traffic performance in demand today. To increase network-traffic performance, processor vendors have integrated multiple proprietary cores with integrated memory subsystems; these are called network processors. However, the difficulty of porting the general-purpose processor software required for network services to a proprietary core with limited code space and proprietary development tools has relegated network processors to Layer 2 and Layer 3 traffic processing. Compounding the gap, neither communication processors nor network processors address the computationally intensive operations that network services and applications need, such as encryption, transmission control protocol (TCP) offload, application compression and decompression, look-ups, and regular-expression acceleration. These operations require separate co-processors attached via additional system interfaces.

To enable widespread deployment of network services across enterprise, small office/home office (SOHO), small-to-medium enterprise (SME) and service-provider networks in a secure, cost-effective, and easily manageable way, there is a clear need for a new processor architecture. This architecture must address the two distinct requirements of general-purpose programming and high network performance, and must also incorporate specific hardware for computationally intensive network services, security and applications. This need is addressed by a new class of processor, the network services processor (NSP).


Figure 1: Diagram of a typical multi-core processor design.

The Octeon NSP from Cavium Networks is a single-chip solution for network-service systems ranging in performance from 500 Mbits/s to 10 Gbits/s. It has from 2 to 16 MIPS64-based cores on a single chip, with a variety of networking and memory interfaces and co-processors that accelerate firewall, VPN, IDS, IPS and anti-virus processing, addressing a wide range of price-performance points.

NSPs are unique because they have a fundamentally different design philosophy when compared to the existing classes of processors such as general-purpose processors, communication processors, control-plane processors and network processors.

To address the massive computation power required for providing network services that involve deep packet inspection for all network traffic, NSPs use multiple cores with separate packet-scheduling hardware. This multi-core architecture capitalizes on the inherent parallelism of packet-based network-service processing.

To maintain common software architecture across multiple performance ranges, all cores in an NSP use a standard instruction set architecture (ISA). Using a standard ISA provides software architects and developers the ability to leverage an existing code-base, run an operating system and use popular development environments with standard programming models. There is no need to learn a proprietary or new development toolset or language.

To enable scalable multi-core programming, NSPs have an on-chip packet workload balancer in hardware that schedules and orders packets or flows. This packet-ordering hardware significantly simplifies software programming and allows multiple cores to coordinate work on various packets and flows without stepping on each other's toes. The previous generation of multi-core processors lacks this hardware and consequently requires expensive software locking functions that limit scalability beyond two processor cores.
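The idea of scheduling by flow can be illustrated in software. The following is a minimal sketch, not the Octeon scheduler: it assumes a flow is identified by its 5-tuple, uses CRC32 as an arbitrary tag function, and the names `Packet`, `flow_tag` and `dispatch` are hypothetical. It shows why per-flow ordering removes the need for locks: every packet of a flow lands in the same queue, in arrival order.

```python
import zlib
from collections import namedtuple

# Hypothetical packet record: a 5-tuple plus an arrival sequence number.
Packet = namedtuple("Packet", "src dst sport dport proto seq")

def flow_tag(pkt):
    """Derive a flow tag from the 5-tuple (CRC32 is an arbitrary choice)."""
    key = f"{pkt.src}|{pkt.dst}|{pkt.sport}|{pkt.dport}|{pkt.proto}".encode()
    return zlib.crc32(key)

def dispatch(packets, n_cores):
    """Place each packet on a per-core queue selected by its flow tag.

    Packets of one flow always reach the same queue in arrival order,
    so no two cores share a flow's state and no lock is required.
    """
    queues = [[] for _ in range(n_cores)]
    for pkt in packets:
        queues[flow_tag(pkt) % n_cores].append(pkt)
    return queues
```

A static hash like this cannot rebalance load when one flow dominates; a hardware scheduler can hand work to whichever core is free while still enforcing per-flow order, which is the advantage the text describes.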

To make packet manipulation and network packet processing more efficient, NSP cores provide special instructions that the compiler generates automatically.

For encryption and hashing algorithms such as 3DES, AES, RSA and SHA-1, NSPs add hardware acceleration to the core instead of using separate on-chip or off-chip co-processors. Integrating the encryption and hashing instructions on-core also eliminates the system-interconnect bottlenecks caused by off-core traffic for each packet.

To offload repetitive network packet operations at Layer 2 to Layer 4, NSPs use intelligent networking interfaces that perform parsing, checksums, error checks, buffer management and tagging in hardware. This allows the processor cores to focus on C-based application functionality.
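As an example of the repetitive per-packet work being offloaded, here is a minimal software sketch of the standard Internet checksum (RFC 1071) used by IPv4, TCP and UDP. The function name `internet_checksum` is chosen for illustration only and says nothing about how an NSP implements the check in hardware.

```python
def internet_checksum(data: bytes) -> int:
    """Return the 16-bit one's-complement checksum defined in RFC 1071."""
    if len(data) % 2:
        data += b"\x00"                # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]   # sum big-endian 16-bit words
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back in
    return ~total & 0xFFFF
```

To verify a received header, run the same function over the header with its checksum field in place; a result of zero means the header is intact. Doing this in software costs a pass over every packet, which is why it is a natural candidate for the network interface.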

For TCP processing, NSPs have built-in TCP co-processors added across multiple areas in the chip. TCP termination is an important aspect of application-aware services as well as emerging storage networking applications. NSPs are capable of performing complex multi-application sets at up to multi-gigabit rates.

For computationally intensive deep packet inspection in intrusion detection (IDS), intrusion prevention (IPS) and anti-virus applications, NSPs use hardware-accelerated string matching, regular-expression processing, and application decompression and compression, which significantly reduces the computational load on the general-purpose cores.
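To give a sense of what string matching for deep packet inspection involves, the sketch below implements the well-known Aho-Corasick algorithm, which finds all occurrences of many signatures in a single pass over the payload. This is a software illustration of the general technique under the assumption that signatures are plain byte strings; the article does not say which algorithm the hardware uses, and the names `build_automaton` and `scan` are hypothetical.

```python
from collections import deque

def build_automaton(patterns):
    """Build Aho-Corasick tables: transitions, failure links, outputs."""
    goto = [{}]   # goto[state][byte] -> next state
    fail = [0]    # failure link for each state
    out = [[]]    # patterns recognized on reaching each state
    for pat in patterns:
        state = 0
        for b in pat:
            if b not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][b] = len(goto) - 1
            state = goto[state][b]
        out[state].append(pat)
    # Breadth-first pass: depth-1 states keep failure link 0.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for b, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and b not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(b, 0)
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out

def scan(payload, automaton):
    """Return (offset, pattern) for every signature found in payload."""
    goto, fail, out = automaton
    state, hits = 0, []
    for pos, b in enumerate(payload):
        while state and b not in goto[state]:
            state = fail[state]
        state = goto[state].get(b, 0)
        for pat in out[state]:
            hits.append((pos - len(pat) + 1, pat))
    return hits
```

The scan touches each payload byte once regardless of how many signatures are loaded, but it still runs for every byte of every packet, which illustrates why this workload is moved off the general-purpose cores.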

In summary, NSPs enable networking equipment companies to build the next generation of network-service appliances and systems using a single software architecture and a single hardware architecture, scaling only the number of cores and the performance of specific on-chip co-processors. This new paradigm promises to change the economics of network-service equipment by reducing end cost to users and increasing available features across multiple performance ranges. It also dramatically reduces the software and hardware design investment required by equipment vendors to develop a scalable product line.

About the Author
Amer Haider is the director of strategic marketing at Cavium Networks. Amer can be reached at amer.haider@caviumnetworks.com.

