The exponential growth of Internet traffic has created a need for ever-faster processing of encrypted data by network security processors, which offload security functions from general-purpose network processors. Today, even highly vaunted network processors are struggling to cope with IP security (IPSec) traffic, which imposes heavier computing demands as virtual private networks (VPNs) and e-commerce applications proliferate.
Until recently, security processors were algorithm accelerators, designed to accelerate only the algorithms IPSec uses to encrypt and decrypt data. This approach substantially increased data throughput, but it is running into a severe bottleneck: each data packet still has to be handled by either the CPU or the network processor to update the packet information contained in the header or trailer.
At lower speeds, say T3 or OC-3, existing configurations are able to keep pace. However, speed requirements are climbing rapidly into the multi-gigabit range and new solutions are needed.
The approach that best suits current demands is one where the specialized security processor offloads as much of the workload as possible from the CPU and network processor. As a first step in increasing overall system performance, full packet processing must be incorporated into the security device. Doing so raises performance by reducing the number of interactions over external buses and between critical system resources like the CPU and memory.
Security packet processors tend to alleviate host bus traffic because the packet needs to traverse the host bus only once in each direction: once as an IP packet and once as an IPSec packet. With algorithm accelerators, there is extra bus overhead for headers in both the input and output directions, for both encode and decode. Additional host bus overhead comes from transferring the context associated with the IPSec function.
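The bus-traffic comparison above can be put into rough numbers. The following is a minimal sketch, not a measurement: the header and context sizes are assumed values chosen only for illustration, and the function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AcceleratorOverhead:
    """Assumed per-packet overheads for an algorithm accelerator (illustrative only)."""
    header_bytes: int = 40    # assumption: header data moved separately in each direction
    context_bytes: int = 64   # assumption: IPSec context moved over the host bus per packet


def packet_processor_bus_bytes(ip_len: int, ipsec_len: int) -> int:
    """Packet crosses the host bus once as an IP packet and once as an IPSec packet."""
    return ip_len + ipsec_len


def accelerator_bus_bytes(ip_len: int, ipsec_len: int,
                          overhead: AcceleratorOverhead = AcceleratorOverhead()) -> int:
    """Same two crossings, plus header overhead on input and output, plus the context."""
    return ip_len + ipsec_len + 2 * overhead.header_bytes + overhead.context_bytes


if __name__ == "__main__":
    ip_len, ipsec_len = 1000, 1056  # hypothetical packet sizes before and after encoding
    print("packet processor:", packet_processor_bus_bytes(ip_len, ipsec_len), "bytes")
    print("accelerator:     ", accelerator_bus_bytes(ip_len, ipsec_len), "bytes")
```

Under these assumed sizes the fixed per-packet overhead matters most for small packets, where the header and context bytes are a larger share of the total transferred.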
Heavier traffic on the host bus not only bottlenecks security functions but also keeps other functions from using the bus. Whenever the host bus lacks the bandwidth to handle the extra traffic created by an algorithm accelerator implementation, a system stall will occur.
A stall occurs when the host system cannot keep up with the security processor. This is a serious problem with algorithm accelerators. It is almost entirely eliminated by security processors that incorporate full packet processing, specifically the header and trailer manipulations involved in protocol processing.
Network system efficiency is directly related to the latency added to a packet from the time it enters the system until it exits. Total system latency is the sum of the time each component in the so-called fast path adds as a packet traverses the system. Here, "fast path" refers to the data path requiring a minimum of processing. It is therefore important that the system engineer be equipped with silicon that strikes the right balance between maintaining a general-purpose architecture for broad use and minimizing the number of interactions required to complete the securing function.
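As a small illustration of the latency accounting described above, the sketch below sums per-component contributions along a fast path. The component names and microsecond values are hypothetical placeholders, not figures from any real system.

```python
def total_latency_us(fast_path: dict) -> int:
    """Total system latency is the sum of what each fast-path component adds."""
    return sum(fast_path.values())


def largest_contributor(fast_path: dict) -> str:
    """Name of the component that adds the most latency."""
    return max(fast_path, key=fast_path.get)


if __name__ == "__main__":
    # Hypothetical per-component latencies in microseconds.
    fast_path = {
        "ingress_interface": 2,
        "network_processor": 5,
        "host_bus_transfers": 8,
        "security_processor": 6,
        "egress_interface": 2,
    }
    print("total latency (us):", total_latency_us(fast_path))
    print("largest contributor:", largest_contributor(fast_path))
```

Viewed this way, every extra interaction between components appears as another term in the sum, which is why reducing interactions lowers total latency.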
Today, for half-duplex OC-48 data rates, packet processing requires 8,000 MIPS in addition to encryption processing. This demands anywhere from two to five network processors if an algorithm accelerator is used. The designer will likely encounter a bottleneck at each step of the encoding task simply because these operations are being handled by a general-purpose CPU and software. Hence, the designer will spend an inordinate amount of time tweaking system parameters to minimize system stalls.
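The arithmetic behind these figures can be sketched as follows. The 8,000 MIPS requirement comes from the text; the per-processor MIPS values are assumptions chosen only to reproduce the two-to-five range, and the per-packet budget assumes minimum-size 64-byte packets at the raw OC-48 line rate with framing overhead ignored.

```python
import math

OC48_LINE_RATE_BPS = 2_488_320_000  # OC-48 line rate; SONET framing overhead ignored
REQUIRED_MIPS = 8_000               # packet-processing figure quoted in the text


def processors_needed(required_mips: int, per_processor_mips: int) -> int:
    """Whole number of network processors needed to supply the required MIPS."""
    return math.ceil(required_mips / per_processor_mips)


def packets_per_second(line_rate_bps: int, packet_bytes: int) -> int:
    """Packets per second at a given line rate for a fixed packet size."""
    return line_rate_bps // (packet_bytes * 8)


def instructions_per_packet(required_mips: int, pps: int) -> int:
    """Instruction budget per packet implied by a MIPS figure and a packet rate."""
    return (required_mips * 1_000_000) // pps


if __name__ == "__main__":
    # Assumed usable MIPS per network processor (hypothetical values).
    for per_cpu in (4_000, 1_600):
        print(per_cpu, "MIPS each ->", processors_needed(REQUIRED_MIPS, per_cpu), "processors")

    pps = packets_per_second(OC48_LINE_RATE_BPS, 64)
    print("64-byte packets per second:", pps)
    print("instruction budget per packet:", instructions_per_packet(REQUIRED_MIPS, pps))
```

With these assumptions, processors delivering 4,000 MIPS each give a count of two and processors delivering 1,600 MIPS each give five, matching the range quoted above.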
See related chart
Given the design shortcomings of algorithm accelerators, next-generation network security subsystems call for all packet overhead to be transferred to advanced security processors with highly dedicated packet processing. Examples include the 7851, 7854, and 8154 in Hifn's intelligent packet processing (HIPP I and II) families. Security processors like these perform header analysis, payload extraction, compression, encryption, authentication, and packet assembly. Each device stores the necessary context data for each security association in its own private memory, which eliminates the time-consuming need for data to go off-chip and onto a system bus.
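The sequence of steps listed above can be modeled in software to make the data flow concrete. This is a conceptual sketch only, not Hifn's implementation and not a real IPSec packet format: `SA_TABLE`, `SecurityAssociation`, `toy_cipher`, and `process_outbound` are hypothetical names, the 8-byte security header is a simplification, and the XOR keystream is a placeholder that provides no real security.

```python
import hashlib
import hmac
import struct
import zlib
from dataclasses import dataclass


@dataclass
class SecurityAssociation:
    spi: int          # security parameter index
    enc_key: bytes
    auth_key: bytes
    seq: int = 0      # per-association sequence counter


# Stand-in for the device's private context memory, keyed by source+destination address.
SA_TABLE = {}


def toy_cipher(key: bytes, data: bytes) -> bytes:
    """Placeholder XOR keystream. NOT real encryption; applying it twice restores the data."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + struct.pack("!I", counter)).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


def process_outbound(packet: bytes, header_len: int = 20) -> bytes:
    # Header analysis and payload extraction (assumes a 20-byte IPv4 header).
    header, payload = packet[:header_len], packet[header_len:]
    flow_id = header[12:20]          # IPv4 source and destination addresses
    sa = SA_TABLE[flow_id]           # context lookup stays in local memory
    sa.seq += 1

    compressed = zlib.compress(payload)                   # compression
    ciphertext = toy_cipher(sa.enc_key, compressed)       # encryption (placeholder)
    sec_header = struct.pack("!II", sa.spi, sa.seq)       # simplified security header
    tag = hmac.new(sa.auth_key, sec_header + ciphertext,
                   hashlib.sha256).digest()[:12]          # authentication (truncated HMAC)

    # Packet assembly: original header, security header, protected payload, tag.
    return header + sec_header + ciphertext + tag


if __name__ == "__main__":
    SA_TABLE[bytes(range(12, 20))] = SecurityAssociation(0x1001, b"enc-key", b"auth-key")
    pkt = bytes(range(20)) + b"example payload " * 8
    out = process_outbound(pkt)
    print(len(pkt), "bytes in,", len(out), "bytes out")
```

The point of the sketch is that the whole chain, including the context lookup, runs in one place, so the host only hands over the original packet and receives the finished one.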