There has been considerable recent research into fast database search methods for wire-speed packet forwarding and classification in next-generation routers. The intelligence built into edge routers requires additional processing and deeper examination of each incoming packet, including forwarding lookups, packet classification, policy and admission control, and flow caching.
To address these growing requirements, network search engines (NSEs) utilize innovative features to deliver additional search performance at higher line rates. One such feature is the simultaneous multidatabase lookup (SMDL), which enables the packet processor to perform up to two database lookups with one instruction. This article explores the traditional classification approaches and shows how SMDL solves many of the bus bandwidth problems encountered by these approaches.
Traditionally, algorithmic classification has been implemented in external memory such as SRAM or reduced-latency DRAM (RLDRAM). The most popular legacy solution for forwarding and Layer 3+ classification is a multibit Trie. It examines several bits of the search key at each level, with the results of all nodes and entries residing in external SRAM (operating at up to 333 MHz with a 6-ns latency) or RLDRAM (20-ns latency). As a result, any algorithm that needs iterative random accesses suffers from the latency of RLDRAM: a 16-4-4-4-4 Trie lookup makes up to five dependent accesses, one per level, for a total latency of 100 ns.
However, there are techniques that allow RLDRAM and SRAM to be used interchangeably. For example, SRAM-like speeds can be achieved by using four or eight banks of RLDRAM at the expense of four to eight times the memory requirements of duplicated memory.
In the following example, we will evaluate a multidatabase lookup scenario for a multibit Trie SRAM/RLDRAM architecture. In this example, we will consider the following classification rules (see first table below).
The second table (below) summarizes the bandwidth requirements on the external memory architecture.
Table 2
The performance assessments shown in Tables 1 and 2 assume memory bus speeds of both 250 MHz and 333 MHz. The green area indicates performance that is adequate for the given line rate with a 250-MHz bus, while the yellow area shows adequate performance at 333 MHz. The red areas indicate configurations that will not be able to keep pace with the given line rate. Along with the performance analysis, the tables show the number of memory cells needed for implementation and the number of 36-Mbit devices needed to provide this amount of memory.
The analysis shows that SRAM-based designs are incapable of handling IPv6 forwarding and access control at line rates above 10 Gbits/second. Bus loading may also pose a problem even at slower speeds. Connecting eight memory devices in parallel on the network processor's bus is an option, though it may slow the bus well below the anticipated 333-MHz data rate.
However, RLDRAM solutions potentially avoid a bus-loading problem as each device has a dedicated memory bus. In this solution, one needs only four 256-Mbit RLDRAM devices, each on separate buses. Unfortunately, this will either block all of the network processor's memory ports or require quadruple the number of pins on the ASIC/FPGA dedicated to the RLDRAM search.
Turning to SMDL
Content-addressable-memory-based NSEs can cycle at 250 MHz for match rates of up to 125 million searches a second without the need to duplicate data. Because NSE lookups are pipelined, a new result is produced every 8 ns at that rate (every second 4-ns clock cycle), with a total latency of up to 40 ns.
As lab tests have shown, the NSE can handle 10-Gbit/s line rates, but may have difficulty maintaining search performance when the number of lookups per packet increases or the line rate rises to 40 Gbits/s. Leading NSE providers have addressed this limitation by incorporating SMDL, a specialized feature that enables the NSE to deliver additional search performance at these higher line rates.
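A rough budget calculation shows why the number of lookups per packet becomes the limiting factor. The sketch below takes the 125-million-searches-per-second figure from the text. The 40-byte minimum packet size and the absence of framing overhead are assumptions made for illustration only, and the function names are hypothetical.

```python
NSE_SEARCHES_PER_SECOND = 125e6   # match rate quoted in the article
MIN_PACKET_BYTES = 40             # ASSUMPTION: worst-case packet size, no framing overhead


def packets_per_second(line_rate_bps, packet_bytes=MIN_PACKET_BYTES):
    """Worst-case packet arrival rate for a given line rate."""
    return line_rate_bps / (packet_bytes * 8)


def searches_per_packet(line_rate_bps, search_rate=NSE_SEARCHES_PER_SECOND):
    """How many NSE searches fit in one packet time at that line rate."""
    return search_rate / packets_per_second(line_rate_bps)


for gbps in (10, 40):
    budget = searches_per_packet(gbps * 1e9)
    print(f"{gbps} Gbit/s: {budget:.1f} searches per packet")
```

Under these assumptions the budget is four searches per packet at 10 Gbits/s and a single search per packet at 40 Gbits/s, so any feature that returns more than one result per issued search directly extends what the device can do at the higher rate.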
An efficient classification solution for 10-Gbit/s line rates includes an ASIC and an NSE. In this type of solution, additional lookups can be achieved by using the SMDL function, which enables the ASIC to perform up to two database lookups with one instruction.
The ASIC sends the widest key with the lookup instruction; if this key includes smaller keys that need to be looked up in a different database, the NSE can automatically perform the lookups and return all the results. As an example, if the access control list (ACL) key includes a 128-bit IPv6 address, the ASIC can send a 252-bit ACL key and issue SMDL. This way, the ASIC does not have to send the IPv6 lookup separately and the quad-data-rate or generic NSE bus does not need four separate accesses.
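The exchange described above can be modeled in a few lines. This is a behavioral sketch only: the class names, the position of the IPv6 address in the top 128 bits of the 252-bit key, and the ternary table format are all assumptions for illustration, not the interface of any real NSE. It counts how many times a key crosses the bus with and without SMDL.

```python
import ipaddress

ACL_KEY_BITS = 252    # ACL key width from the article
IPV6_BITS = 128       # IPv6 address width
OTHER_BITS = ACL_KEY_BITS - IPV6_BITS   # remaining ACL fields (ASSUMED to sit in the low bits)


class Database:
    """Ternary table: the first entry whose masked bits equal the key wins."""

    def __init__(self):
        self.entries = []   # (value, mask, result) in priority order

    def add(self, value, mask, result):
        self.entries.append((value & mask, mask, result))

    def search(self, key):
        for value, mask, result in self.entries:
            if key & mask == value:
                return result
        return None


class SearchEngine:
    """Hypothetical NSE model holding an ACL database and an IPv6 forwarding database."""

    def __init__(self):
        self.acl = Database()
        self.ipv6 = Database()
        self.key_transfers = 0   # keys sent across the ASIC-to-NSE bus

    def lookup(self, database, key):
        """Conventional lookup: one key transfer per database searched."""
        self.key_transfers += 1
        return database.search(key)

    def smdl(self, acl_key):
        """One key transfer; the engine extracts the embedded IPv6 subkey itself."""
        self.key_transfers += 1
        ipv6_key = acl_key >> OTHER_BITS
        return self.acl.search(acl_key), self.ipv6.search(ipv6_key)


def build_engine(acl_key):
    nse = SearchEngine()
    nse.acl.add(acl_key, (1 << ACL_KEY_BITS) - 1, "permit")   # exact ACL rule
    nse.acl.add(0, 0, "deny")                                 # lowest-priority default
    route = int(ipaddress.IPv6Address("2001:db8::"))
    nse.ipv6.add(route, ((1 << 32) - 1) << (IPV6_BITS - 32), "port 3")   # 2001:db8::/32
    return nse


address = int(ipaddress.IPv6Address("2001:db8::1"))
acl_key = (address << OTHER_BITS) | 0x1234   # 0x1234 stands in for the other ACL fields

separate = build_engine(acl_key)
separate_results = (separate.lookup(separate.acl, acl_key),
                    separate.lookup(separate.ipv6, address))

combined = build_engine(acl_key)
combined_results = combined.smdl(acl_key)

print(separate_results, separate.key_transfers)
print(combined_results, combined.key_transfers)
```

Both paths return the same pair of results, but the SMDL path moves the key across the bus once instead of twice, which is the bandwidth saving the article attributes to the feature.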
The same process can be implemented for flow caching and IPv4 addresses. A similar solution can also be implemented using an NPU and an NSE with a fully integrated interface, though the ASIC/NSE solution is more efficient and capable of lookups that are twice as fast.
Many methods, ranging from algorithms to NSEs, can handle the growing search requirements on a line card. Forwarding algorithms, usually in the form of a Trie search using external memory, run on multiple buses. NSEs usually run on separate buses and handle the rest of the classification searches. Depending on the speed and number of the lookups, the number of pins dedicated to classification can grow large. This may be a problem for today's ASIC-based solutions, where pins add significantly to the total system cost.
SMDL alleviates this problem by requiring a single transmission of data over the memory bus. The NSE is then capable of performing multiple searches on the information and returning consecutive results to the ASIC, thus maximizing the memory bus throughput and decreasing the cost of the ASIC. This solution meets the requirements for search performance in 10-Gbit/s line cards, where SRAM and RLDRAM solutions are inadequate.
Michael Miller is the chief technology officer and Bertan Tezcan is a systems engineer at IDT Inc. (Santa Clara, Calif.).