Offloading tasks from data plane packet-processing elements, such as network processors, to co-processors, such as classification engines, ternary CAMs (TCAMs), and IP co-processors, can become a key choke point in today's packet-processing architectures. To help curb this bottleneck, the Network Processing Forum (www.npforum.org) has released the first revision of a standard lookaside interface, dubbed LA-1, that accelerates and standardizes the movement of packets between data plane and co-processor solutions.
The LA-1 interface supports the transaction requirements for OC-48 through OC-192 line rates. The performance specification for co-processors is four or more transactions at OC-48 or one or more transactions at OC-192 line rates.
The LA-1 interface as currently defined has a bandwidth of about 6.4 Gbit/s per direction, which is sufficient for lookups at 10-Gbit/s (OC-192) packet rates. It's important to note that packet count assumptions used to develop the LA-1 interface are based on line rate performance using 40-byte packets and 144-bit search keys.
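The sizing assumptions above can be checked with a quick back-of-the-envelope calculation. The sketch below assumes a nominal 10-Gbit/s line rate, one 144-bit key written per 40-byte packet, and no framing or command overhead; the variable names are illustrative only.

```python
# Rough check of the LA-1 sizing assumptions (illustrative, not from the spec).
LINE_RATE_BPS = 10e9   # assumed nominal OC-192 rate, about 10 Gbit/s
PACKET_BYTES = 40      # minimum-size packet quoted in the text
KEY_BITS = 144         # search key width quoted in the text
LA1_BPS = 6.4e9        # per-direction LA-1 bandwidth quoted in the text

# Worst case: every packet on the wire is a minimum-size packet.
packets_per_second = LINE_RATE_BPS / (PACKET_BYTES * 8)

# Assumption: exactly one search key is written per packet.
key_bandwidth_bps = packets_per_second * KEY_BITS

headroom = LA1_BPS / key_bandwidth_bps

print(f"{packets_per_second / 1e6:.2f} Mpackets/s")
print(f"{key_bandwidth_bps / 1e9:.2f} Gbit/s of key traffic")
print(f"{headroom:.2f}x headroom on the write bus")
```

Under these assumptions the key traffic comes to about 4.5 Gbit/s, which leaves some margin below the 6.4-Gbit/s figure for one lookup per packet.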
The new interface spec is based on separate double data rate (DDR) buses for data inputs and data outputs. With DDR interfaces, data is clocked on both the rising and falling edges of the clock signals. This effectively doubles the bandwidth of the interface without increasing the clock speed or the bus width. Overall, the LA-1 interface operates at clock speeds between 133 and 200 MHz.
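As a rough illustration of how DDR clocking sets the per-direction bandwidth, the sketch below assumes 16 payload bits per clock edge (the 32 data bits per two-edge transfer described in this article, with parity bits excluded). The function name is hypothetical.

```python
def ddr_bandwidth_bps(clock_hz, data_bits_per_edge=16):
    """Payload bandwidth of one DDR bus: two transfers per clock cycle."""
    return clock_hz * 2 * data_bits_per_edge

# Evaluate both ends of the clock range quoted in the text.
for mhz in (133, 200):
    gbps = ddr_bandwidth_bps(mhz * 1e6) / 1e9
    print(f"{mhz} MHz -> {gbps:.3f} Gbit/s per direction")
```

At the 200-MHz end of the range this gives the 6.4-Gbit/s per-direction figure cited for the interface.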
In addition to supporting DDR operation, the LA-1 interface provides a host of features. These include:
- Unidirectional read and write interfaces
- A 24-pin address bus
- An 18-pin DDR DataOut bus that transfers 32 bits of data plus 4 bits of even byte parity per read
- An 18-pin DDR DataIn bus that transfers 32 bits of data plus 4 bits of even byte parity per write
- Read and write enable signals (W#, R#)
- Data clocks provided by the host network processing element (C, C#, K, K#)
- Echo clocks generated by the co-processor (CQ, CQ#)
- Port Enable Signals (E[1:0] and EP[1:0])
- 1.5-V HSTL I/O pins as defined by JEDEC (EIA/JESD8-6)
The LA-1 data inputs and outputs operate simultaneously, which eliminates high-speed bus turnarounds (that is, no dead cycles are present) [Figure 1]. Both ports are accessed through a common address bus. Addresses for reads and writes are latched on the rising edges of the K and K# input clocks, respectively. Each address location is associated with two 18-bit words (16 data bits plus 2 parity bits each) that burst sequentially into or out of the device. Because data can be transferred into the device on every rising edge of K and K#, and out of the device on every rising edge of CQ and CQ# (or C and C#), memory bandwidth is maximized while the elimination of bus turnarounds simplifies the overall design.
Figure 1: Data inputs and outputs operate simultaneously on the LA-1 bus.
Figure 2 provides a functional timing diagram for the LA-1 interface. Asserting the write-select (W#) input low at the K rising edge initiates a write cycle. The following K# rising edge provides the address for the write cycle. Each write address provides the base address for two 18-bit transfers. Hence, 32-bit data plus four even-byte parity bits are transferred for each selected address.
Figure 2: Functional timing diagram for the LA-1 interface.
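To make the "two 18-bit transfers" concrete, here is a minimal sketch of splitting a 32-bit word into two 18-bit beats with even byte parity. The bit layout (high half-word first, parity in bits 17:16 of each beat) is an assumption for illustration and is not taken from the LA-1 specification; the function names are hypothetical.

```python
def even_parity(byte):
    """Return the bit that makes the count of 1s in (byte + parity) even."""
    return bin(byte & 0xFF).count("1") & 1

def pack_word(data32):
    """Split a 32-bit word into two 18-bit beats (16 data + 2 parity bits).

    Assumed layout, not from the spec: high half-word first,
    parity for the upper byte in bit 17 and for the lower byte in bit 16.
    """
    beats = []
    for half in ((data32 >> 16) & 0xFFFF, data32 & 0xFFFF):
        p_hi = even_parity(half >> 8)
        p_lo = even_parity(half & 0xFF)
        beats.append((p_hi << 17) | (p_lo << 16) | half)
    return beats

def check_beat(beat):
    """True if both bytes of an 18-bit beat pass the even-parity check."""
    half = beat & 0xFFFF
    return (((beat >> 17) & 1) == even_parity(half >> 8)
            and ((beat >> 16) & 1) == even_parity(half & 0xFF))

beats = pack_word(0x12345678)
print([hex(b) for b in beats])
print(all(check_beat(b) for b in beats))
```

A receiver applying `check_beat` to each beat would flag any single-bit error within a byte, which is the protection that per-byte parity provides.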
Asserting the read-select (R#) input low at the K rising edge initiates a read cycle, and the address bus presents the read address. Data is delivered after the next rising edge of clock K using the clocks C, C# (or CQ, CQ#) as the output timing references as shown in Figure 2.
The LA-1 interface uses a memory-mapped structure. Network processing components can use the address bus to control co-processor functions. Reads and writes to memory-mapped registers are used to initiate co-processor actions, retrieve results, and provide in-band management.
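The memory-mapped model can be pictured with a toy software stand-in for a co-processor. Everything in this sketch is hypothetical: the register addresses, the 32-bit key, the miss value, and the class itself are invented for illustration and do not reflect the LA-1 register map or any real device.

```python
class ToyCoprocessor:
    """Toy model of a memory-mapped lookaside device (illustrative only)."""

    KEY_REG = 0x000000     # hypothetical: writing a key here starts a search
    RESULT_REG = 0x000001  # hypothetical: reading here returns the last result
    MISS = 0xFFFFFFFF      # hypothetical "no match" value

    def __init__(self, table):
        self.table = dict(table)  # key -> result; stands in for a TCAM
        self.result = self.MISS

    def write(self, address, data32):
        """A write to KEY_REG triggers a lookup; other writes are ignored."""
        if address == self.KEY_REG:
            self.result = self.table.get(data32, self.MISS)

    def read(self, address):
        """A read from RESULT_REG returns the outcome of the last lookup."""
        if address == self.RESULT_REG:
            return self.result
        return 0

dev = ToyCoprocessor({0xC0A80001: 7})
dev.write(ToyCoprocessor.KEY_REG, 0xC0A80001)  # initiate a search
hit = dev.read(ToyCoprocessor.RESULT_REG)      # retrieve the result
dev.write(ToyCoprocessor.KEY_REG, 0x0A000001)  # key not in the table
miss = dev.read(ToyCoprocessor.RESULT_REG)
print(hit, hex(miss))
```

The point of the sketch is only the access pattern: a write to one address starts an action, and a later read from another address collects the result.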
One of the big advantages of using the LA-1 interface is the capability to support devices in a multi-drop configuration (i.e. multiple devices connected to the same interface). Through this capability, system designers can support both a co-processor (TCAM) and an SRAM on the same bus at the same time.
TCAM searches require more write bandwidth, whereas SRAMs are generally read-intensive, so having two separate data buses allows the network processing element to write to a TCAM and read from an SRAM simultaneously without wasting any processor bandwidth.
Two multi-drop cascading configurations are supported. The first configuration uses a single read/write enable pair, and two bits of the address to select the device. All devices can be either co-processors or a mix of co-processors and SRAMs that support programmable port enables. EP[1:0] pins are used to determine the polarity of the E[1:0] pins. Figure 3 below shows four devices cascaded using this method.
Figure 3: First Multi-drop configuration using LA-1.
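One way to picture the first configuration is as a simple decode of two select bits against each device's polarity straps. The sketch below assumes that a device responds when both of its E pins match the levels programmed on its EP pins; that matching rule and the strap values are assumptions for illustration, not details confirmed by the text.

```python
def selected(e_pins, ep_pins):
    """Assumed rule: a device responds when every E pin matches its EP strap."""
    return e_pins == ep_pins

def decode(select_bits, ep_straps):
    """Return the indices of devices that respond to a 2-bit select value."""
    return [i for i, ep in enumerate(ep_straps) if selected(select_bits, ep)]

# Hypothetical strapping: each of four devices gets a unique EP[1:0] value.
EP_STRAPS = [0b00, 0b01, 0b10, 0b11]

for sel in range(4):
    print(f"E[1:0]={sel:02b} -> device {decode(sel, EP_STRAPS)}")
```

With unique straps, each value of the two select bits enables exactly one of the four devices, so a single read/write enable pair can be shared by all of them.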
The second multi-drop cascading configuration uses separate and independent read and write enable pairs to select a device. The E[1:0] and EP[1:0] pins are not used and should be tied low, as shown in Figure 4.
Figure 4: Second Multi-drop configuration using LA-1.
One of the limitations of the existing LA-1 interface is that commercially available SRAMs have interface speeds faster than the LA-1 maximum of 200 MHz. To solve this problem, some designers are looking at ways to extend the speed of the current LA-1 interface in order to support higher-speed co-processor solutions compatible with existing SRAM speeds. Answering this call, the NPF has instituted a project to increase LA-1 bandwidth. A faster LA-1 interface specification should be standardized by summer next year.
Working on LA-2
The development of the LA-2 interface is in progress. The LA-2 is a request-response type of interface and supports the transaction requirements for OC-192 through OC-768 line rates. The current physical layer of LA-2 is based on the emerging SXI-5 specification.
The SXI-5 implementation agreement as defined by the Optical Internetworking Forum (OIF) specifies the electrical characteristics for system packet interface 5 (SPI-5) and serdes framer interface 5 (SFI-5). The NPF is also considering SXI-5 for the next-generation streaming interface for network processing applications. If this happens, chip developers could invest in a single I/O for both the look-aside and streaming interfaces. This, in turn, would simplify the system design process for equipment developers.
Author's Note: For more information on the LA-1 interface, visit http://www.npforum.org/ApprovedSpecs.htm.
About the Author
Harmeet Bhugra is a systems engineer at Integrated Device Technology Inc. in Santa Clara, CA. He is actively involved in the LookAside task group at the Network Processing Forum (NPF). Prior to joining IDT, Harmeet held ASIC design positions at Nortel Microelectronics in Ottawa, ON, and PMC-Sierra Inc. in Burnaby, B.C. He can be reached at firstname.lastname@example.org.