Intel's next generation of network processors will introduce more complex microengines and a bus network that links them in a ring to pipeline data between them.
The company will use the pipelining system to let devices running at up to 1.4GHz process packets at OC-192 and 10Gbit/s Ethernet rates. The follow-on to the existing IXP1200 family will replace the StrongARM host processor with one based on the XScale architecture.
The microengines that run the packet processing code will keep their basic design, but the company plans to take advantage of the extra die space that will become available with the move to the company's 0.13µm process.
In the IXP1200, the microengines can run four software threads in parallel. When a thread makes a memory access that will take more than one cycle, the processor switches to the next thread that is ready to run. This technique, which is being introduced to other companies' network and signal processors as well as Intel's own server-level Pentium 4 processors, helps hide off-chip memory latency.
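The latency-hiding effect of this thread switching can be illustrated with a toy scheduler. The sketch below is purely illustrative (the operation names and the `MEM_LATENCY` figure are assumptions, not IXP1200 details): a thread that issues a multi-cycle memory access is parked until the access completes, and the engine runs whichever thread is ready instead of stalling.

```python
from collections import deque

MEM_LATENCY = 3  # assumed cycles for an off-chip access (illustrative)

def run(threads):
    """Each thread is a list of ops: 'alu' (1 cycle) or 'mem' (blocks for MEM_LATENCY)."""
    ready = deque(range(len(threads)))
    blocked = {}              # thread id -> cycle at which it becomes ready again
    pc = [0] * len(threads)   # per-thread program counter
    cycle = 0
    while ready or blocked:
        # wake any threads whose memory access has completed
        for t, when in list(blocked.items()):
            if when <= cycle:
                del blocked[t]
                ready.append(t)
        if not ready:
            cycle += 1        # nothing runnable: the engine genuinely stalls
            continue
        t = ready.popleft()
        op = threads[t][pc[t]]
        pc[t] += 1
        cycle += 1            # issuing any op takes one cycle
        if pc[t] == len(threads[t]):
            continue          # thread finished
        if op == 'mem':
            blocked[t] = cycle + MEM_LATENCY  # park thread while memory responds
        else:
            ready.append(t)
    return cycle
```

Run serially, four threads that each issue one memory access and one ALU op would spend most of their time stalled; interleaved, the memory waits of all four overlap, so the total cycle count falls well below four times the single-thread figure.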
Frank Casey, manager of Intel's network processor group in Europe, said: "They will be able to run more threads and there will be more of them. We are applying the lessons we have learned with the IXP1200. There will be more instructions to perform things such as pseudo-random number generation and time-stamp functions. The instruction set has been honed quite a bit."
Currently the IXP1200 processors run at up to 232MHz. Casey said the move to 0.13µm would make it possible to clock the new devices at more than 1GHz. "The IXP1200 was originally from the Digital [Equipment Corporation] area. They were using a process that was not a core Intel process. This one will be. It will be implemented in 0.13µm although we might actually use 0.10µm if that becomes available."
He added that the first parts are expected to appear next year and are likely to be based on the CSIX standard being worked on by the Network Processor Forum.
"The interface we will be using will be standard," said Casey.
As well as building in more microengines, Intel has changed the memory architecture to deal with the increasing latency of off-chip memory. The company calls its approach "distributed caching". It means that, as well as its own register file, each microengine has 640 words of local memory to store packets or packet fragments on which it can work. Each microengine can pass data to its nearest neighbour using a 128-entry register file. Typically, a microengine will perform a set task such as an address lookup or packet classification, then pass the data, or a reference to the packet sitting in main memory, through the registers to the next engine in the chain. That microengine will then perform another task. Casey claimed the new instructions would help keep code size down but, to let more threads run in parallel that can perform different tasks, Intel has doubled the code memory for each microengine to 4Kword.
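The neighbour-to-neighbour pipelining described above can be sketched in software. In the sketch below, each stage stands in for a microengine performing one fixed task, and the reference plus metadata handed between stages stands in for the neighbour registers; all of the names (`main_memory`, `lookup_stage`, and so on) are invented for illustration, not Intel APIs.

```python
main_memory = {}  # packet id -> packet fields, standing in for off-chip DRAM

def lookup_stage(ref, meta):
    # Toy address lookup: derive a next hop from the destination field.
    meta['next_hop'] = main_memory[ref]['dst'] % 16
    return ref, meta

def classify_stage(ref, meta):
    # Toy classification based on the destination address.
    meta['class'] = 'control' if main_memory[ref]['dst'] == 0 else 'data'
    return ref, meta

# Each stage models one microengine in the ring; the (ref, meta) pair
# models what passes through the neighbour registers between them.
PIPELINE = [lookup_stage, classify_stage]

def process(ref):
    meta = {}
    for stage in PIPELINE:
        ref, meta = stage(ref, meta)
    return meta
```

The point of the structure is that the packet itself stays in main memory; only a small reference and a few words of metadata move between engines, which is what keeps the per-stage register traffic small.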
Casey said the advantage of using comparatively generic network processing engines is that it will let network OEMs build in support for new functions as the need appears.
"Over the two to three years it takes to develop a new communications system, the feature set changes before it goes into production. Virus and worm detection has become the latest hotspot. So field upgradeability becomes a huge selling point," said Casey.
Each microengine will get its own small content addressable memory (CAM) to handle address lookups. CAM entries will be replaced on a least-recently-used basis when a new address match needs to be written in, letting the microengine CAM act as an address cache.
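A CAM with least-recently-used replacement behaves like a small address cache, and the policy is easy to show in a software analogue. The sketch below is a minimal model (the class name, capacity, and stored data are assumptions, not hardware details): a hit refreshes an entry's age, and a write that misses evicts whichever entry has gone longest without being used.

```python
from collections import OrderedDict

class LruCam:
    """Software analogue of a small CAM with LRU replacement."""

    def __init__(self, capacity=16):
        self.capacity = capacity
        self.entries = OrderedDict()  # address -> associated data, oldest first

    def lookup(self, addr):
        """Return the matched data on a hit (refreshing its age), else None."""
        if addr in self.entries:
            self.entries.move_to_end(addr)  # mark as most recently used
            return self.entries[addr]
        return None

    def insert(self, addr, data):
        """Write a new address match, evicting the least recently used entry."""
        if addr in self.entries:
            self.entries.move_to_end(addr)
        elif len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)  # drop the LRU entry
        self.entries[addr] = data
```

With a two-entry capacity, looking up an address protects it from eviction: inserting a third address then displaces the entry that was not recently touched, which is exactly the caching behaviour the LRU policy buys.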