SANTA CLARA, Calif.: In an effort to deliver a speedy network processor with the packet-handling clout to manage the varied services carriers want to offer, Bay Microsystems Inc. will launch early next year an architecture it calls a "CLEC-on-a-chip," a reference to the competitive local-exchange carriers that are taking on the Baby Bells in many markets.
Though the chip will enter an increasingly crowded market, Bay's officials say the architecture has been drawn from scratch to overcome the inherent limitations of other network processors, particularly those based on RISC architectures.
Bay "started with a clean sheet in the design process, to guarantee the processor would be deterministic as it scales up," said Chuck Gershman, the company's vice president of marketing and sales. The company also paid attention to the extreme cases that tend to break processors, which are always among the first tests OEMs run.
Guided by that philosophy, Bay came up with a recipe for a next-generation network processor that uses a very long instruction word (VLIW) architecture and works with commodity DRAMs. Deterministic performance was also a must.
Bay claims it has managed to create a chip that can handle 10-Gbit/second traffic at wire speed, something no processor vendor has yet accomplished. In fact, the part is said to be capable of 20 Gbits/s, but Bay wants to ensure that it is manufacturable in high volume before proceeding with faster speed grades, Gershman said. The 10-Gbit/s version is almost ready for tapeout.
Bay's approach is the opposite of the single-instruction, multiple-data (SIMD) architecture of a multithreaded RISC processor. An SIMD architecture starts with a parallel array of pipelined processors and sends each piece of data through one of the pipelines. Bay also splits up the data and sends each piece to one of several pipelines, but within a pipeline all the engines in a given stage work on the same piece of data. As that data moves to the next stage, a new data stream can come in behind it.
Unlike a RISC arrangement, Bay's architecture is scalable; that is, more engines can be added to attain higher performance or to add more packet-altering features, Gershman said.
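Bay has not published its pipeline design, so the following is only a toy model of the idea described above: packets advance one stage per clock, every engine in a stage works on the same packet, and a new packet enters behind each one. The stage and engine names are hypothetical.

```python
from collections import deque
from typing import Callable, Dict, List, Optional, Tuple

# A hypothetical "engine" inspects or annotates a packet (modeled as a dict).
Engine = Callable[[Dict], None]


def make_engine(name: str) -> Engine:
    def run(pkt: Dict) -> None:
        # Every engine in a stage sees the SAME packet and appends its own result.
        pkt.setdefault("work", []).append(name)
    return run


def simulate(stages: List[List[Engine]], packets: List[Dict]) -> List[Tuple[int, int]]:
    """Return (cycle, packet id) for each packet as it leaves the last stage."""
    slots: List[Optional[Dict]] = [None] * len(stages)
    incoming = deque(packets)
    completions: List[Tuple[int, int]] = []
    cycle = 0
    while incoming or any(p is not None for p in slots):
        cycle += 1
        # Each packet advances one stage; a new packet (if any) enters stage 0.
        slots = [incoming.popleft() if incoming else None] + slots[:-1]
        for engines, pkt in zip(stages, slots):
            if pkt is not None:
                for engine in engines:
                    engine(pkt)
        # The packet in the final stage has now passed through every stage.
        last = slots[-1]
        if last is not None:
            completions.append((cycle, last["id"]))
            slots[-1] = None
    return completions


stage_names = [["parse_l2", "parse_l3"], ["classify", "police"], ["edit", "enqueue"]]
stages = [[make_engine(n) for n in names] for names in stage_names]
packets = [{"id": i} for i in range(5)]
print(simulate(stages, packets))
```

After a fill latency of three cycles, one packet leaves per cycle. In this idealized model, adding an engine to a stage adds per-packet work without lowering that rate, which is the kind of scaling Gershman describes; real hardware would also need the extra engine to fit within the stage's clock period.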
Service assignments
Two factors are changing the nature of silicon duties in public carrier networks, said Rick Bleszynski, chairman and chief executive officer at Bay. Core network transport already has moved to 10 Gbits/s and will quickly be moving to 40 Gbits/s. At the same time, routers and switches at the edge of the WAN are being tasked with directly assigning services to transport methods, or even to wavelengths.
This means that network processors addressing the new breed of edge equipment must speak Multi-Protocol Label Switching, Multi-Protocol Lambda Switching, Optical Domain Services Interface and any other service-assignment protocol that becomes standardized during the next few quarters.
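To give a sense of what "speaking" one of these protocols involves at the packet level, here is a minimal sketch that decodes an MPLS label stack using the 32-bit entry layout from RFC 3032 (20-bit label, 3-bit experimental field, bottom-of-stack bit, 8-bit TTL). It covers MPLS only; the other protocols named above are not shown, and the function names are illustrative.

```python
import struct
from typing import List, NamedTuple


class LabelEntry(NamedTuple):
    label: int    # 20-bit label value
    exp: int      # 3-bit experimental (class-of-service) field
    bottom: bool  # S bit: set only on the last entry in the stack
    ttl: int      # 8-bit time to live


def encode_entry(e: LabelEntry) -> bytes:
    word = (e.label << 12) | (e.exp << 9) | (int(e.bottom) << 8) | e.ttl
    return struct.pack("!I", word)


def parse_label_stack(data: bytes) -> List[LabelEntry]:
    """Decode 32-bit label stack entries until the bottom-of-stack bit is seen."""
    entries: List[LabelEntry] = []
    offset = 0
    while True:
        if offset + 4 > len(data):
            raise ValueError("truncated MPLS label stack")
        (word,) = struct.unpack_from("!I", data, offset)
        entry = LabelEntry(
            label=word >> 12,
            exp=(word >> 9) & 0x7,
            bottom=bool((word >> 8) & 0x1),
            ttl=word & 0xFF,
        )
        entries.append(entry)
        offset += 4
        if entry.bottom:
            return entries


stack = encode_entry(LabelEntry(100, 5, False, 64)) + encode_entry(LabelEntry(16, 0, True, 63))
print(parse_label_stack(stack))
```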
In addition, network processors must be prepared for more than Internet Protocol traffic, Bleszynski said.
"IP may be the most important protocol to address, but we must also deal with frame relay, ATM, MPLS, Point-to-Point Protocol as well as Ethernet," he said.
Bay executives claim they cracked this nut, producing a processor capable of handling all these types of traffic. "You can build a multiprotocol router with this chip," Gershman said. Hence the CLEC-on-a-chip nickname.
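A multiprotocol router's first decision is which handler a packet belongs to. The sketch below shows that step for Ethernet-framed traffic only, dispatching on well-known EtherType values and skipping one IEEE 802.1Q VLAN tag; frame relay and ATM use their own framing and would need separate front ends. The handler labels are hypothetical, and this is not a description of Bay's classifier.

```python
import struct

# Well-known EtherType values; the string labels are hypothetical handler names.
ETHERTYPES = {
    0x0800: "ipv4",
    0x0806: "arp",
    0x86DD: "ipv6",
    0x8847: "mpls-unicast",
    0x8848: "mpls-multicast",
    0x8863: "pppoe-discovery",
    0x8864: "pppoe-session",
}
VLAN_TPID = 0x8100  # IEEE 802.1Q tag


def classify_ethernet(frame: bytes) -> str:
    """Pick a protocol handler from the EtherType, skipping one 802.1Q tag if present."""
    if len(frame) < 14:
        raise ValueError("frame shorter than an Ethernet header")
    (ethertype,) = struct.unpack_from("!H", frame, 12)
    if ethertype == VLAN_TPID:
        if len(frame) < 18:
            raise ValueError("truncated 802.1Q tag")
        # The tag's 2-byte TCI sits at offset 14; the inner EtherType follows it.
        (ethertype,) = struct.unpack_from("!H", frame, 16)
    if ethertype <= 1500:
        return "802.3-length"  # values up to 1500 are a length field, not a type
    return ETHERTYPES.get(ethertype, "unknown")
```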
While Bay is not talking in detail about its architecture yet, Bleszynski said that designers must give up their fascination with processor architectures appropriate for the control plane, such as RISC cores, but inappropriate for data path services.
Deep packet classification has been useful for designers trying to do multivariate packet analysis, he said. But what is needed most at the edge of a public network scaling to OC-192 (10 Gbits/s) and beyond is a store-and-forward packet architecture that can continue meeting wire speeds as new services are added to the core design.
Key to designing such systems is relying on a unified ultrawide-bus standard that uses standard industry DRAM, Gershman said. Specialized memory will not be acceptable in complex edge-router designs, nor will systems based on multiple internal buses, he said.
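Some rough arithmetic suggests why a store-and-forward design pushes toward an ultrawide bus. Every bit is written into DRAM once and read back once, so memory must sustain twice the line rate. The sketch below assumes, purely for illustration, a single-data-rate DRAM bus clocked at the 166-MHz figure Bay quotes for its processor, plus a hypothetical efficiency factor for refresh and bank conflicts; it ignores lookup and descriptor traffic, so real requirements would be higher.

```python
import math


def min_bus_width_bits(line_rate_bps: float, clock_hz: float,
                       efficiency: float = 1.0, transfers_per_clock: int = 1) -> int:
    """Smallest data-bus width (bits) that sustains store-and-forward at line rate."""
    # Store-and-forward writes every bit into DRAM once and reads it back once.
    required_bps = 2 * line_rate_bps
    # Each data pin moves this many bits per second after overhead losses.
    per_pin_bps = clock_hz * transfers_per_clock * efficiency
    return math.ceil(required_bps / per_pin_bps)


for eff in (1.0, 0.5):
    print(eff, min_bus_width_bits(10e9, 166e6, efficiency=eff))
```

Under these assumptions, 10 Gbits/s needs at least a 121-bit bus at perfect efficiency and 241 bits at 50 percent, well beyond a conventional 64-bit memory interface.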
The need for determinism was another factor that eliminated RISC from consideration. RISC microprocessors run in a world where "instructions are moving much faster than the data," Gershman said, but networking is different: "Your data's moving almost as fast as instructions."
To keep pace, Bay's processor had to be engineered specifically for data-networking traffic. Management and collection algorithms are implemented in an engine that is superscalar and pipelined (the model followed by most network processors) but also fully deterministic, Gershman said.
Microprocessor analyst Linley Gwennap, principal of The Linley Group (Mountain View, Calif.), wasn't convinced that determinism was that vital, however. "Where you want to be fully deterministic is in real-time applications so that when you're doing the same thing over and over again, it takes the same amount of time," Gwennap said. "In packet processing, what really matters is the average time."
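The disagreement comes down to average versus worst-case cost per packet. The toy numbers below are hypothetical: both engines average five cycles per packet, but only the variable one ever exceeds an assumed six-cycle budget.

```python
from statistics import mean
from typing import List, Tuple


def summarize(cycles: List[int], budget: int) -> Tuple[float, int, int]:
    """Return (average, worst case, packets over budget) for per-packet cycle counts."""
    over = sum(1 for c in cycles if c > budget)
    return mean(cycles), max(cycles), over


fixed = [5, 5, 5, 5]      # every packet takes the same time
variable = [3, 3, 3, 11]  # same total work, uneven per-packet cost
print(summarize(fixed, budget=6))
print(summarize(variable, budget=6))
```

By Gwennap's measure the two are equivalent, since a buffer can absorb the slow packet as long as the average stays under budget. Bay's argument is that the worst case is what an OEM's stress tests will expose.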
The core itself will use a VLIW architecture, but with an instruction set streamlined enough to avoid stalls in instruction execution.
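Bay has not disclosed its instruction set. As a generic illustration of the VLIW idea, the toy interpreter below (with a hypothetical three-opcode ISA) issues one wide instruction word per cycle; all slots in a word read registers at the start of the cycle and write back together. Because the scheduling is fixed ahead of time, run time is simply the number of words, which is one reason VLIW pairs naturally with deterministic timing.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical mini-ISA: (destination, opcode, source_a, source_b).
Op = Tuple[str, str, str, str]
ALU: Dict[str, Callable[[int, int], int]] = {
    "add": lambda a, b: a + b,
    "and": lambda a, b: a & b,
    "shr": lambda a, b: a >> b,
}


def run_vliw(program: List[List[Op]], regs: Dict[str, int]) -> Tuple[Dict[str, int], int]:
    """Issue one instruction word per cycle; return final registers and cycle count."""
    regs = dict(regs)
    cycles = 0
    for word in program:
        dests = [dst for dst, _, _, _ in word]
        if len(set(dests)) != len(dests):
            raise ValueError("two slots in one word write the same register")
        # All slots read the register file as it stood at the start of the cycle...
        results = [(dst, ALU[op](regs[a], regs[b])) for dst, op, a, b in word]
        # ...and write back together, so the slots in a word execute in parallel.
        for dst, value in results:
            regs[dst] = value
        cycles += 1  # fixed cost per word: this model has no dynamic stalls
    return regs, cycles


# A 32-bit header word with a label in the top 20 bits and a TTL in the low 8.
program = [
    [("label", "shr", "w", "k12"), ("ttl", "and", "w", "kff")],  # two ops, one cycle
    [("ttl", "add", "ttl", "kneg1")],                            # decrement the TTL
]
regs = {"w": 0x64A40, "k12": 12, "kff": 0xFF, "kneg1": -1}
final, cycles = run_vliw(program, regs)
print(final["label"], final["ttl"], cycles)
```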
Combining these elements, Bay plans to produce a 166-MHz processor that can handle functions from packet processing up to traffic management and queuing, all at 10 Gbits/s. Only Layer 7 classification, which involves deep analysis of a packet's contents to make routing decisions, might need to be handled off-chip.
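A quick calculation shows how tight that target is. Assuming a nominal 10 Gbits/s with framing overhead ignored, and back-to-back 40-byte packets (a bare 20-byte IPv4 header plus a 20-byte TCP header), a new packet arrives every 32 ns, which leaves a 166-MHz clock only about five cycles per packet.

```python
def cycles_per_packet(packet_bytes: int, line_rate_bps: float = 10e9,
                      clock_hz: float = 166e6) -> float:
    """Clock cycles available per packet when packets arrive back to back."""
    packet_time_s = packet_bytes * 8 / line_rate_bps
    return packet_time_s * clock_hz


for size in (40, 64, 1500):
    print(size, round(cycles_per_packet(size), 2))
```

With so few cycles per minimum-size packet, the work has to be spread across parallel engines and pipeline stages rather than executed sequentially on one core.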
Like most companies in the high end of public-network processor design, Bay is spending a lot of time thinking about bandwidth management, particularly across multiple ports with different quality-of-service constraints. Gershman said the company is working on several patented algorithms for bandwidth management. Statistics gathering also will be critical in the design, in order to pass information on to IP billing packages and similar software systems that make use of multidimensional counters.
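Bay's bandwidth-management algorithms are not public, but the "multidimensional counters" that billing software consumes can be sketched generically: each packet increments counters keyed on several dimensions at once, which can later be rolled up along any one of them. The dimensions chosen here (port, QoS class, direction) and the class name are hypothetical.

```python
from collections import defaultdict
from typing import DefaultDict, Tuple

# Hypothetical counter dimensions: (port, QoS class, direction).
Key = Tuple[int, str, str]


class MultiCounter:
    def __init__(self) -> None:
        self.packets: DefaultDict[Key, int] = defaultdict(int)
        self.octets: DefaultDict[Key, int] = defaultdict(int)

    def record(self, port: int, qos: str, direction: str, length: int) -> None:
        key = (port, qos, direction)
        self.packets[key] += 1
        self.octets[key] += length

    def octets_by_port(self, port: int) -> int:
        # Roll up one dimension by summing over the others, e.g. for a billing export.
        return sum(n for (p, _, _), n in self.octets.items() if p == port)


stats = MultiCounter()
stats.record(1, "gold", "in", 1500)
stats.record(1, "bronze", "in", 64)
stats.record(2, "gold", "out", 40)
print(stats.octets_by_port(1), stats.packets[(1, "gold", "in")])
```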
The cores will be built around a 16 x 16 port-switching model. Execution engines will be implemented first in 10-Gbit/s blocks, and scaled to 20-Gbit/s systems for full-duplex OC-192, before being moved to 40-Gbit/s cores for OC-768 systems.
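The speed grades follow from SONET arithmetic: an OC-n link runs at n times the 51.84-Mbit/s OC-1 rate. OC-192 is therefore about 9.95 Gbits/s in each direction (the "10 Gbits/s" of the trade press), full duplex roughly doubles that to the 20-Gbit/s figure, and OC-768 is four times OC-192.

```python
OC1_MBPS = 51.84  # SONET OC-1 / STS-1 line rate in Mbit/s


def oc_rate_gbps(n: int) -> float:
    """Line rate of an OC-n link in Gbit/s (n times the OC-1 rate)."""
    return n * OC1_MBPS / 1000


for n in (192, 768):
    rate = oc_rate_gbps(n)
    print(f"OC-{n}: {rate:.3f} Gbit/s one way, {2 * rate:.3f} Gbit/s full duplex")
```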
Additional reporting by Craig Matsumoto.