Cisco Systems last week showed a router that packs a variety of networking services onto a custom, 40-core processor. The Ethernet giant aims to leverage its expertise in designing complex ASICs to leapfrog the competition in the $5 billion market for edge routers.
The Quantum Flow Processor takes Cisco's ASIC work for its networking systems to a new level, in some ways surpassing the technology in mainstream server CPUs from Intel Corp. and Sun Microsystems. Analysts called the move a savvy one for Cisco, although some complained that the company is keeping details of the new chip sketchy.
Cisco claims it spent $250 million and five years developing its Aggregation Services Router 1000, $100 million of that just on the flow processor. The router handles functions such as firewall, IPSec virtual private networking, deep-packet inspection and session border control at rates up to 20 Gbits/second.
"There are half a dozen or so appliances all being used to provide these functions at the edge of carrier and end-user networks," said Pankaj Patel, general manager of Cisco's service provider group. "Our value proposition is to put them in one small form factor box to reduce capital and operating expenses."
Competitors such as Juniper Networks and Redback Networks, as well as Cisco's own existing 7600 series routers, typically slot multiple cards into a chassis or stack appliances in a rack to handle the growing number of features being processed at the network edge, said Eve Griliches, a telecom analyst at International Data Corp.
"The more integrated you get it, the better performance you get when you are trying to run all the services at once--and eventually users will want to run all these services at once," Griliches said.
Key to the system is the 1.3-billion-transistor flow processor, an 80-watt chip made in a 90-nanometer process at Texas Instruments and designed using Cisco's customer-owned tooling. Each of its 40 Tensilica cores can handle up to four threads, for a total of 160, far beyond the raw thread-level parallelism of Sun's 65-nm Niagara or Intel's 45-nm Penryn server CPUs.
"We looked outside and internally to see if there was anything we could use, but nothing came close," said Nikhil Jayaram, director of engineering in Cisco's midrange routing group. "Other architectures were about packet processing, but we wanted to do flow processing of stateful traffic."
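To make the packet-versus-flow distinction concrete: a stateless packet processor treats each packet on its own, while a flow processor keeps state that persists across every packet in a connection. The Python sketch that follows is a minimal illustration, assuming flows are identified by the usual IP 5-tuple and that per-flow state is just packet and byte counters. The names `FiveTuple`, `FlowState` and `FlowTable` are hypothetical and say nothing about how Cisco's chip actually stores flow state.

```python
from dataclasses import dataclass
from typing import Dict, NamedTuple


class FiveTuple(NamedTuple):
    """Identifies a flow by the classic IP 5-tuple (hypothetical key layout)."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: str


@dataclass
class FlowState:
    """Per-flow state that survives from one packet to the next."""
    packets: int = 0
    byte_count: int = 0


class FlowTable:
    """A hash table keyed by 5-tuple; an illustrative stand-in for flow state."""

    def __init__(self) -> None:
        self.flows: Dict[FiveTuple, FlowState] = {}

    def process(self, key: FiveTuple, length: int) -> FlowState:
        # Find the flow's existing state (or create it on the first packet),
        # then update it with this packet. A stateless packet processor
        # would have nothing to look up here.
        state = self.flows.setdefault(key, FlowState())
        state.packets += 1
        state.byte_count += length
        return state


table = FlowTable()
web = FiveTuple("10.0.0.1", "10.0.0.2", 1234, 80, "tcp")
table.process(web, 100)
print(table.process(web, 400))  # the second packet sees the first one's state
```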
"Multicore processors and complex aggregation routers are converging in a way that means the most-complex communication processing chips now dwell at the edge of the public network," said Loring Wirbel, director of the EE Times Market Intelligence Unit. "The center of the network now means big, dumb, high-speed bit pushing, while all the smarts reside at the edge of the public network, and core routers like Cisco's CRS-1 are no longer the premier platforms for high-performance network processors."
The company hopes the processor will be used in a wide range of routers and be actively upgraded for years in the field. But success in the dynamic edge-networking market, which is growing at double-digit rates, is not assured, said Griliches of IDC.
"The market is littered with router makers who have tried to deliver all the services in one box, but have not done so sufficiently well, because it is not easy to do. Putting everything in one chip is a step in the right direction. A lot of their competitors will be moving in this direction," she said.
The company's track record in ASICs has been measured and successful, said Bryan Lewis, a vice president of research at Gartner.
"Cisco is doing fewer ASIC designs than they have done in the past, but the revenue per ASIC design is growing, causing them to be one of the top buyers in total ASICs and the top buyer in wired comms," he said. "Each design they do internally has been very successful in generating significant revenue for them."
The flow processor appears to have an edge on merchant network processors, but it's difficult to tell because Cisco is guarded about releasing substantive details on the proprietary part.
"Most NPUs [network processing units] are still working largely at Layer 2 and 3, mainly forwarding packets and not doing a lot of upper-level processing," said Bob Wheeler, analyst with The Linley Group (Mountain View, Calif.).
Intel and Cavium Networks have designed 10G network processors that approach what Cisco is delivering. The Intel IXP 2800 uses 16 programmable cores to run services on cards. Startup Netronome is developing a 20G version.
Cavium's Octeon uses 16 MIPS cores that can handle some Layer 4-7 service jobs. It sports an embedded pattern-matching engine but requires off-chip TCAMs for packet classification. For Cisco, "the challenge was turning a multiprocessor into a network processor," said Jayaram, a former chip designer at Digital Equipment Corp.
As many as 100 engineers took part in the project, including former microprocessor designers from AMD, Cyrix, Intel and Sun, as well as the team that designed the multicore ASIC for Cisco's CRS-1 core router.
The group pushed detailed chip design to a new level even for Cisco, one of the world's top captive ASIC design companies. Engineers handled circuit and memory design, did their own RTL and chip layout, and even designed their own package, another Cisco first.
"One of our biggest challenges was signal integrity, and the package plays a very big role in that," said Jayaram. "A poorly designed package can really bite you in power and signal integrity, but our substrate is almost invisible from an SI perspective."
Keeping the 1.2-GHz processor fed was another issue. Cisco opted for a flat memory model using multiple channels of second-generation reduced-latency DRAMs and various memory blocks inside the chip.
"I suspect we use more on- and off-chip memory than anyone else," said Jayaram.
The flat model for system DRAM helps keep programming the device in C code simple, compared with some network processors that use fragmented banks of TCAMs and other memory structures.
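A flat memory model lets software treat all of system DRAM as one address range even when it is spread across several physical channels. One common way to achieve that is to interleave consecutive blocks of addresses across the channels. The sketch assumes four channels and 64-byte interleaving; both are hypothetical values, and the article does not say that the flow processor interleaves this way.

```python
from typing import Tuple

# Hypothetical parameters: Cisco has not disclosed its channel count or granularity.
NUM_CHANNELS = 4
LINE_BYTES = 64


def to_channel(addr: int) -> Tuple[int, int]:
    """Map a flat address to (channel, byte offset within that channel)."""
    line, byte_in_line = divmod(addr, LINE_BYTES)
    channel = line % NUM_CHANNELS  # consecutive lines rotate across channels
    offset = (line // NUM_CHANNELS) * LINE_BYTES + byte_in_line
    return channel, offset


def to_flat(channel: int, offset: int) -> int:
    """Inverse mapping: recover the flat address from (channel, offset)."""
    row, byte_in_line = divmod(offset, LINE_BYTES)
    return (row * NUM_CHANNELS + channel) * LINE_BYTES + byte_in_line


print(to_channel(300))
```

The programmer only ever sees the flat address; the channel split is handled in one place, which is the kind of simplicity the flat model is meant to buy.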
By running up to four threads per core, the chip can hide some of the latency that comes from the many memory accesses comms processors require. Most computer processors use only two threads per core.
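A back-of-the-envelope model shows why extra threads help. Assume each thread computes for a fixed number of cycles and then stalls waiting on DRAM, and that while one thread waits the core switches to another ready thread at no cost. The 20-cycle compute and 80-cycle stall figures are hypothetical, chosen only to illustrate the effect.

```python
CORES = 40
THREADS_PER_CORE = 4
TOTAL_THREADS = CORES * THREADS_PER_CORE  # 160 hardware threads on the chip


def core_utilization(threads: int, compute_cycles: int, memory_cycles: int) -> float:
    """Fraction of cycles a core spends doing useful work.

    Each thread alternates compute_cycles of work with memory_cycles of
    stall; other threads fill the stall, up to a fully busy core.
    Thread-switch overhead is ignored.
    """
    period = compute_cycles + memory_cycles
    return min(1.0, threads * compute_cycles / period)


for n in (1, 2, 4):
    print(n, core_utilization(n, compute_cycles=20, memory_cycles=80))
```

Under these assumed numbers, one thread keeps a core busy 20 percent of the time, two threads 40 percent, and four threads 80 percent.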
The choice of Tensilica over MIPS or ARM as a core supplier was a close call. "They were fairly similar, but the Tensilica architecture had some benefits when you dip down into the gory details of network processing," said Jayaram.
The cores are linked on what is "effectively a high-performance crossbar switch," he said. Processors using more than 40 cores typically move to more-complex structures, such as a mesh.
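Why crossbars give way to meshes at higher core counts is largely arithmetic. A full crossbar needs a switch point for every input/output pair, so its cost grows with the square of the port count, while a 2D mesh links each node only to its neighbors. The sketch assumes, for simplicity, a square crossbar with one port per core on each side and a rectangular mesh layout; the actual port count of Cisco's crossbar has not been disclosed.

```python
def crossbar_crosspoints(inputs: int, outputs: int) -> int:
    """A full crossbar needs one switch point per input/output pair."""
    return inputs * outputs


def mesh_links(rows: int, cols: int) -> int:
    """A 2D mesh links each node only to its horizontal and vertical neighbors."""
    return rows * (cols - 1) + cols * (rows - 1)


# (core count, assumed mesh layout)
for cores, (r, c) in [(16, (4, 4)), (40, (5, 8)), (64, (8, 8))]:
    print(cores, crossbar_crosspoints(cores, cores), mesh_links(r, c))
```

At 40 ports the crossbar already needs 1,600 crosspoints, against 67 links for a 5-by-8 mesh, although the mesh pays for its economy with multi-hop latency.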
Externally, the chip sports four 10-Gbit SPI 4.2 ports that move traffic in and out at rates up to 20 Gbits/s, thanks to a proprietary Cisco feature for linking two interconnects. A next-generation version of the chip will use a derivative of the Interlaken interconnect to deliver traffic at rates of up to 40 Gbits/s in and out of the chip.
"We did a lot of work to future-proof this design" with all the blocks ready for 40G flows, said Jayaram.
The chip is geared for key comms tasks such as tree lookups, hashing functions and high-bandwidth/low-latency access to DRAM. Much of its secret sauce takes the form of complex algorithms for flexibly handling a variety of content flows; some are passed through directly, while others get detailed processing.
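In routers, "tree lookups" classically means longest-prefix matching: finding the most specific route that covers a destination address. The textbook structure for this is a binary trie, sketched here in Python for IPv4. It illustrates the general technique only, not Cisco's algorithm, which the company has not described; the class names are hypothetical.

```python
import ipaddress
from typing import Dict, Optional


class TrieNode:
    def __init__(self) -> None:
        self.children: Dict[int, "TrieNode"] = {}  # keyed by bit value 0 or 1
        self.next_hop: Optional[str] = None


class PrefixTrie:
    """Binary trie for longest-prefix-match lookups on IPv4 addresses."""

    def __init__(self) -> None:
        self.root = TrieNode()

    def insert(self, prefix: str, next_hop: str) -> None:
        net = ipaddress.IPv4Network(prefix)
        bits = int(net.network_address)
        node = self.root
        # Walk one bit per level, most significant first, for prefixlen bits.
        for i in range(net.prefixlen):
            bit = (bits >> (31 - i)) & 1
            node = node.children.setdefault(bit, TrieNode())
        node.next_hop = next_hop

    def lookup(self, addr: str) -> Optional[str]:
        bits = int(ipaddress.IPv4Address(addr))
        node, best = self.root, self.root.next_hop
        for i in range(32):
            node = node.children.get((bits >> (31 - i)) & 1)
            if node is None:
                break
            if node.next_hop is not None:
                best = node.next_hop  # longest match seen so far
        return best


routes = PrefixTrie()
routes.insert("0.0.0.0/0", "default")
routes.insert("10.0.0.0/8", "A")
routes.insert("10.1.0.0/16", "B")
print(routes.lookup("10.1.2.3"), routes.lookup("10.2.0.1"))
```

A one-bit-per-level trie can take up to 32 memory reads per lookup, which is why production designs compress the tree and why fast DRAM access matters so much for this task.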
Other ASICs on the board include some packet framers and other generally minor parts. Cisco added a virtualization layer to its IOS router software in order to deliver fault-tolerant redundancy on the system without requiring multiple flow processors.
Cisco has filed for 42 patents on the new router, most of them on the processor.
The rapid rise of network traffic will propel the need for the new system, the company said. Cisco estimates global IP demand will grow from 7 exabytes per month in 2007 to 29 exabytes per month in 2011, fueled in part by consumer video. The 2011 figure is more than 1,100 times the amount of traffic that traversed the Internet backbone in the United States in 2000, Cisco estimates.
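Cisco's forecast implies a steep compound growth rate. The snippet works it out from the two figures quoted, treating 2007 to 2011 as four years of growth. It also turns the 1,100-times comparison into an implied ceiling on the 2000 figure, assuming decimal units (1 exabyte = 1,000 petabytes).

```python
def cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate implied by growing from start to end."""
    return (end / start) ** (1 / years) - 1


# Cisco's estimates, in exabytes per month
growth = cagr(7, 29, 2011 - 2007)
print(f"Implied growth: {growth:.1%} per year")

# "More than 1,100 times" the 2000 figure puts that figure below 29/1100 EB
ceiling_pb = 29 / 1100 * 1000
print(f"2000 U.S. backbone traffic below {ceiling_pb:.1f} PB/month")
```

With these inputs the implied growth rate is about 42.7 percent a year, and the 2000 U.S. backbone figure works out to less than roughly 26.4 petabytes per month.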
The company has mustered support for the new router from multiple end users or potential users, including Lufthansa Airlines and financial firm Wachovia. A Cisco press release quotes one telecom executive saying the router represents a class of design needed for future carrier networks.
"We believe it will be necessary for the edge of network to perform dynamic quality control to flexibly and securely enable aggregation of traffic from broadband services and converged communications," said Shin Hashomoto, an executive vice president with Nippon Telegraph and Telephone, in a prepared statement.
The Cisco ASR 1000 will be generally available in April priced from $35,000.