SAN MATEO, Calif. -- After more than two years of development, Hyperchip Inc. is ready to demonstrate the first core switch able to handle speeds exceeding 1 petabit/second, the company said.
Company officials admit it will be a few years before anyone clusters together enough Hyperchip PBR-1280 switches to reach petabit speeds, but claim they have found a way to handle traffic management for hundreds of ports, one of the most vexing problems facing developers tackling terabit switching capacity.
Hyperchip (Montreal) got its first PBR-1280 switch out the door on Tuesday (May 29) for installation on the floor at Supercomm, the trade show in Atlanta where Hyperchip will demonstrate the switch next week using OC-192 testers borrowed from Agilent Technologies Inc.
A single PBR-1280 can handle 160 Gbits/s of aggregate traffic, putting it on par with other core switches such as Cisco Systems' GSR 12000, said Richard Norman, founder and chief executive of Hyperchip. But Hyperchip claims its switches can be stacked to achieve a higher total throughput (1.28 petabits/second) than any existing competitor.
The company is diving into a core switch market presently dominated by Cisco Systems Inc., with Juniper Networks Inc. positioned as the most serious competitor. And emerging terabit router vendors such as Avici Systems Inc. and Pluris Inc. have begun taking some market share of their own.
Hyperchip began life as a chip company, developing a switch fabric device now called the Hyperchip Matrix. Norman's original intent was to develop a multiprocessor for certain types of software problems, but Silicon Valley venture capitalists essentially bribed Hyperchip's early team to build a router instead. "They offered us three to five times what we were asking, and said 'This is just the beginning,' " Norman said.
The company has spent the last 26 months assembling a new team and building an entire system around the switch fabric chip, Norman said.
While traffic management wasn't among the company's original goals, Hyperchip officials had outlined the two-stage method that would eventually be used in the PBR-1280. "We sort of mentally thought about it, but we'd never written it down," Norman said. "We thought everybody would try to do it this way."
The star of the PBR-1280 is the Hyperchip Matrix switch fabric and the traffic management apparatus that resides with it. Traffic management, the process of balancing traffic flows going into the switch fabric to prevent bottlenecks, has proven one of the most difficult aspects of high-speed networking. Core switches can handle traffic management for a 16-port fabric, but switch vendors are trying to find ways to tackle hundreds of ports, which increases the complexity of the problem a thousandfold, Norman said.
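Norman's "thousandfold" figure is consistent with arbitration work that grows roughly with the square of the port count, a common model for crossbar schedulers (an assumption for illustration; Hyperchip has not published its complexity model). Going from 16 ports to 512 multiplies the work by (512/16)^2:

```python
# Back-of-the-envelope check, assuming scheduler cost grows with
# the square of the port count (a common crossbar-arbitration
# model; not something Hyperchip has stated).
def scheduling_cost(ports: int) -> int:
    """Relative arbitration work for a crossbar with `ports` ports."""
    return ports * ports

ratio = scheduling_cost(512) // scheduling_cost(16)
print(ratio)  # 1024 -- roughly the "thousandfold" increase Norman cites
```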
Divide and conquer
Hyperchip gets around the problem by dividing traffic management into two stages. A global stage assigns traffic to one of 10 Matrix cards, each boasting the enormous 16-port Matrix switch fabric. Local traffic management on the fabric itself then assigns packets to individual ports.
Each stage consists of traffic management at a level no more difficult than existing switches. "We reduced the problem to the same scale of problem we [can solve] today," Norman said.
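The two-stage decomposition described above can be sketched in a few lines. Everything here except the 10 Matrix cards and the 16 ports per fabric is invented for illustration; Hyperchip has not disclosed its actual assignment algorithm, so the modulo-based balancing below is only a stand-in:

```python
# Illustrative sketch of two-stage traffic management: a global
# stage picks one of 10 Matrix cards, then local arbiters on the
# chosen fabric pick one of its 16 ports. Neither stage ever has
# to schedule across more than 16 targets at once.
NUM_CARDS = 10          # Matrix cards in a PBR-1280
PORTS_PER_FABRIC = 16   # ports on each Matrix switch fabric

def global_stage(flow_id: int) -> int:
    """Global stage: assign a flow to one of the Matrix cards."""
    return flow_id % NUM_CARDS  # stand-in for a real load balancer

def local_stage(flow_id: int) -> int:
    """Local stage: the fabric's own arbiters pick a port."""
    return (flow_id // NUM_CARDS) % PORTS_PER_FABRIC

def route(flow_id: int) -> tuple[int, int]:
    """Return (card, port) for a flow."""
    return global_stage(flow_id), local_stage(flow_id)

print(route(137))  # (7, 13)
```

The point of the sketch is the shape of the problem, not the hashing: each stage is a 16-way (or 10-way) decision, the same scale existing switches already solve.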
To implement this scheme, Hyperchip had to pack the Matrix switch fabric with extras, including 100 small blocks of SRAM and 300 arbiters, or "mini-schedulers," Norman said. The SRAM blocks in aggregate could provide 1 Tbit/s of both read and write bandwidth to the chip's 16-port crossbar, providing plenty of overhead for spikes in traffic, Norman said.
The on-chip SRAM provides high-speed access, but traffic management also requires extremely large blocks of memory for storing tables. That storage, built from commodity memory, is kept off-chip but on the same card as the switching matrix.
Matrix also includes 16 "little processors" that monitor the data path and are intended to support future applications carriers might create, Norman said.
All told, Matrix is a 4.5-million-transistor chip, the most complicated that IBM Corp. has ever fabbed for an outside company, Norman said. But the chip uses a lot of repeated elements and includes only 100,000 unique gates, he said. "To all the tools that could handle two levels of hierarchy, it was a 100,000-gate chip," Norman said.