PLX Technology has developed a way of turning the PCI bus into a fault-tolerant switched fabric.
The core of the PLX system is an interface chip with a PCI version 2.2-compliant interface on one side and an interface to a ring-based network, similar to those used in Sonet and FDDI networks, on the other.
Although PLX had been working on a switched-fabric underpinning for its PCI interface chips since 1998, the company decided to buy in the technology for its adaptive switch fabric, acquiring specialist communications silicon company Sebring Networks in the spring.
David Ridgeway, director of marketing for switch fabrics at PLX, said: "At Sebring, before the acquisition by PLX, we had developed the first generation of our switch-fabric technology."
The PLX silicon is designed so that each node on the ring can be the end of a complete PCI bus. The addressing mechanism used by the ring lets it support a total of 224 ports, or PCI buses.
In contrast to FDDI, the ring uses an insertion mechanism rather than token passing. This means that multiple segments around the ring can pass data simultaneously.
Nodes pass packets in both directions around the ring. The packets are generally either 64 or 128 bytes in length.
The ring-interface chip chops PCI burst transfers up to fit into those packets, much like the segmentation scheme in asynchronous transfer mode (ATM).
Like ATM, the packets have header error correction and a cyclic redundancy check for the payload. But to deal with the large number of short transactions that can appear on PCI, the packets can be shorter than their nominal length. A flag signal is used to mark the boundary of each packet.
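The segmentation step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `segment_burst` is hypothetical, the 64-byte figure is treated as the payload size, and the header, error-correction field, CRC and flag signal are not modelled.

```python
from typing import List


def segment_burst(burst: bytes, max_payload: int = 64) -> List[bytes]:
    """Split a PCI burst into payloads of at most max_payload bytes.

    Assumption: 64 (or 128) bytes is the nominal payload size. The last
    payload may be shorter, matching the article's note that packets can
    be shorter than their nominal length.
    """
    if max_payload <= 0:
        raise ValueError("max_payload must be positive")
    return [burst[i:i + max_payload] for i in range(0, len(burst), max_payload)]


# A 150-byte burst becomes two full packets and one short one.
payloads = segment_burst(bytes(150), max_payload=64)
print([len(p) for p in payloads])  # [64, 64, 22]
```

A short transaction of a few bytes would produce a single short packet rather than being padded out to the nominal length.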
The fabric is largely aimed at communications-oriented designs, such as the control plane in a high-end switch or router. As a result, the company has designed the PCI interface to act in the same way as the bridges used in CompactPCI backplanes.
"The interface has a transparent and a non-transparent mode," said Ridgeway.
Unlike the PCI bridges on the market, the PLX devices will switch between transparent and non-transparent modes to allow fail-over between system controllers. That is not possible with existing PCI silicon. For non-transparent operation, the ring can be split into processing domains so that the PCI buses within those domains cannot be configured by a system controller in another domain.
Ridgeway said: "There can be seven separate domains, with 32 nodes in each domain. We designed this for n+1 redundancy so we can have truly redundant systems with PCI."
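The domain arithmetic can be checked with a short Python sketch: seven domains of 32 nodes give the 224 ports quoted earlier. The flat port numbering and the function names `port_to_domain` and `may_configure` are assumptions made for illustration; the article does not describe how PLX actually encodes domains in its addresses.

```python
from typing import Tuple

DOMAINS = 7            # from the article
NODES_PER_DOMAIN = 32  # from the article
TOTAL_PORTS = DOMAINS * NODES_PER_DOMAIN  # 224


def port_to_domain(port: int) -> Tuple[int, int]:
    """Map a flat port number to (domain, node-within-domain).

    Assumption: ports are numbered 0..223 with domains laid out contiguously.
    """
    if not 0 <= port < TOTAL_PORTS:
        raise ValueError("port out of range")
    return divmod(port, NODES_PER_DOMAIN)


def may_configure(controller_port: int, target_port: int) -> bool:
    """A controller may only configure PCI buses in its own domain."""
    return port_to_domain(controller_port)[0] == port_to_domain(target_port)[0]


print(TOTAL_PORTS)           # 224
print(may_configure(0, 31))  # True: same domain
print(may_configure(0, 32))  # False: port 32 is in the next domain
```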
Each interface has a content-addressable memory (CAM). It stores an address map for the local PCI bus attached to it. When a packet is inserted into the ring and relayed, each node compares the address with the stored map. It works out whether the packet is meant for it or another node. If the latter, the packet is passed on.
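The compare-and-relay behaviour can be modelled roughly as follows. This is a sketch, not the PLX design: a real CAM matches all entries in parallel in hardware, whereas this code scans a list, and the `Node` class, the `(base, size)` address windows and the `deliver` function are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    node_id: int
    # Hypothetical address map: (base, size) windows owned by the local PCI bus.
    address_map: List[Tuple[int, int]] = field(default_factory=list)

    def claims(self, address: int) -> bool:
        """Return True if the address falls inside one of this node's windows."""
        return any(base <= address < base + size for base, size in self.address_map)


def deliver(ring: List[Node], source: int, address: int) -> Optional[int]:
    """Relay a packet from the node after `source` until some node claims it.

    Returns the claiming node's id, or None if the packet travels all the
    way round without a match.
    """
    n = len(ring)
    for hop in range(1, n):
        node = ring[(source + hop) % n]
        if node.claims(address):
            return node.node_id
        # Otherwise the packet is simply passed on to the next node.
    return None


ring = [
    Node(0, [(0x1000, 0x1000)]),
    Node(1, [(0x2000, 0x1000)]),
    Node(2, [(0x3000, 0x1000)]),
]
print(deliver(ring, source=0, address=0x3004))  # 2
```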
"A very small percentage of the ring needs to be configured on a failover," said Ridgeway. "If you have node six acting as a standby and node zero fails, all that six has to do is put the node-zero data into its CAM.
"It is not milliseconds but tens of microseconds for a fail-over. We don't provide the software to do that but we do provide the hardware to do it easily."
The ring is wired so that each node can detect a cable break easily. Pull-up or pull-down resistors at the end of each link are used to help monitor for a break. When a break occurs, it triggers an interrupt so that nodes can be updated on which way they should send data.
"Under normal circumstances, packets only have to travel halfway round the ring because each node knows the fastest way to the destination."
The break interrupts are used to manage the insertion of new nodes into the ring. The ring will have to be broken to add those nodes and the subsequent healing helps trigger a new configuration cycle.
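The idea that each node knows the fastest way round a bidirectional ring, so that packets travel at most halfway, can be illustrated with a small Python sketch. The function name and the clockwise/counterclockwise labels are assumptions; how the PLX nodes actually store or compute the direction is not described.

```python
from typing import Tuple


def shortest_direction(src: int, dst: int, ring_size: int) -> Tuple[str, int]:
    """Pick the direction with fewer hops between two nodes on a ring.

    Nodes are assumed to be numbered 0..ring_size-1 in clockwise order.
    Ties are resolved in favour of clockwise.
    """
    clockwise = (dst - src) % ring_size
    counterclockwise = (src - dst) % ring_size
    if clockwise <= counterclockwise:
        return ("clockwise", clockwise)
    return ("counterclockwise", counterclockwise)


print(shortest_direction(0, 6, 8))  # ('counterclockwise', 2)
print(shortest_direction(0, 4, 8))  # ('clockwise', 4)
```

On an intact eight-node ring no destination is more than four hops away. After a cable break only one direction remains usable for some node pairs, which is why the nodes need to be updated when the break interrupt fires.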
For the physical interface, the company has used a similar approach to Infiniband. The connection can be through a cable or backplane with LVDS signalling.
The first implementation will use a 16bit-wide interface clocked at 400MHz. For well-balanced traffic, most of which travels only a quarter of the way around the ring, Ridgeway said the aggregate ring bandwidth could reach 50Gbit/s. The company does not intend to compete with Infiniband, he said.
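One way to arrive at a figure close to 50Gbit/s from the quoted link parameters is sketched below. The breakdown is an assumption, not something PLX has confirmed: it supposes one transfer per clock on each of the 16 signals, counts both ring directions, and assumes that traffic crossing a quarter of the ring lets four transfers proceed at once in each direction.

```python
def aggregate_bandwidth_gbit(width_bits: int, clock_mhz: float,
                             directions: int, fraction_of_ring: float) -> float:
    """Estimate aggregate ring bandwidth in Gbit/s.

    Assumptions: single data rate signalling, and perfect spatial reuse,
    so that 1 / fraction_of_ring transfers share each direction at once.
    """
    link_gbit = width_bits * clock_mhz / 1000      # per link, per direction
    concurrent_transfers = 1 / fraction_of_ring    # spatial reuse factor
    return link_gbit * directions * concurrent_transfers


# 16 bits x 400 MHz = 6.4 Gbit/s per link; x2 directions; x4 reuse.
print(aggregate_bandwidth_gbit(16, 400, directions=2, fraction_of_ring=0.25))  # 51.2
```

Traffic that has to travel further round the ring would reduce the reuse factor and with it the aggregate figure.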
"This is a good transition to Infiniband. We can provide access from legacy PCI devices to Infiniband.
"The bus interface does not have to be PCI. It could be a processor bus. The art we've mastered is one of translation between the fabric and the bus. We have headers for the types of cell we carry. PCI: this is just one type of cell."
The company is keeping an eye on the movement in communications-oriented fabrics, such as the Common Switch Interface (CSIX).
"We made sure we could migrate this [to CSIX]," said Ridgeway. "The fabric is designed for control-plane switching in axis, core and edge routers. But it could be used for data-plane switching in edge routers."
PLX aims to sample the first interface parts in Q1 next year, with production scheduled for Q2.