A new concept for crossbar switching fabrics that targets unused switching paths will be presented at the Custom Integrated Circuits Conference this week. The following article contains excerpts from the paper, titled "A Crossbar Switching Fabric with Improved Performance and Utilization," with reprint permission from IEEE 2002 CICC.
The core of a network switch or router is the switching fabric, for which many alternatives have been proposed, from simple shared buses to complex multistage networks. Of the various alternatives, the crossbar is a popular choice due to its desirable cost, scalability and nonblocking properties.
A simple yet effective hardware modification can now significantly increase the performance and utilization of a generic crossbar. The proposed structure, called Flexbar, is based on the addition of lightweight, configurable, input and output hardware layers that exploit unused switching paths to provide additional data transfer capability for highly loaded paths.
The architecture has been implemented and evaluated as a network switch fabric. Extensive system simulations under various traffic scenarios indicate that latency is cut by up to 70 percent, and peak throughput of highly loaded ports can increase by more than 100 percent. A full-custom design in 0.35-micron technology incurs only marginal area and performance overheads (4.47 and 8.23 percent, respectively, for a typical configuration) compared with a conventional crossbar.
In theory (with infinite queues), using state-of-the-art scheduling algorithms and under simple, well-suited traffic scenarios, it is possible to achieve 100 percent utilization of a crossbar's capacity and hence, maximum throughput. In practice, however, utilization tends to be quite low due to bursty and asymmetric traffic. Also, high utilization cannot be achieved without significant packet loss for finite input-queue sizes.
While we have implemented Flexbar and demonstrated its advantages in the context of a state-of-the-art network switch, we believe that it can benefit a wide range of high-performance systems. The proposed enhancement is independent of advances in circuit design and traffic-scheduling technologies.
A traditional 4 x 4 crossbar has 16 virtual output queues holding packets (top left). Scheduling this traffic results in sources 1 and 3 getting routed to destinations 4 and 1, respectively, in the first cycle. Two out of a maximum of four possible data transfers are used, for 50 percent utilization (top right). With added hardware layers at the inputs and outputs in the Flexbar (bottom left), the scheduler can use the additional routing options to add packets from sources 2 and 3 to destinations 3 and 2, respectively, in the first cycle. This results in 100 percent utilization (bottom right) of the underlying switching fabric's capacity.
Sources: Stanford University, NEC USA
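The utilization figures quoted in the caption follow from simple arithmetic. The short Python sketch below reproduces them; the function name `utilization` and the representation of transfers as (source, destination) pairs are illustrative choices, not taken from the paper.

```python
def utilization(transfers, n_ports):
    """Percentage of an n_ports x n_ports crossbar's capacity used in one cycle.

    `transfers` is a list of (source, destination) pairs scheduled in the
    cycle; at most n_ports transfers can take place simultaneously.
    """
    if not 0 <= len(transfers) <= n_ports:
        raise ValueError("a cycle carries between 0 and n_ports transfers")
    return 100.0 * len(transfers) / n_ports


# Transfers from the caption (sources and destinations numbered from 1).
conventional = [(1, 4), (3, 1)]
flexbar = conventional + [(2, 3), (3, 2)]

print(utilization(conventional, 4))  # 50.0
print(utilization(flexbar, 4))       # 100.0
```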
In the basic architecture of an N x N crossbar enhanced using this proposed technique, each traffic source (or destination) is connected to more than one crossbar input (or output) port through lightweight, configurable hardware layers. Traditional crossbar connectivity, by contrast, is a one-to-one mapping.
The parameter "k," the number of crossbar ports connected to each traffic source or destination, determines the flexibility of Flexbar (k = 1 reduces to a conventional crossbar). In general, each traffic source (destination) can be connected to any arbitrary subset of crossbar input (output) ports. To minimize hardware overheads, however, it is preferable to connect each traffic source or destination to k adjacent crossbar ports. Adjacent connectivity preserves the symmetry and regularity of a conventional crossbar switch, enabling efficient custom layout, easy scalability to large port counts and easy implementation of traffic-scheduling policies. We define the utilization of the crossbar as the percentage of its maximum switching capacity, that is, of the maximum possible number of simultaneous data transfers, that is actually used.
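The adjacent-connectivity rule can be sketched in a few lines of Python. The excerpt does not give the exact mapping, so the sketch assumes that endpoint i attaches to crossbar ports i, i + 1, ..., i + k - 1, wrapping around modulo N. The wraparound, the 0-based numbering and the name `adjacent_ports` are all assumptions made for illustration.

```python
def adjacent_ports(endpoint, k, n):
    """Crossbar ports reachable from one traffic source or destination.

    Assumed rule (not stated in the excerpt): endpoint i connects to the
    k adjacent ports i, i+1, ..., i+k-1, wrapping modulo n. Ports and
    endpoints are numbered from 0.
    """
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and n")
    if not 0 <= endpoint < n:
        raise ValueError("endpoint must be between 0 and n-1")
    return [(endpoint + offset) % n for offset in range(k)]


# k = 1 gives the one-to-one mapping of a conventional crossbar.
print(adjacent_ports(2, 1, 4))  # [2]
# k = 2 on a 4 x 4 fabric: endpoint 3 wraps around to port 0.
print(adjacent_ports(3, 2, 4))  # [3, 0]
```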
We have built additional hardware layers at the inputs and outputs of the crossbar. The scheduler can use the additional routing options now available to route additional packets from sources to destinations. It can be shown that, to switch all the traffic cells in the input queues, even an optimal scheduling algorithm requires eight cycles using a conventional crossbar. The same input traffic, however, can be processed with the Flexbar switch in only five cycles.
For evaluation purposes, we demonstrated Flexbar in the context of a state-of-the-art network switch. Because of the modular nature of the enhancements, the implementation required minimal changes to a traditional crossbar layout. We laid out a 32 x 32 CMOS crossbar (0.35-micron technology, 8 bits wide), and then enhanced it by adding the input and output layers.
For each request that comes in from the traffic sources, the scheduler creates k requests corresponding to the different possible paths through the crossbar. Next, duplicate requests are recombined before being sent to the crossbar scheduler. The crossbar scheduler then returns the result of scheduling, which dictates which paths through the crossbar will be active in the next cycle.
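The expand-and-recombine step described above might look roughly like the following Python sketch. The excerpt does not say how the k paths are chosen, so the sketch assumes that path j pairs the j-th adjacent input port of the source with the j-th adjacent output port of the destination, wrapping modulo N. That pairing, the 0-based numbering and the name `expand_requests` are hypothetical; the crossbar scheduler itself is not modeled.

```python
from collections import defaultdict


def expand_requests(requests, k, n):
    """Expand source-level requests into crossbar-level requests.

    Each (src, dst) request yields k candidate crossbar paths. Assumed
    pairing (not from the paper): path j uses input port (src + j) % n
    and output port (dst + j) % n. Identical crossbar requests produced
    by different source requests are recombined into a single entry that
    remembers every source request it could serve.
    """
    candidates = defaultdict(list)
    for src, dst in requests:
        for j in range(k):
            port_pair = ((src + j) % n, (dst + j) % n)
            if (src, dst) not in candidates[port_pair]:
                candidates[port_pair].append((src, dst))
    return dict(candidates)


# Two requests on a 4 x 4 fabric with k = 2 collide on crossbar path (1, 2),
# so four expanded requests are recombined into three.
expanded = expand_requests([(0, 1), (1, 2)], k=2, n=4)
for port_pair, served in sorted(expanded.items()):
    print(port_pair, served)
```

The recombined dictionary is what would be handed to the crossbar scheduler; once the scheduler grants a port pair, the list of source requests tells the input layer which queued packet may use that path.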
The basic design advantage is the increased flexibility, which allows scheduling of data transfers that would otherwise be blocked. The resulting benefits are significantly lower packet latency, an increase in the peak load for individual ports, improved switch utilization for a wide range of traffic scenarios and flexible quality-of-service mechanisms for high-priority flows.
We are currently investigating other benefits of the architecture in the context of network switches and other applications.
See related chart