Resource-constrained data centers cannot afford to waste anything on their way to efficiency. Answers vary depending on data-center size, and include PODs. What will need to be in place to make this a practical solution?
The data center needs to evolve to a much more efficient state as it expands to serve the cloud. Large data centers cannot tolerate wasted cost, power, or area as they compete to support cloud-based services. There are multiple industry initiatives currently underway to support this new, more efficient data center, including:
- Server virtualization to optimize server utilization
- Data Center Bridging (DCB) to support converged data center fabrics
- Fibre Channel over Ethernet (FCoE) to eliminate the need for additional Fibre Channel fabrics
- Transparent Interconnection of Lots of Links (TRILL) to optimize data center fabric bandwidth utilization
- Virtual Ethernet Port Aggregator (VEPA) to support server virtualization through a single physical link
Another trend is to compartmentalize the data center architecture into atomic units called PODs, which resemble shipping containers. Companies such as HP and Sun are developing PODs as pre-wired, pre-configured data center building blocks. These are trucked into the data center and are ready to run once connected to power, cooling, and the network. An example POD block diagram is shown in Figure 1.
Figure 1. Data Center POD Block Diagram
For many smaller applications, each server can be configured as multiple virtual machines (VMs), which are connected to the fabric through a single physical port. In this case, protocols like VEPA will be used to create multiple logical connections to the fabric, as shown in Figure 1. In some cases, cloud users will need to run large applications that require multiple servers in a cluster. Here, the fabric must support DCB features along with low latency for high performance.
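As a rough illustration of what VEPA changes, the following toy model contrasts a conventional virtual switch inside the hypervisor with a VEPA-mode uplink. It is a simplified sketch, not an implementation of the IEEE 802.1Qbg standard; the function names and behavior are assumptions for illustration only.

```python
# Toy model contrasting a conventional virtual Ethernet bridge (VEB)
# inside the hypervisor with a VEPA-mode uplink. Illustrative sketch
# only, not an implementation of IEEE 802.1Qbg.

def veb_forward(src_vm, dst_vm, local_vms):
    """A VEB switches traffic between co-resident VMs inside the server,
    invisible to the external fabric."""
    if dst_vm in local_vms:
        return f"{src_vm} -> {dst_vm}: switched inside the hypervisor"
    return f"{src_vm} -> {dst_vm}: sent out the physical port"

def vepa_forward(src_vm, dst_vm, local_vms):
    """VEPA sends every frame to the adjacent fabric switch; frames for
    a co-resident VM are hairpinned back on the same physical port."""
    if dst_vm in local_vms:
        return (f"{src_vm} -> {dst_vm}: out the physical port, "
                f"hairpinned back by the fabric switch")
    return f"{src_vm} -> {dst_vm}: out the physical port to the fabric"

local = {"vm1", "vm2", "vm3"}
print(veb_forward("vm1", "vm2", local))   # stays inside the server
print(vepa_forward("vm1", "vm2", local))  # visible to the fabric
```

The trade-off is that all VM-to-VM traffic becomes visible to the fabric's access control and monitoring functions, at the cost of consuming uplink bandwidth for traffic between co-resident VMs.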
Clustered servers will also need to access storage through the fabric using protocols such as FCoE, which must support lossless operation and bounded latency using DCB features. The POD must connect to the outside world through multiple high-bandwidth connections. Assuming a homogeneous data center, each POD will contain its own network security functions. In addition, applications with high user volume may require a server load-balancing function.
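Lossless operation under DCB rests on priority-based flow control (PFC), which only works if the receiver reserves enough buffer headroom to absorb the data still in flight after it signals a pause. The sketch below shows the standard sizing reasoning; the cable length, propagation delay, and response-time figures are illustrative assumptions, not values from this article.

```python
# Rough PFC buffer-headroom estimate for a lossless 10GbE link.
# All delay figures below are illustrative assumptions.

LINK_RATE_BPS = 10e9            # 10GbE line rate
CABLE_LENGTH_M = 100.0          # assumed cable run inside the POD
PROPAGATION_S_PER_M = 5e-9      # ~5 ns/m signal propagation
RESPONSE_TIME_S = 2e-6          # assumed sender reaction time to a pause
MTU_BYTES = 2500                # baby-jumbo FCoE frame, worst case in flight

# Data keeps arriving for one round trip plus the sender's response time,
# plus up to one maximum-size frame already being serialized on each side.
round_trip_s = 2 * CABLE_LENGTH_M * PROPAGATION_S_PER_M
in_flight_bytes = (round_trip_s + RESPONSE_TIME_S) * LINK_RATE_BPS / 8
headroom_bytes = in_flight_bytes + 2 * MTU_BYTES

print(f"Headroom per lossless priority: ~{headroom_bytes / 1024:.1f} KiB")
```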
Today, PODs are being developed that require several hundred fabric connections. Soon, these will scale to more than 1,000 fabric connections. Data center fabrics must meet these port counts while providing all of the features described above. In addition, they must do so in a very efficient manner, as every dollar, watt, and square meter is critical when designing a POD.
In the late 1990s and early 2000s, proprietary switch fabrics were developed by multiple companies to serve the telecom market with features for lossless operation, guaranteed bandwidth, and fine-grained traffic management. During this same time, Ethernet fabrics were relegated to the LAN and enterprise, where latency was not important and quality of service (QoS) meant adding more bandwidth or dropping packets during congestion. In addition, many research institutions developing high-performance computing (HPC) systems used InfiniBand (IB), which was the only choice for a low-latency fabric interconnect solution.
Time has dramatically changed this landscape. Over the past three years, 10Gb Ethernet switches have emerged with congestion management and QoS features that rival proprietary telecom fabrics. As evidence of this role reversal, of the 30 or so proprietary telecom fabrics available in the year 2000, only one has survived. Even so, some companies are pushing telecom-style fabrics into the data center.
With the emergence of more feature-rich 10GbE switches, IB no longer has a monopoly on low-latency fabrics. Many HPC designs are moving to this new cost-effective Ethernet solution, pushing IB further into niche applications. Because of this, the two surviving IB switch vendors are even adding Ethernet ports to their multi-chip solutions.
The industry needs a cost-effective fabric solution for the data center that can scale to POD-size requirements. The obvious choice is an Ethernet fabric with converged features for clustering and storage. Adding Ethernet ports to a telecom-style fabric dramatically increases cost, size, and power compared to an Ethernet-switch-based solution designed for the data center.
Telecom-Style Fabrics in the Data Center
A telecom-style switch fabric typically contains a fabric interface chip (FIC) on each line card that connects to one or more central switch devices. To provide the fine bandwidth granularity required by legacy protocols such as ATM or SONET, the FIC segments incoming packets into fixed-size cells for backplane transport, and then reassembles them on egress. Due to the input/output-queued nature of the system, virtual output queues (VoQs) must be maintained on ingress to avoid head-of-line (HOL) blocking. The FIC also contains traffic management functions, which hold packets in external memory until they can be segmented and scheduled through the switch.
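To see why cell transport costs bandwidth, the sketch below segments a packet into fixed-size cells and reports the resulting backplane expansion. The cell payload and header sizes are assumed for illustration; real fabrics use vendor-specific cell formats.

```python
# Sketch: segmenting a packet into fixed-size cells, as a telecom-style
# FIC does for backplane transport. Cell geometry is assumed.

CELL_PAYLOAD = 64   # assumed payload bytes per cell
CELL_HEADER = 8     # assumed per-cell header (routing tag, sequence, CRC)

def segment(packet_len):
    """Return (number of cells, backplane bytes per packet byte)."""
    cells = -(-packet_len // CELL_PAYLOAD)             # ceiling division
    wire_bytes = cells * (CELL_PAYLOAD + CELL_HEADER)  # last cell is padded
    return cells, wire_bytes / packet_len

for length in (64, 65, 1500):
    cells, expansion = segment(length)
    print(f"{length:5d}-byte packet -> {cells:3d} cells, "
          f"{expansion:.2f}x backplane bandwidth")
```

The worst case is a packet one byte longer than a cell payload, which more than doubles the backplane bandwidth consumed; this is one source of the overspeed requirement discussed next.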
Today's process technologies allow FIC designs that contain up to eight 10GbE ports on the line side with up to twelve proprietary 10G ports to the backplane. Backplane overspeed is required due to factors such as cell segmentation overhead and fail-over bandwidth margin. The switch chip can contain up to 64 proprietary 10G links to the FICs. Cells can be striped across up to twelve switch chips, providing a maximum of 64 FICs, or up to 512 10GbE ports.
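These limits follow directly from the link counts quoted above; a quick sanity check:

```python
# Sanity-check arithmetic for the telecom-style fabric described above.

LINE_PORTS_PER_FIC = 8        # 10GbE line-side ports per FIC
BACKPLANE_LINKS_PER_FIC = 12  # proprietary 10G backplane links per FIC
LINKS_PER_SWITCH = 64         # proprietary 10G links per switch chip

overspeed = BACKPLANE_LINKS_PER_FIC / LINE_PORTS_PER_FIC
max_fics = LINKS_PER_SWITCH   # each FIC needs one link on every switch chip
max_ports = max_fics * LINE_PORTS_PER_FIC

print(f"Backplane overspeed: {overspeed:.2f}x")          # 1.50x
print(f"Max FICs in a single-stage fabric: {max_fics}")  # 64
print(f"Max 10GbE ports: {max_ports}")                   # 512
```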
Figure 2. Top-of-Rack Switch Using Telecom-based Fabric
Figure 2 shows how this fabric can be used for a top-of-rack switch in the data center. In this case, a mesh fabric cannot be used, as it would require at least 24 10G backplane links on each FIC. This is not a cost-effective solution for this application, as these devices could be replaced with a single 10GbE switch chip.
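The mesh-link figure can be reproduced with simple arithmetic if one assumes a four-FIC top-of-rack configuration (32 line-side 10GbE ports); the four-FIC count is an assumption for this sketch, not a figure from the article. In a non-blocking mesh, each FIC must be able to carry its full line-side bandwidth to every peer:

```python
# Why a full mesh is impractical here: each FIC must carry its whole
# line-side bandwidth to every peer to remain non-blocking.
# The four-FIC top-of-rack configuration is an assumed example.

LINE_PORTS_PER_FIC = 8   # 10GbE line-side ports per FIC
FICS_IN_MESH = 4         # assumed: 4 FICs = 32-port top-of-rack switch

# Worst case: all traffic from one FIC targets a single peer FIC.
links_per_fic = (FICS_IN_MESH - 1) * LINE_PORTS_PER_FIC
print(f"Backplane 10G links required per FIC: {links_per_fic}")  # 24
```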
Figure 3. 1024-port Data Center POD Switch using Telecom-style Fabric
Figure 3 shows how this fabric must be configured to support up to 1,024 10GbE ports, as required for the next-generation data center POD. To do this, a small fat tree must be created on each switch card to scale past the 512-port limit described above. This solution has a significantly higher component count and higher latency.
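A rough chip count illustrates the inefficiency. The sketch below assumes a two-stage fat tree in which each switch chip devotes half of its 64 links toward the FICs and half toward a spine stage; this split is an assumed reading of Figure 3, not a vendor specification.

```python
# Rough chip count for a 1,024-port POD switch built from the
# telecom-style fabric. The two-stage fat-tree split (half of each
# switch chip's links down, half up) is an assumption for this sketch.

TARGET_PORTS = 1024
LINE_PORTS_PER_FIC = 8
BACKPLANE_LINKS_PER_FIC = 12
LINKS_PER_SWITCH = 64

fics = TARGET_PORTS // LINE_PORTS_PER_FIC        # 128 FICs
fic_links = fics * BACKPLANE_LINKS_PER_FIC       # 1536 10G backplane links

down_links = LINKS_PER_SWITCH // 2               # 32 links toward the FICs
leaf_switches = -(-fic_links // down_links)      # 48 leaf chips
spine_switches = -(-(leaf_switches * down_links) // LINKS_PER_SWITCH)  # 24

print(f"FICs: {fics}, leaf switches: {leaf_switches}, "
      f"spine switches: {spine_switches}, total chips: "
      f"{fics + leaf_switches + spine_switches}")
```

Under these assumptions, the fabric requires roughly 200 chips, and each packet traverses five of them (FIC, leaf, spine, leaf, FIC), which accounts for the added latency.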