Click here for Part 1.
Why create Convergence Enhanced Ethernet?
The industry has pursued several fabric convergence options, including InfiniBand (IB) and iSCSI. Each of these options has done well in a segment of the IT market. However, none has been able to fully satisfy enterprise data center storage requirements.
Enterprise data center storage customers are conservative when it comes to deploying new storage standards. Most want the new standard's ecosystem to be fully available and, more importantly, vetted out, before they deploy it. In other words, for any given new standard initiative, customers want all the basic capabilities (e.g. reliability, availability, serviceability, performance) to be robust enough to meet their application environment's demands. Additionally, these customers have a large install base of Fibre Channel components (servers, storage and switches) and want to protect that investment.
We will cover how IB, iSCSI and Fibre Channel over Internet Protocol (FCIP) compare to Fibre Channel over Convergence Enhanced Ethernet (FCoCEE) later in this article. But first we will describe why Convergence Enhanced Ethernet (CEE) was needed, versus just using Fibre Channel (FC) over (today's) Ethernet.
To satisfy performance requirements, storage fabrics require a lossless traffic service. Today's Ethernet can provide lossless behavior through the use of the IEEE 802.3x Pause standard. This standard allows a congested Ethernet port to assert back-pressure through an XON/XOFF protocol. Using IEEE 802.3x, the receiver will send an XOFF when it can't take any more data. The sender will then pause until the receiver sends an XON to resume communications. IEEE 802.3x works fine for Ethernet networks that are dedicated to storage traffic, because congestion only affects the storage traffic.
If Fibre Channel were to run directly over an Ethernet link that is also carrying other traffic classes (e.g. LAN or HPC), the IEEE 802.3x pause mechanism causes head of line blocking for all traffic classes flowing over the congested fabric segment. For example, if a storage flow experiences periods of traffic bursts that cause congestion, the congestion spreads to other traffic classes, such as sockets communications between servers or between servers and clients. Obviously, performance collapse is not acceptable for a convergence fabric. Over-provisioning can always be used, but is more expensive and must be continually upgraded to stay far above the possible IO demands.
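The head-of-line blocking described above can be sketched in a few lines of Python. This is a toy model, not real Ethernet behavior, and every class and method name is invented for illustration. The key point it captures is that IEEE 802.3x pause state applies to the link as a whole, so a receive buffer filled by a storage burst blocks every traffic class on the link:

```python
from collections import deque

class PausedLink:
    """Toy model of an 802.3x link: one pause state for the whole link."""
    def __init__(self, buffer_limit):
        self.buffer = deque()
        self.buffer_limit = buffer_limit

    @property
    def paused(self):
        # XOFF is in effect whenever the receiver's buffer is at its limit.
        return len(self.buffer) >= self.buffer_limit

    def send(self, frame):
        if self.paused:
            return False          # sender must hold ALL traffic classes
        self.buffer.append(frame)
        return True

    def drain(self, n=1):
        # Receiver consumes frames; XON resumes implicitly once space frees up.
        for _ in range(n):
            if self.buffer:
                self.buffer.popleft()

link = PausedLink(buffer_limit=4)
for i in range(4):                # a storage burst fills the receiver's buffer
    link.send(("storage", i))
# Now even an unrelated LAN frame is blocked: head-of-line blocking.
lan_delivered = link.send(("lan", 0))
print("LAN frame delivered during storage congestion:", lan_delivered)  # False
link.drain(1)                     # receiver frees one slot (implicit XON)
print("LAN frame delivered after drain:", link.send(("lan", 0)))        # True
```

Because the pause signal carries no notion of traffic class, the only way to protect LAN traffic in this model is to dedicate the link to storage, which is exactly the limitation CEE removes.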
The industry is working on enhancements to Ethernet that provide per priority flow and congestion control. The standard version of these enhancements is known as Convergence Enhanced Ethernet (CEE). For customers seeking FC and Ethernet convergence, FCoCEE is the model we advocate.
What is Convergence Enhanced Ethernet?
IBM is working with Broadcom, Brocade, Cisco, Emulex, Intel and several other companies to create a set of IEEE 802 standard functions that improve Ethernet's ability to converge fabrics in the data center. Convergence Enhanced Ethernet is the term used to refer to the IEEE 802 standard version of these functions.
CEE consists of the following enhancements being pursued in IEEE 802 working groups:
- Multiple priority levels with per priority flow control - this effort is focused on creating a standard mechanism that can control the flow of a single traffic class without affecting the flow of other traffic classes on the same link. With per priority flow control, when a traffic class becomes congested, other traffic classes running over the same link are not affected. Additionally, the mechanism can be used to provide lossless transmission for specific traffic classes.
- Priority based packet scheduling mechanism - this effort is focused on creating a standard mechanism that can be used to set scheduling priorities for a set of traffic classes. The scheduling mechanism provides end-to-end QoS. The scheduling policy isn't degraded by competing traffic in the network (e.g. allowing small messages to be inserted between the packets associated with a large data transfer).
- CEE discovery and capability exchange protocol - this effort is focused on creating a standard mechanism for defining the domain formed by CEE compliant components in order to ensure interoperability.
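The first of these enhancements can be contrasted with plain 802.3x pause using a similar toy model (again, all names are illustrative, not a real API). Here each priority has its own buffer and its own pause state, so a congested storage class no longer blocks LAN traffic sharing the link:

```python
from collections import deque

class PerPriorityLink:
    """Toy model of per priority flow control: each traffic class has its
    own buffer and pause state, so congestion in one class does not
    back-pressure the others."""
    def __init__(self, buffer_limit, priorities=8):
        self.buffers = {p: deque() for p in range(priorities)}
        self.buffer_limit = buffer_limit

    def paused(self, priority):
        # Per-priority XOFF: only this traffic class is back-pressured.
        return len(self.buffers[priority]) >= self.buffer_limit

    def send(self, priority, frame):
        if self.paused(priority):
            return False          # only this traffic class must wait
        self.buffers[priority].append(frame)
        return True

link = PerPriorityLink(buffer_limit=4)
STORAGE, LAN = 3, 0               # illustrative priority assignments
for i in range(4):                # a storage burst congests priority 3...
    link.send(STORAGE, i)
print("Storage accepted:", link.send(STORAGE, 4))  # False: class 3 is paused
print("LAN accepted:", link.send(LAN, 0))          # True: class 0 is unaffected
```

The cost of this isolation is visible in the model: the switch must hold separate buffer resources per traffic class, a point the convergence options section returns to.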
As shown in Figure 6, the functions described above allow CEE to provide traffic differentiation, such that multiple traffic classes can flow over the same link, without impacting each other. CEE can be combined with iSCSI to provide traffic differentiation at the Ethernet link layer for iSCSI data flows. For example, network adapters can associate a specific traffic class with iSCSI traffic and a different traffic class for application sockets communications. CEE can then be used to provide service differentiation between these two traffic classes.
6. CEE Overview
There are two additional mechanisms that will further enhance convergence over Ethernet, but are not absolutely required:
- Link level congestion management - this effort is focused on creating a standard mechanism that can control unicast traffic in networks with long-lived data flows with respect to their bandwidth-delay product. Link level congestion management provides a mechanism for detecting congestion and backing off the traffic flowing on the congested traffic class.
- Link level shortest path first based routing protocol - this effort is focused on creating a standard mechanism that can provide shortest-path frame routing in multi-hop IEEE 802.1-compliant Ethernet fabrics with arbitrary topologies, using existing link-state routing protocol technology. Note, this standard is being pursued in the IETF's TRILL working group.
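Link-state routing protocols of the kind the TRILL work builds on compute routes with a shortest-path-first algorithm (Dijkstra's). The sketch below runs it over a hypothetical four-switch fabric; the switch names and link costs are invented for illustration:

```python
import heapq

def shortest_paths(graph, source):
    """Dijkstra's algorithm: the shortest-path-first computation a
    link-state protocol runs over its topology database."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue                      # stale heap entry, skip it
        for neighbor, cost in graph[node].items():
            nd = d + cost
            if nd < dist.get(neighbor, float("inf")):
                dist[neighbor] = nd
                heapq.heappush(heap, (nd, neighbor))
    return dist

# Hypothetical four-switch fabric with per-link costs (adjacency map).
fabric = {
    "sw1": {"sw2": 1, "sw3": 4},
    "sw2": {"sw1": 1, "sw3": 1, "sw4": 5},
    "sw3": {"sw1": 4, "sw2": 1, "sw4": 1},
    "sw4": {"sw2": 5, "sw3": 1},
}
print(shortest_paths(fabric, "sw1"))  # {'sw1': 0, 'sw2': 1, 'sw3': 2, 'sw4': 3}
```

Because every switch computes paths from the same flooded topology database, frames follow shortest paths in arbitrary topologies rather than the single spanning tree that plain Ethernet bridging is restricted to.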
Figure 6 also shows that Fibre Channel traffic can take advantage of CEE. The next section will describe how CEE can be used in fabric convergence scenarios.
CEE Convergence Options
CEE enhances the NAS and iSCSI option covered earlier, by providing traffic differentiation at the link layer. Today, data center switches can use network layer traffic differentiation, but that level requires switches with layer 2+ capabilities, which have higher latencies (and cost) than layer 2 switches. Similar to layer 2+ switches that support traffic differentiation, CEE switches must support per traffic class resources, which means more buffer resources than switches that don't support traffic differentiation. However, unlike today's layer 2+ switches that support traffic differentiation, CEE switches do not have to retain IP routing tables or associated contexts.
As shown in Figure 7, CEE also enables an emerging convergence option, FC directly over CEE. On the server side, this option uses either FC or FCoCEE based adapters to connect servers to storage. If FCoCEE is used, then the same adapter can also be used to connect the server to other servers or networking equipment. That is, the same adapter can be used to carry other traffic classes, such as HPC, LAN or IPC messages. On the storage side, this option uses either FC or FCoCEE based adapters to attach storage to the fabric.
7. FCoCEE based Convergence
For data centers that have a large FC install base, FCoCEE enables new servers or storage deployed in that data center to use a single link for both Ethernet and FC communications. Many enterprise data center customers will likely want the infrastructure required to enable this option to be vetted out before production deployment begins. That infrastructure includes: interoperable support from FCoCEE adapter vendors, interoperable support from FCoCEE switch and gateway vendors, and most importantly the management software required to interlink the virtual and physical FC fabric to the physical CEE fabric. Given that a portion of the HPC and Analytics markets is interested in network convergence today, we expect FCoCEE to fare well in these two market segments (especially when IB's high bandwidth and low latency are not needed).
8. Interlinks between management layers
When comparing these two future options with today's options, one important consideration is management complexity. NAS and iSCSI over Ethernet or CEE use a single network management infrastructure for storage, LAN and IPC. However, FCoCEE requires two network management infrastructures: an FC based infrastructure that manages virtual FC (i.e. FCoCEE attached) and physical FC (i.e. natively attached) components; and an Ethernet infrastructure to manage the underlying physical Ethernet or CEE fabric. Interlinking the FC and Ethernet network management infrastructures will be necessary for production deployment of FCoCEE.
As can be seen from the table below, no single solution fully satisfies all the requirements. Our view is that IB fits well in cluster convergence scenarios where performance is critical. NAS and iSCSI fit well in mid-sized environments and for middle tier storage in multi-tier server environments. NAS and iSCSI are also well suited in cases where no FC exists, because they require one less fabric management type (i.e. all the fabric management is Ethernet based). Given that large enterprises place a high bar on storage (availability, security, quality, etc.), we expect FCoCEE will initially play in the HPC and Analytics market as an alternative to IB for environments where performance is less critical. To enable enterprise deployment of FCoCEE, we believe the functions listed in the table need to be standardized, developed, tested and hardened in real world environments. The table shows the current state of these functions for each option. As FCoCEE matures (i.e. these functions are made available), we expect it will play well in large enterprises wanting to pursue FC convergence. For a brief overview of the functions, see the Appendix.
Table: Current and Future Protocol Comparison
The following defines the terms used in the table above:
Functions needed for enterprise deployment:
- Per priority flow control and Packet scheduling - As described earlier, these mechanisms are needed to minimize interference between traffic flows.
- Storage frame format and semantics - A format is needed for block storage traffic (i.e. SCSI).
- End-end loop prevention - A mechanism is needed to prevent packet forwarding configurations that result in livelock (See Page 14 for more information).
- Distributed storage identification protocol - A mechanism is needed for assigning storage fabric identifiers to endpoint fabric ports (See Page 17 for more information.)
- Multi-pathing of end-to-end flows - To support HA (e.g. switch over between endpoint fabric ports) and QoS requirements, a mechanism is needed to provide multiple paths between two endpoints.
- Fabric identifier translation service and Access control list configuration service - These mechanisms are needed to enable direct communication between an initiator and a target that are directly attached to the convergence fabric (See pages 12, 21, and 22 for more information.).
- Gateway High Availability protocol - To support some of the HA configurations used in enterprise IP/Ethernet networks, a mechanism is needed to provide switchover across virtual switch ports (See pages 15, 23, and 24 for more information.)
- Interconnection of management views - Perhaps the longest item to complete, a mechanism is needed to interconnect the various management views (See page 25 for more information.)