As Ethernet moves from being an enterprise-centric technology into a WAN and access technology, designers need to bring carrier-class capabilities to Ethernet designs. One of the most important characteristics of a carrier-class technology is its operations, administration, and maintenance (OAM) capabilities.
While management capabilities are available for enterprise-class Ethernet networks, these same capabilities have not been available for WAN and access Ethernet networks. Recognizing this need, the IEEE 802.3 CSMA/CD Working Group (the Ethernet standards body), through its Ethernet in the First Mile (802.3ah) task force, defined a set of OAM capabilities for Ethernet links [1]. These capabilities are introduced gracefully to ensure backward compatibility with existing Ethernet implementations, while still providing the advanced monitoring functionality required in public networks.
The OAM work of the IEEE 802.3ah task force addresses three key operational issues when deploying Ethernet across geographically disparate locations: link monitoring, fault signaling, and remote loopback. Link monitoring introduces some basic error definitions for Ethernet so entities can detect failed and degraded connections. Fault signaling provides mechanisms for one entity to signal another that it has detected an error. Remote loopback, which is often used to troubleshoot networks, allows one station to put the other station into a state whereby all inbound traffic is immediately reflected back onto the link.
In this article, we'll take a close look at the OAM capabilities defined under the 802.3ah EFM standard. During the discussion, we'll describe the new IEEE 802.3ah OAM functions and provide some additional details about features and implementation.
A Glaring Deficiency
Over the past few years, Ethernet has started to migrate from an enterprise-class technology to a significant technology for future carrier deployments. As this migration has progressed and more and more Ethernet has rolled out in the public network, a glaring deficiency has been identified. When compared with more traditional carrier network technologies, Ethernet lacks the integrated management capabilities inherent in these other technologies.
Traditionally, Ethernet networks have been managed with IP/SNMP. In an enterprise, this is a perfectly acceptable solution. However, as Ethernet migrates to public networks, IP/SNMP presents two drawbacks. First, if the Ethernet layer is not operating properly, IP/SNMP may not even be available. This implies that certain management capabilities are required within the Ethernet layer itself. Second, requiring an IP network to run just to manage the Ethernet network could add unacceptable overhead in terms of equipment complexity and carrier operations.
Traditional carrier technologies, by contrast, build management into the network itself. For example, SONET uses overhead bytes for detecting wiring problems, for monitoring the performance of individual segments and paths, for signaling remote faults, and for out-of-band management communications [3]. Likewise, ATM networks include integrated management capabilities such as virtual channel (VC) and segment monitoring to ease the operational burden of large, diverse deployments. Even the Internet Protocol suite (TCP/IP) has its own management protocol in ICMP, which underpins utilities such as ping and traceroute for network diagnostics and troubleshooting. In all of these technologies, the network includes its own operational signaling and troubleshooting functions to simplify managing large, complex deployments.
The recent success of Ethernet as a carrier technology has exposed its OAM shortcomings as a major impediment to more and larger deployments. As a result, OAM for Ethernet networks has become a major initiative for both carriers and vendors.
Within the international realm of standards consortia, several bodies have begun major Ethernet OAM initiatives. In general, these initiatives can be classified into three levels: the data link layer, the transport layer, and the services layer. The data link layer is the collection of individual Ethernet segments. The transport layer is the collection of forwarding entities and interconnected segments that form a multi-hop Ethernet network and provide basic connectivity between devices. The services layer then uses the transport layer to multiplex a variety of services (for example, VLANs [2]) on the underlying network.
Different standards organizations are developing OAM mechanisms for the various network layers. Specifically, the OAM functionality is partitioned among three organizations: the Metro Ethernet Forum (MEF), the International Telecommunication Union (ITU), and the IEEE 802 committee. As shown in Figure 1, the MEF and ITU are working together to define OAM capabilities for the services layer. The IEEE 802.1 committee is focused on the transport layer, while the IEEE 802.3ah task force is targeting the data link layer.
Figure 1: Various forums defining specifications for the data link, transport, and service layer in carrier-class Ethernet networks.
The collective work of these standards bodies will result in a complete Ethernet OAM solution. In this article, however, we'll concentrate on providing details of the completed IEEE 802.3ah OAM specification.
Architecture for 802.3 OAM
Architecturally, the IEEE 802.3ah EFM specification defines OAM as an optional sublayer just above the Ethernet media access controller (MAC). The OAM sublayer consists of a parser block, a multiplexer block, and a control block. All three blocks communicate with an OAM client, which is the "brains" of the IEEE 802.3ah management architecture. The relationship between the blocks, the OAM client, and the MAC is shown in Figure 2.
Figure 2: Diagram showing the OAM architecture defined under the 802.3ah specification.
When OAM is present, two connected OAM sublayers exchange OAM protocol data units (OAMPDUs). The OAMPDUs are distinguished from other frames through a combination of the destination MAC address and the Ethernet type/length field. The parser detects incoming OAMPDUs and passes them to the OAM control block and eventually to the OAM client.
Ethernet OAM shares the same bandwidth as general application traffic. To limit its impact on the bandwidth available to applications, OAM traffic is limited to at most 10 frames/sec. The multiplexer block mentioned above ensures that OAMPDUs receive high priority but consume only limited bandwidth.
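The 10 frames/sec cap can be pictured with a small sketch. The class name `OampduRateLimiter` and its fixed one-second window are assumptions made purely for illustration; the standard does not prescribe this particular accounting scheme.

```python
class OampduRateLimiter:
    """Allow at most `max_per_sec` OAMPDUs per one-second window (illustrative only)."""

    def __init__(self, max_per_sec=10):
        self.max_per_sec = max_per_sec
        self.window_start = None  # start time of the current one-second window
        self.sent_in_window = 0

    def try_send(self, now):
        """Return True if an OAMPDU may be transmitted at time `now` (in seconds)."""
        if self.window_start is None or now - self.window_start >= 1.0:
            # A new one-second window begins; reset the counter.
            self.window_start = now
            self.sent_in_window = 0
        if self.sent_in_window < self.max_per_sec:
            self.sent_in_window += 1
            return True
        return False


limiter = OampduRateLimiter()
# Offer 15 OAMPDUs within the same second; only the first 10 are allowed.
results = [limiter.try_send(now=0.05 * i) for i in range(15)]
allowed = sum(results)
```

A real multiplexer would also need to let critical-event OAMPDUs bypass this limit, which the sketch leaves out.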
Through the 802.3ah architecture, OAM is independent of the physical layer and works over any kind of Ethernet link. The separation of the OAM client from the simple parser functionality allows both hardware and software implementations. It is hoped that vendors will quickly implement software versions of Ethernet OAM and release them onto already deployed platforms, and that hardware versions may be integrated into future Ethernet MACs for universal availability.
To maintain backward compatibility with existing Ethernet equipment, the implementation of OAM is always optional. The lack of OAM or the failure of OAM does not prevent an Ethernet port from becoming operational.
As a general rule, OAM uses periodic advertisements to maintain an OAM session and convey information on an Ethernet link. Sessions are maintained by sending a minimal number of OAMPDUs per second, and such advertisements may contain management or fault signaling information. This principle of unacknowledged advertisements is used in most OAM operations, with the exception of a request/response principle for obtaining information from the far end of an Ethernet link.
Understanding the OAMPDU
OAMPDUs are an integral component of the IEEE 802.3ah architecture. These protocol data units are normal Ethernet frames that use a specific multicast destination address and EtherType. Figure 3 shows the format of an OAMPDU.
Figure 3: Diagram showing an OAMPDU format.
The multicast address and EtherType identify the frame as a slow protocol frame. The standard defines several slow protocols; one example is link aggregation control protocol (LACP) [6]. The different slow protocols are identified through the slow protocol subtype, where subtype 3 is designated for OAM. Utilizing the slow protocol MAC address, OAMPDUs are guaranteed to be intercepted by the MAC sublayer and will not propagate across multiple hops in an Ethernet network, regardless of whether OAM is implemented or enabled.
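As a rough illustration of how a parser might recognize an OAMPDU, the sketch below builds and checks the frame header. The slow protocols multicast address (01-80-C2-00-00-02) and EtherType (0x8809) are the values defined by IEEE 802.3. The 2-octet flags field followed by a 1-octet code field reflects the OAMPDU layout as commonly documented and should be checked against the standard. The function names are hypothetical, and padding and the frame check sequence are omitted.

```python
import struct

SLOW_PROTOCOLS_DST = bytes.fromhex("0180c2000002")  # slow protocols multicast address
SLOW_PROTOCOLS_ETHERTYPE = 0x8809
OAM_SUBTYPE = 0x03  # slow protocol subtype designated for OAM


def build_oampdu(src_mac, flags, code, payload=b""):
    """Assemble an OAMPDU header plus payload (no padding, no FCS)."""
    fields = struct.pack("!HBHB", SLOW_PROTOCOLS_ETHERTYPE, OAM_SUBTYPE, flags, code)
    return SLOW_PROTOCOLS_DST + src_mac + fields + payload


def parse_oampdu(frame):
    """Return the OAMPDU fields as a dict, or None if the frame is not an OAMPDU."""
    if len(frame) < 18:
        return None
    ethertype, subtype, flags, code = struct.unpack("!HBHB", frame[12:18])
    if (frame[0:6] != SLOW_PROTOCOLS_DST
            or ethertype != SLOW_PROTOCOLS_ETHERTYPE
            or subtype != OAM_SUBTYPE):
        return None  # some other frame, e.g. LACP uses a different subtype
    return {"src": frame[6:12], "flags": flags, "code": code, "data": frame[18:]}


src = bytes.fromhex("020000000001")  # made-up locally administered address
frame = build_oampdu(src, flags=0x0000, code=0x00, payload=b"\x00\x00")
parsed = parse_oampdu(frame)
```

Because the subtype is checked in addition to the address and EtherType, other slow protocol frames fall through untouched.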
Most information carried by an OAMPDU is encoded using the type-length-value (TLV) format. The first octet (or byte) indicates the type. This type, analogous to a variable type in a programming language, tells the OAM client how to decode the bytes containing the information. The next octet carries the length of the information. This length is typically used to skip the information when the type cannot be interpreted by the OAM client. Following type and length, one or more octets encode the information itself.
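The skip-by-length behavior can be sketched as a simple walk over a TLV buffer. Two assumptions are made here and should be verified against the standard: that the length octet counts the whole TLV including the type and length octets, and that a type of 0x00 marks the end of the TLV list. The type values in the example are made up.

```python
def walk_tlvs(data, known_types):
    """Return (type, value) pairs for known TLVs, skipping unknown ones by length."""
    decoded = []
    offset = 0
    while offset + 2 <= len(data):
        tlv_type = data[offset]
        tlv_len = data[offset + 1]
        if tlv_type == 0x00:  # assumed end-of-TLV marker
            break
        # Assumed: length covers the type and length octets too, so it is >= 2.
        if tlv_len < 2 or offset + tlv_len > len(data):
            raise ValueError("malformed TLV at offset %d" % offset)
        if tlv_type in known_types:
            decoded.append((tlv_type, data[offset + 2:offset + tlv_len]))
        # Known or not, the length tells us where the next TLV starts.
        offset += tlv_len
    return decoded


# Three TLVs followed by an end marker; type 0x7F is unknown to this client.
buffer = bytes([0x01, 0x04, 0xAA, 0xBB,
                0x7F, 0x03, 0xCC,
                0x02, 0x03, 0xDD,
                0x00, 0x00])
tlvs = walk_tlvs(buffer, known_types={0x01, 0x02})
```

The unknown TLV is stepped over without being interpreted, which is what lets older implementations coexist with newer ones.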
Now that we've discussed the key blocks defined under the IEEE 802.3ah OAM spec, let's take a deeper look at the key operations defined under the protocol. We'll start with discovery.
Discovery
Discovery is the first phase of the IEEE 802.3ah OAM protocol, and relies on what are termed information OAMPDUs. During discovery, information about the OAM entities' capabilities, configuration, and identity is exchanged.
One important aspect of the IEEE 802.3ah OAM spec is that an OAM entity may be in what is termed active or passive mode. The difference between the modes is that an active-mode device can exert more control over its peer than a passive-mode device. For example, an active-mode OAM entity can put a passive-mode OAM entity into loopback mode, but not vice versa. These modes are intended to support public network deployments, where carrier-side equipment has a superior relationship to the customer premises equipment.
During discovery, configuration and capability parameters are also exchanged. Nodes can use the configuration and capabilities of the peer entity to determine whether an OAM relationship should be instantiated, and with what functionality. For example, a node may require that its partner support loopback capability in order to be accepted into the management network. The rules used to determine whether a peer's configuration and capabilities are acceptable are left to the implementation and are not specified in the standard, which provides the flexibility for policy-based management associations in a carrier network.
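A local acceptance policy of this kind might look like the following sketch. The `PeerInfo` fields and both function names are hypothetical, and the rule that only an active-mode entity may initiate loopback is this sketch's reading of the active/passive distinction described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PeerInfo:
    """Mode and capabilities learned from a peer during discovery (hypothetical fields)."""
    active_mode: bool
    supports_loopback: bool


def accept_peer(peer, require_loopback=True):
    """Example local policy: refuse peers that lack a capability we insist on."""
    return peer.supports_loopback or not require_loopback


def can_request_loopback(local, peer):
    """Assumed rule: only an active-mode entity may put its peer into loopback."""
    return local.active_mode and peer.supports_loopback


carrier_side = PeerInfo(active_mode=True, supports_loopback=True)
customer_side = PeerInfo(active_mode=False, supports_loopback=True)

carrier_can_loop = can_request_loopback(carrier_side, customer_side)
customer_can_loop = can_request_loopback(customer_side, carrier_side)
```

Keeping the policy in a separate function mirrors the standard's choice to leave acceptance rules to the implementation.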
Note: The OAM protocol requires that frames be exchanged with a minimum frequency to maintain the relationship. If no OAMPDUs are received in a 5-second window, the OAM peering relationship is lost and must be re-established to perform OAM functions.
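The 5-second rule amounts to a liveness timer that is refreshed by every received OAMPDU. A minimal sketch, with a hypothetical `OamSession` class and caller-supplied timestamps so the logic can be exercised without real clocks:

```python
class OamSession:
    """Tracks whether the OAM peering relationship is still alive (illustrative)."""

    LOST_LINK_TIMEOUT = 5.0  # seconds without any OAMPDU before the peering is lost

    def __init__(self, now):
        self.last_rx = now

    def on_oampdu_received(self, now):
        """Any received OAMPDU refreshes the liveness timer."""
        self.last_rx = now

    def is_up(self, now):
        """True while an OAMPDU has been seen within the timeout window."""
        return now - self.last_rx < self.LOST_LINK_TIMEOUT


session = OamSession(now=0.0)
session.on_oampdu_received(now=1.0)
up_at_4s = session.is_up(now=4.0)    # 3 s since last OAMPDU
up_at_7s = session.is_up(now=7.0)    # 6 s since last OAMPDU
```

Once `is_up` returns False, a real implementation would restart discovery before attempting any further OAM functions.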
Remote Failure Indication
Flags in the OAMPDU allow an OAM entity to convey severe error conditions to its peer. The severe error conditions are defined as:
- Link Fault: This flag is raised when a station stops receiving a transmit signal from its peer.
- Dying Gasp: This flag is raised when a station is about to reset, reboot, or otherwise go to an operationally down state.
- Critical Event: This flag indicates a severe error condition that does not result in a complete reset or reboot by the peer entity.
One of the most critical problems in an access network for carriers is differentiating between a simple power failure at the customer premises and an equipment or facility failure. Dying gasp, if implemented correctly, provides this information by having a station indicate to the network that it is having a power failure. More details on the failure may be included in additional event information conveyed in the frame.
Since the above conditions are severe, OAMPDUs that carry information concerning these conditions are not subject to normal rate limiting policy. Such frames can be sent immediately and repeatedly as they indicate critical failure information.
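The three conditions can be decoded from the flags field with a few bit tests. The bit positions used below (bit 0 for Link Fault, bit 1 for Dying Gasp, bit 2 for Critical Event) follow the flags layout as commonly documented for 802.3ah and should be confirmed against the standard; the function names are hypothetical.

```python
from enum import IntFlag


class OamFlags(IntFlag):
    """Severe-condition bits in the OAMPDU flags field (positions to be verified)."""
    LINK_FAULT = 0x0001
    DYING_GASP = 0x0002
    CRITICAL_EVENT = 0x0004


CRITICAL_MASK = 0x0007  # the three bits defined above


def critical_conditions(flags_field):
    """Return the names of the severe conditions signaled in a flags field."""
    flags = OamFlags(flags_field & CRITICAL_MASK)
    return [flag.name for flag in OamFlags if flag in flags]


def bypasses_rate_limit(flags_field):
    """OAMPDUs carrying any severe condition are exempt from normal rate limiting."""
    return bool(flags_field & CRITICAL_MASK)


power_failure = critical_conditions(0x0002)
two_conditions = critical_conditions(0x0005)
```

Masking first keeps the remaining flag bits, which carry other state, from being misread as error conditions.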
Remote Loopback
An OAM entity can put its remote OAM entity into loopback mode using a loopback-control OAMPDU. When an OAM entity is in loopback mode, every frame received is transmitted back on that same port, except for OAMPDUs and pause frames (which belong to the MAC control sublayer, below the OAM sublayer).
The loopback command is acknowledged by responding with an information OAMPDU with the loopback state indicated in the state field. OAMPDUs continue to be exchanged during loopback mode; only data frames are looped back.
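The per-frame decision made by a station in loopback mode can be sketched as a small classifier. The EtherType values are the standard ones for slow protocols (0x8809) and MAC control (0x8808, which carries pause frames); the function name and the returned labels are made up for illustration.

```python
SLOW_PROTOCOLS_ETHERTYPE = 0x8809
MAC_CONTROL_ETHERTYPE = 0x8808  # pause frames are MAC control frames
OAM_SUBTYPE = 0x03


def loopback_action(frame):
    """Decide what a station in loopback mode does with a received frame."""
    ethertype = int.from_bytes(frame[12:14], "big")
    if ethertype == SLOW_PROTOCOLS_ETHERTYPE and frame[14:15] == bytes([OAM_SUBTYPE]):
        return "to_oam_client"   # OAMPDUs keep flowing during loopback
    if ethertype == MAC_CONTROL_ETHERTYPE:
        return "to_mac_control"  # handled below the OAM sublayer
    return "loop_back"           # everything else is reflected onto the link


addresses = bytes(12)  # placeholder destination and source addresses
oampdu = addresses + b"\x88\x09\x03\x00\x00\x00"
pause = addresses + b"\x88\x08\x00\x01\x00\x10"
data = addresses + b"\x08\x00" + bytes(46)

actions = [loopback_action(f) for f in (oampdu, pause, data)]
```

In practice the MAC control sublayer would consume pause frames before the OAM sublayer ever sees them; the classifier folds both checks into one function only to make the exceptions explicit.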
Remote loopback is most useful as a diagnostic tool, where it can be used to isolate problem segments in a large network. When in loopback, metrics such as delay, throughput, bit error rate (BER), and jitter can be calculated to determine the overall quality of a connection.
Link Monitoring
Ethernet OAM also defines a set of standard event conditions that Ethernet links should monitor in normal operation, and if detected, should be signaled to a peer entity. These conditions reflect a degraded, but not yet inoperable, Ethernet connection. These conditions include threshold-crossing alarms on the frequency of symbol errors and frame errors.
When a pre-configured error threshold is crossed, an event notification OAMPDU is sent to the OAM peer. Additionally, as is done in more traditional SONET and DSL networks, an errored second is defined to indicate a period in which an unusual number of errors occurred, and thresholds can be set on the number of errored seconds over a specific period.
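An errored-second threshold check can be sketched as follows. The sketch assumes, for illustration, that a second counts as errored once it contains at least `min_errors` frame errors (defaulting to one); the function names, window contents, and threshold values are all made up.

```python
def count_errored_seconds(errors_per_second, min_errors=1):
    """Count the one-second intervals that contain at least `min_errors` errors."""
    return sum(1 for count in errors_per_second if count >= min_errors)


def threshold_crossed(errors_per_second, errored_second_threshold, min_errors=1):
    """True if the window holds enough errored seconds to warrant an event notification."""
    errored = count_errored_seconds(errors_per_second, min_errors)
    return errored >= errored_second_threshold


# Frame-error counts for each second of a 10-second monitoring window.
window = [0, 2, 0, 0, 1, 0, 3, 0, 0, 0]
errored = count_errored_seconds(window)
notify = threshold_crossed(window, errored_second_threshold=3)
```

Here three of the ten seconds are errored, so a threshold of three errored seconds per window triggers a notification while a threshold of four would not.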
Since IEEE 802.3ah OAM does not guarantee delivery of any OAMPDU, the event notification OAMPDU can be sent multiple times to reduce the probability of a lost notification. Each event notification OAMPDU is tagged with a sequence number so that the receiver can recognize duplicate event notifications.
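Duplicate suppression on the receiving side can be as simple as remembering the last sequence number seen. This sketch assumes that repeated copies of a notification reuse the same sequence number and arrive back to back, which is reasonable on a single point-to-point link; the class and method names are hypothetical.

```python
class EventReceiver:
    """Accepts event notifications and drops repeated copies (illustrative)."""

    def __init__(self):
        self.last_seq = None
        self.events = []

    def on_event_notification(self, seq, event):
        """Record the event unless its sequence number repeats the previous one."""
        if seq == self.last_seq:
            return False  # duplicate of a notification already processed
        self.last_seq = seq
        self.events.append(event)
        return True


receiver = EventReceiver()
# The sender transmits each notification twice to guard against loss.
for seq, event in [(7, "errored frames"), (7, "errored frames"),
                   (8, "errored symbols"), (8, "errored symbols")]:
    receiver.on_event_notification(seq, event)
recorded = receiver.events
```

Each event is recorded once even though every notification was sent twice.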
MIB Variable Retrieval and Extensions
The IEEE 802.3ah Ethernet OAM spec also defines a generic mechanism for one OAM entity to query another for the value of any management information base (MIB) variable. MIB variables include all performance and error statistics maintained on an Ethernet link. This mechanism lets one station monitor any parameter on another for performance or error detection.
Vendors and organizations can extend Ethernet OAM capabilities through organization-specific OAMPDUs and organization-specific TLVs within the standard OAMPDUs. Vendors can use these extensions to implement extra events, to include additional information during discovery, or even to add a completely proprietary OAM protocol to the standard operation. By virtue of the TLV format, unrecognized extensions can be skipped by a non-supporting node.
The Ethernet OAM work of IEEE 802.3ah is the first step toward including inherent management capabilities in Ethernet equipment for public network deployment. The protocol described in this article provides utilities for monitoring and troubleshooting Ethernet links, and can feed into larger management frameworks as a fundamental OAM capability.
References
[1] IEEE, IEEE 802.3ah Draft P802.3ah/D3.3, "Amendment: Media Access Control Parameters, Physical Layers and Management Parameters for Subscriber Access Networks," April 2004.
[2] IEEE, IEEE Std 802.1q, "Virtual Bridged Local Area Network," December 1998.
[3] R. P. Ballart and Y-C. Ching, "SONET: Now It's The Standard Optical Network," IEEE Communications Magazine, May 2002.
[4] T. D. Nadeau, et al., "OAM Requirements for MPLS Networks," draft-ietf-mpls-oam-requirements-02.txt, June 2003 (work in progress).
[5] Metro Ethernet Forum, "Metro Ethernet Network: A Technical Overview," October 2002.
[6] IEEE, IEEE Std 802.3, "Carrier Sense Multiple Access with Collision Detection (CSMA/CD) Access Method and Physical Layer Specifications," March 2002.
[7] J. Case, et al., "A Simple Network Management Protocol (SNMP)," IETF RFC 1157, May 1990.
About the Authors
Stephen Suryaputra is a senior software engineer for Hatteras Networks. He received a Sarjana Teknik (B.S. in Engineering) from Sekolah Tinggi Teknik Surabaya, Indonesia, and an M.S. in Electrical Engineering from the University of Southern California. Stephen can be reached at ssuryaputra@HatterasNetworks.com.
Matthew Squire is the chief technology officer for Hatteras Networks. Matthew holds more than 15 patents and has more than 10 pending. He chaired the OAM sub-taskforce in the IEEE 802.3ah Ethernet in the First Mile task force and has also served in editorial positions in the Metro Ethernet Forum and ANSI T1 committee. Matthew can be reached at firstname.lastname@example.org.