The telecommunication industry seems to be mimicking the transformation that the enterprise computing industry went through during the 1990s: it is moving from a vertical industry model to a more horizontal one. The maturation and adoption of a key set of standards are making it possible for Telecom Equipment Manufacturers (TEMs) to rely on an emerging ecosystem of Commercial Off-The-Shelf (COTS) components to build network elements, and to focus their scarce resources on their core value-add.
A few key standards address the hardware platform, the operating system, and the middleware layer of network elements. Three industry consortia are particularly relevant in driving COTS adoption in the telecom industry.
PCI Industrial Computer Manufacturers Group (PICMG) (www.picmg.org), a consortium of several companies including TEMs, develops and promotes carrier-grade equipment standards. A recent set of specifications from this consortium, AdvancedTCA (ATCA), is fast gaining wide industry acceptance and adoption. ATCA, targeted primarily at developers of telecommunication applications, defines standards for a new architecture that eases the integration and migration of telecom applications across platforms. Many TEMs have already announced plans to provide network elements based on standard ATCA platforms.
Open Source Development Labs (OSDL) (www.osdl.org) is an industry body dedicated to accelerating the adoption of the Linux operating system for enterprise computing and carrier applications. Carrier Grade Linux (CGL), as specified by OSDL, is fast gaining traction among equipment vendors, TEMs, and service providers alike.
Service Availability Forum (SA Forum) (www.saforum.org), a vendor consortium, develops and promotes standard specifications that enable independent software vendors to develop interoperable middleware COTS components that are easy to integrate.
The Hardware Platform Interface (HPI), the first specification published by the SA Forum, defines a standard interface between the service availability middleware and the hardware platform. The second interface definition, known as the Application Interface Specification (AIS), establishes an interface between the high availability middleware and the application layer. The Systems Management Interfaces provide an umbrella that ties together the management capabilities for HPI and AIS services.
In January 2006, the Forum announced the availability of new and enhanced interfaces representing the complete set of SA Forum specifications, including updates to all of these specifications [1]. These specifications are intended to facilitate portability of middleware and applications across multiple platforms, thus reducing the startup cost and the integration effort (see Figure 1).
Figure 1. Elements of SA Forum AIS
The proliferation of such standards enables designers to rapidly build application-ready platforms from various COTS building blocks. This allows equipment vendors to minimize the cost and effort involved in building carrier-grade network elements, and to focus their scarce resources on their core competence: communication applications. A high-level overview of the SA Forum Application Interface Specification is presented here.
Application interface specification
The Service Availability Forum's Application Interface Specification specifies an Availability Management Framework and seven core services. When implemented together, these services provide a comprehensive set of functionality that allows system designers to build highly available applications that are interoperable and portable across a variety of compliant middleware and platforms (see Figure 2). The Availability Management Framework and the seven services are described here.
Figure 2. Key standards enable COTS-based network elements
Availability management framework (AMF)
The AMF specifies a software entity that provides service availability by coordinating redundant resources within a cluster to deliver a system with no single point of failure. It provides a consistent view of one logical system that comprises a number of cluster nodes, each of which hosts various resources in a distributed computing environment.
This framework provides a set of APIs to enable highly available applications. It drives the high availability state of the various system components, and monitors their health by invoking callback functions that these components supply, as defined in the API. It also manages the readiness state without exposing it to components. Finally, it allows a component to query the framework for information about a given component's high availability state, using functions defined in the set of AMF APIs.
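The interaction described above (the framework assigns high availability states, polls health through callbacks, and answers state queries) can be illustrated with a minimal in-process sketch. This is not the AIS API: `ToyAvailabilityFramework`, `Component`, `HAState`, and every method name below are hypothetical, and the two-state active/standby model is a simplifying assumption.

```python
from enum import Enum
from typing import Callable, Dict, Optional


class HAState(Enum):
    ACTIVE = "active"
    STANDBY = "standby"


class Component:
    """Hypothetical application component managed by the toy framework."""

    def __init__(self, name: str, healthcheck: Callable[[], bool]) -> None:
        self.name = name
        self.healthcheck = healthcheck  # callback the framework invokes
        self.ha_state: Optional[HAState] = None

    def set_ha_state(self, state: HAState) -> None:
        # Callback through which the framework drives this component's state.
        self.ha_state = state


class ToyAvailabilityFramework:
    """Illustrative stand-in for an availability manager (assumed design)."""

    def __init__(self) -> None:
        self._components: Dict[str, Component] = {}

    def register(self, component: Component) -> None:
        # The first registered component becomes active; later ones stand by.
        has_active = any(
            c.ha_state is HAState.ACTIVE for c in self._components.values()
        )
        self._components[component.name] = component
        component.set_ha_state(HAState.STANDBY if has_active else HAState.ACTIVE)

    def ha_state_of(self, name: str) -> Optional[HAState]:
        # Query interface: any component may ask about another's HA state.
        return self._components[name].ha_state

    def is_registered(self, name: str) -> bool:
        return name in self._components

    def monitor_once(self) -> None:
        # Invoke each health callback; remove failed components and fail over.
        for comp in list(self._components.values()):
            if comp.healthcheck():
                continue
            was_active = comp.ha_state is HAState.ACTIVE
            del self._components[comp.name]
            comp.ha_state = None
            if was_active:
                self._promote_standby()

    def _promote_standby(self) -> None:
        for comp in self._components.values():
            if comp.ha_state is HAState.STANDBY:
                comp.set_ha_state(HAState.ACTIVE)
                return


# Usage: two redundant components; the active one fails its health check.
health = {"db-1": True, "db-2": True}
amf = ToyAvailabilityFramework()
amf.register(Component("db-1", lambda: health["db-1"]))
amf.register(Component("db-2", lambda: health["db-2"]))
health["db-1"] = False
amf.monitor_once()
```

After `monitor_once()` runs, the failed component is no longer registered and the former standby has been driven to the active state, which is the "no single point of failure" behaviour the framework is meant to coordinate.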
Cluster membership (CMS)
The cluster membership service is fundamental to defining and deploying a system of clustered nodes. It serves a critical cluster node bookkeeping function and, as such, provides applications with up-to-date cluster membership information as nodes enter or leave the system. Applications register callback functions with the cluster membership service (CMS) to receive notifications of the current cluster membership as changes occur in the cluster configuration.
A cluster consists of a set of configured nodes, each with a unique node name. A member node is a configured node that the CMS recognizes as healthy and sufficiently well connected to be used for deploying highly available applications and services. The CMS is the authority that determines whether a configured node is allowed to become a member node of the cluster. The set of member nodes at a given point in time constitutes the cluster membership.
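The distinction between configured nodes and member nodes, and the callback-based notification of membership changes, can be sketched as follows. This is an illustration only, not the AIS interface: `ToyClusterMembership` and its methods are hypothetical names, and the sketch assumes a node's health has already been established by the time `node_joined` is called.

```python
from typing import Callable, FrozenSet, List, Set

MembershipCallback = Callable[[FrozenSet[str]], None]


class ToyClusterMembership:
    """Illustrative membership tracker (assumed design, single process)."""

    def __init__(self, configured_nodes: Set[str]) -> None:
        self._configured = set(configured_nodes)  # nodes allowed to join
        self._members: Set[str] = set()           # nodes currently in the cluster
        self._callbacks: List[MembershipCallback] = []

    def track(self, callback: MembershipCallback) -> None:
        # Applications register a callback to learn about membership changes.
        self._callbacks.append(callback)

    def members(self) -> FrozenSet[str]:
        return frozenset(self._members)

    def node_joined(self, name: str) -> bool:
        # Only a configured node may become a member node.
        if name not in self._configured or name in self._members:
            return False
        self._members.add(name)
        self._notify()
        return True

    def node_left(self, name: str) -> None:
        if name in self._members:
            self._members.remove(name)
            self._notify()

    def _notify(self) -> None:
        snapshot = self.members()
        for callback in self._callbacks:
            callback(snapshot)


# Usage: record every membership snapshot delivered to the application.
seen: List[List[str]] = []
cms = ToyClusterMembership({"node-a", "node-b"})
cms.track(lambda current: seen.append(sorted(current)))
cms.node_joined("node-a")
cms.node_joined("node-b")
rejected = cms.node_joined("node-x")  # not configured, so never a member
cms.node_left("node-a")
```

The callback receives the full membership set on each change, so an application always holds a complete view rather than having to replay individual join and leave events.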
Checkpoint service (CKPT)
To implement a system capable of seamless recovery from faults, it is important to record and retain dynamic state information that a redundant resource can readily use to resume the service provided by a failed resource. The checkpointing service (CKPT) provides this capability in a highly available system. It gives processes a facility to record checkpoint data incrementally. In the event of a failure, the checkpoint data can be retrieved, and execution can be resumed using the state recorded before the failure. In AIS, checkpoints are cluster-wide entities that are designated by unique names.
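The record-then-resume pattern can be sketched with a toy, in-process store. The names `ToyCheckpointService` and `CallCounter`, the section keys, and the byte encoding are all hypothetical; a real checkpoint service is cluster-wide and replicated, which a plain dictionary only stands in for here.

```python
from typing import Dict


class ToyCheckpointService:
    """Stand-in for a cluster-wide checkpoint store (assumed design)."""

    def __init__(self) -> None:
        # Checkpoints are keyed by unique name; each holds named sections.
        self._checkpoints: Dict[str, Dict[str, bytes]] = {}

    def write(self, name: str, section: str, data: bytes) -> None:
        # Incremental update: only the named section is rewritten.
        self._checkpoints.setdefault(name, {})[section] = data

    def read(self, name: str) -> Dict[str, bytes]:
        # Return a copy so callers cannot mutate the stored state directly.
        return dict(self._checkpoints.get(name, {}))


class CallCounter:
    """Hypothetical service that checkpoints its state after every request."""

    def __init__(self, ckpt: ToyCheckpointService, checkpoint_name: str) -> None:
        self._ckpt = ckpt
        self._name = checkpoint_name
        saved = ckpt.read(checkpoint_name)  # resume from recorded state, if any
        self.count = int(saved.get("count", b"0").decode())
        self.last_caller = saved.get("last_caller", b"").decode()

    def handle_call(self, caller: str) -> None:
        self.count += 1
        self.last_caller = caller
        self._ckpt.write(self._name, "count", str(self.count).encode())
        self._ckpt.write(self._name, "last_caller", caller.encode())


# Usage: the active instance fails; a standby resumes from the checkpoint.
ckpt = ToyCheckpointService()
active = CallCounter(ckpt, "call-counter")
for who in ("alice", "bob", "carol"):
    active.handle_call(who)
del active  # simulate failure of the active instance

standby = CallCounter(ckpt, "call-counter")
resumed_count = standby.count
standby.handle_call("dave")
```

Because both instances open the checkpoint by the same unique name, the standby starts from the last recorded count rather than from zero.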