Design Article
Understanding Service Availability--An Industry in Transition
John Fryer, Service Availability Forum
7/22/2009 2:49 AM EDT
This next phase of network evolution must happen quickly; service providers are already offering new services that challenge current network service capabilities. Users expect new, innovative distributed services to be delivered on demand and without interruption. These demands of the emerging communications environment can be met rapidly through the adoption of open industry standards. Widespread adoption, in turn, depends on educating application developers on how best to leverage open specifications in their work.
A Paradigm Shift
The transformation of networks--the migration from discrete-service-based architectures to converged networks that are IP-based, service oriented and transport agnostic--is not lost on those who supply telecom equipment to network operators. Over the last decade, the industry has continued to shift from a vertical to a horizontal model. The shift is driven by ever-shorter development cycles and constant pressure to reduce development costs, even though equipment manufacturers must still build communications equipment, and enterprises must still develop applications, that achieve the highest possible levels of availability and dependability. Service providers, in turn, must rapidly deploy new services and vouch for their availability and integrity in order to compete successfully for users and meet customer service level agreements.
What is catalyzing this shift is the emergence of key open specifications that clearly delineate the functional layers of a highly available system. This standardization of functional layers--hardware, operating system, middleware and application services--makes it far easier for systems designers to develop highly available, deployment-ready systems from commercial off-the-shelf (COTS) building blocks (see Figure 1). The emergence of multiple COTS suppliers for each building block is helping create a viable and vibrant ecosystem that provides compelling alternatives for building systems. As a result, development organizations are focusing their precious, often shrinking, resources on the activities that differentiate them from competitors: applications and services.

What is Service Availability?
A key requirement of a highly available system is that it must provide uninterrupted service even in the event of hardware or software failures. Industries with this requirement include communications, where "carrier class" is synonymous with high availability, and defense, where "mission critical" systems are essential in an increasingly high-tech environment. Historically, Network Equipment Providers (NEPs) have designed and built such systems from the ground up, using specialized in-house expertise developed over decades.
Traditional definitions of high availability have roots in hardware systems, where redundancy of equipment was the primary mechanism for achieving uptime over a specific period. As software has come to dominate the landscape, the probability of failure is often much higher for applications than it is for hardware, so these concepts have been extended to encompass an overall view of Service Availability, where downtime, irrespective of its cause, is an exceptionally rare event. Services and applications should always be available, whether during abnormal system operation, scheduled maintenance, or software upgrade.
The key principles of Service Availability extend beyond reacting to a failure. They also encompass system monitoring, so that preventative action can be taken before a critical situation occurs. These principles include redundancy, fault prediction and avoidance, stateful and seamless recovery from failures, and minimizing mean time to repair. Such measures are needed because, correct system design and exhaustive testing aside, today's complex systems can often interact in ways not envisioned by system designers.
Many systems providers have invested significant time and resources in developing software services, often referred to as high availability middleware, that are essential to building platforms and systems with service availability approaching FIVE-NINES or better. The number of "NINES" is the usual measure of availability, and it translates into an amount of permitted downtime per day or per year. Applications with high service availability generally fall into the FIVE-NINES (99.999 percent) or higher category, which translates into no more than about 5.26 minutes of downtime per year, or roughly 0.86 seconds per day. This is why in many circumstances phone service may still be available even during power failures.
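The arithmetic behind the "NINES" measure can be checked with a short calculation. The Python sketch below is illustrative only: the function names are hypothetical, and it assumes a 365-day year and treats N nines as an unavailability of 10 to the power of -N.

```python
SECONDS_PER_DAY = 24 * 60 * 60
DAYS_PER_YEAR = 365  # assumption: non-leap year


def allowed_downtime_seconds(nines: int, period_seconds: float) -> float:
    """Return the maximum downtime, in seconds, permitted in a period
    for a service whose availability is `nines` nines (5 -> 99.999%)."""
    if nines < 1:
        raise ValueError("nines must be a positive integer")
    unavailability = 10.0 ** -nines  # e.g. 5 nines -> 0.00001
    return unavailability * period_seconds


def downtime_per_day_seconds(nines: int) -> float:
    """Permitted downtime per day, in seconds."""
    return allowed_downtime_seconds(nines, SECONDS_PER_DAY)


def downtime_per_year_minutes(nines: int) -> float:
    """Permitted downtime per (365-day) year, in minutes."""
    seconds = allowed_downtime_seconds(nines, DAYS_PER_YEAR * SECONDS_PER_DAY)
    return seconds / 60.0


if __name__ == "__main__":
    # Print a small table for three to six nines.
    for n in range(3, 7):
        print(
            f"{n} nines: {downtime_per_year_minutes(n):.3f} min/year, "
            f"{downtime_per_day_seconds(n):.3f} s/day"
        )
```

For FIVE-NINES this gives 5.256 minutes per year and 0.864 seconds per day. Each additional nine divides the permitted downtime by ten, which is why moving from four to five nines is a much harder engineering target than the small change in percentage suggests.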
Figure 2 below shows the characteristics of an available system.




