The design goal for third-generation (3G) General Packet Radio Services (GPRS) systems is to minimize, if not eliminate, interruptions to data and voice services. System reliability is critical both for customer acceptance of 3G and for the service providers that are investing in infrastructure and making these data services available worldwide. That level of reliability can be achieved only by incorporating high-availability (HA) capabilities into all of the systems in the wireless infrastructure.
While non-HA GPRS 2.5G systems are capturing headlines, HA GPRS 3G systems have been rolling out at a slower pace than some service providers would like because the overall cost of these complex systems slows development and acceptance of HA. Since cost has a direct bearing on how quickly the systems can be deployed, reducing that cost has become a major focus of infrastructure equipment manufacturers.
Five-nines (99.999%) HA, which works out to no more than about 5.26 minutes of downtime per year, is crucial in 3G services. Occasional dropped packets or corrupted data in data communications, or occasional dropped calls in telecommunications, can be tolerated, but the loss of an entire node for an extended length of time due to a single hardware or software failure is totally unacceptable in either. HA, then, is the indispensable piece of the telecommunications design puzzle that makes a system workable and valuable.
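The downtime budget behind the five-nines figure is simple arithmetic. The short Python sketch below works it out for a 365-day year; the function name and the comparison levels are illustrative and are not taken from any standard.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a non-leap year


def downtime_minutes_per_year(availability: float) -> float:
    """Return the yearly downtime allowed at a given availability (0.0 to 1.0)."""
    return (1.0 - availability) * MINUTES_PER_YEAR


# Compare a few availability levels; five nines allows roughly 5.26 minutes.
for label, availability in [
    ("three nines", 0.999),
    ("four nines", 0.9999),
    ("five nines", 0.99999),
]:
    minutes = downtime_minutes_per_year(availability)
    print(f"{label}: {minutes:.2f} minutes of downtime per year")
```

Each additional nine cuts the allowed downtime by a factor of 10, which is why a single extended node outage can consume the budget for many years.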
But the standard approach to HA in telecommunications systems, and in the wireless infrastructure in particular, has been the use of proprietary hardware and software. That approach is costly in terms of time-to-market and overall system costs to vendors as well as to users. The signaling server application can be used as an illustration of how to design a telecommunications system using commercial off-the-shelf (COTS) hardware and software to solve this critical problem in a timely and cost-efficient manner.
For systems that work with the telecommunications network, a mission-critical piece is the signaling server. A signaling server is part of the system that controls calls on a set of associated phone lines. Signaling is defined as the exchange of information specifically concerned with the management, establishment and control of connections in the telecommunications network.
One system that would benefit from a signaling server is a cellular base transceiver station (BTS) that is connected to its base station controller, or to a mobile switching center, via land lines or backhaul lines. Another example is a paging system that uses the public switched telephone network to receive calls. Those calls carry messages that are to be forwarded to the subscribers' pagers. In both cases, call setup and control are fundamental to the operation of the system. If the system cannot set up and receive calls on the lines that are provisioned to it, the entire system is considered to be out of service.
In the first example, the cellular customer's calls are not connected while the user is in the area covered by the BTS. In the second case, callers to a subscriber's pager number keep receiving an annoying fast busy signal and are not able to page the subscriber. These examples typify a classic problem for countless applications with a single mission-critical piece that renders the entire system inoperative if it fails.
Systems interacting with a telecommunications network must be able to set up and terminate calls on the lines connected to the system. That signaling information can be carried in-band over the voice/data channel, or over a common channel via the Signaling System 7 (SS7) system. In-band signaling can be used in simple applications that only need to answer or hang up lines, and that don't need the full bandwidth of the telecommunication lines. In-band signaling services use a portion of the bandwidth on each individual telecommunication line and are distributed across all of the incoming analog lines or T1 spans. Each T1 line comprises 24 individual lines, or channels, combined into one digital line with framing information at 1.544 Mbits/second.
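The 1.544-Mbit/second figure follows from the T1 frame structure. The sketch below assumes the standard DS1 framing values (8-bit samples, 8,000 frames per second and one framing bit per frame), which the text above does not spell out; the constant and function names are illustrative.

```python
CHANNELS_PER_T1 = 24          # individual lines (channels) in one T1 span
BITS_PER_SAMPLE = 8           # assumed: one 8-bit sample per channel per frame
FRAMES_PER_SECOND = 8000      # assumed: standard 8-kHz sampling rate
FRAMING_BITS_PER_FRAME = 1    # assumed: one framing bit added to each frame


def t1_payload_rate_bps() -> int:
    """Bit rate of the 24 channels alone (24 x 64 kbit/s)."""
    return CHANNELS_PER_T1 * BITS_PER_SAMPLE * FRAMES_PER_SECOND


def t1_line_rate_bps() -> int:
    """Bit rate of the T1 line including the framing information."""
    bits_per_frame = CHANNELS_PER_T1 * BITS_PER_SAMPLE + FRAMING_BITS_PER_FRAME
    return bits_per_frame * FRAMES_PER_SECOND


print(f"payload:   {t1_payload_rate_bps() / 1e6:.3f} Mbit/s")
print(f"line rate: {t1_line_rate_bps() / 1e6:.3f} Mbit/s")
```

The difference between the two rates is the framing overhead. In-band signaling takes its share out of the payload on every channel, which is the bandwidth cost discussed next.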
In-band signaling does not have a single point of failure, since the signaling is distributed across all of the incoming lines, and a failure in any one line cannot render the rest of the lines unusable. The costs of such system resilience are loss of bandwidth on all of the lines, more processing at the individual line level and the lack of more advanced services available to the user. A common-channel SS7 service approach eliminates those costs for more advanced applications and user systems.
Common-channel signaling (CCS) via the SS7 network preserves the full bandwidth on all of the digital telecommunication lines and allows the use of advanced intelligent network (AIN) features. One mated pair of SS7 links can control the signaling requirements of hundreds of individual telecommunication lines. Along with offering a more modular approach to signaling, those advantages have made CCS a more desirable alternative. However, a problem arises when the signaling module or server fails, making unusable the hundreds of lines carried on multiple T1 lines or on a T3 line, which is the functional equivalent of 28 T1s.
From a hardware standpoint, the SS7 system resists failure by using a mated pair of links operating at less than 50 percent of the available bandwidth. The spare bandwidth is used to send fill-in signal units (FISUs) that are used to detect link failure. A failure in a hardware link triggers an alert and a switch-over to the other link until service is restored. Internal to the system, a mesh of interconnected signaling transfer points allows messages to be carried through multiple paths, over multiple links, in order to maintain reliability and system availability.
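The reason for keeping each link of a mated pair below 50 percent is that the surviving link must be able to absorb the traffic of both after a switch-over. The sketch below checks that condition; the function name and the example load figures are hypothetical, and loads are expressed as a fraction of one link's capacity.

```python
def survives_single_link_failure(link_loads, link_capacity=1.0):
    """Return True if one surviving link can carry the traffic of the whole pair.

    link_loads: load on each link of the mated pair, as a fraction of
    link_capacity. After a failure, all traffic moves to the other link,
    so the combined load must fit within a single link.
    """
    return sum(link_loads) <= link_capacity


# Both links under 50 percent: the survivor can take over all traffic.
print(survives_single_link_failure([0.4, 0.4]))   # True

# Both links over 50 percent: a switch-over would overload the survivor.
print(survives_single_link_failure([0.6, 0.6]))   # False
```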
The SS7 protocol stack consists of four levels. The lowest three levels of the SS7 architecture, referred to as the Message Transfer Part (MTP), provide a reliable but connectionless service for routing messages. MTP Layer 1 handles the physical and electrical characteristics of a full-duplex physical link. MTP Layer 2 deals with the data link itself, which includes removal of FISUs, individual link error monitoring, data order integrity and flow control. MTP Layer 3 controls SS7 link management, message routing and traffic management, and is the layer where the mated pairs converge.
From a telecommunications user's perspective, the main objectives of MTP Layer 3 are to overcome SS7 link degradations or failures, and to distribute messages to the higher SS7 layers. To meet the first objective, MTP Layer 3 monitors the status of each link, and handles recovery from the loss of messages due to link failure.
The fourth level of the SS7 architecture is the applications level. This level includes the Signaling Connection Control Part (SCCP), providing enhanced connectionless and connection-oriented services, and the ISDN User Part (ISUP) for circuit-switched or standard phone services and associated user facilities. The Transaction Capabilities Application Part (TCAP), also part of the fourth level, provides services to higher-layer application service elements as well as the advanced features of the AIN that many telecommunications users and applications require. Access to those higher layers is often the driving factor that spurs vendors of more advanced systems to integrate SS7 into their applications, instead of using in-band signaling.
A closer review of the SS7 stack reveals that any number of mated links is served by only one provider of MTP Layer 3 functions and procedures. Therefore, MTP Layers 3 and above are the mission-critical components for the signaling system. A failure at those layers can render the managed signaling links useless and, with them, the controlled telecommunications T1 and T3 user lines.
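The effect of a single MTP Layer 3 provider sitting behind redundant links can be illustrated with basic availability arithmetic. The sketch below assumes independent failures and instantaneous, perfect switch-over, and the 0.999 component availability is a hypothetical figure chosen only for illustration.

```python
def series(*availabilities):
    """Availability of components that must all work (a chain)."""
    result = 1.0
    for a in availabilities:
        result *= a
    return result


def parallel(*availabilities):
    """Availability of redundant components where any one is enough."""
    all_down = 1.0
    for a in availabilities:
        all_down *= (1.0 - a)
    return 1.0 - all_down


LINK = 0.999   # hypothetical availability of one signaling link
HOST = 0.999   # hypothetical availability of one MTP Layer 3 host

mated_links = parallel(LINK, LINK)

# Redundant links feeding a single Layer 3 host: the host caps the result.
single_host = series(mated_links, HOST)

# Redundant links feeding a redundant pair of hosts.
redundant_hosts = series(mated_links, parallel(HOST, HOST))

print(f"mated links only:        {mated_links:.6f}")
print(f"with one Layer 3 host:   {single_host:.6f}")
print(f"with standby Layer 3:    {redundant_hosts:.6f}")
```

Under these assumptions, redundancy in the links buys nothing beyond the availability of the single host above them, which is why the host is the component that needs a standby.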
One HA COTS solution to that problem is a CompactPCI (CPCI) computer platform, such as the CPX8216 system from Motorola Computer Group, which provides HA and hot-swap hardware services. The HA services are facilitated by a real-time operating system (RTOS), such as LynxOS, that supports reliable processing through its use of the memory management unit, along with CPCI hot swap and telephony protocols.
The CPCI hot swap feature allows the I/O boards and CPU boards to be extracted without requiring the system to be powered down. A software "heartbeat" supplied by the RTOS is used to detect a failure, and services are then maintained by a standby system that is kept active on the same heartbeat. A failed MTP Layer 1 and 2 board can be replaced while its mated-pair link carries the traffic, so service on the SS7 signaling links is not interrupted. Additionally, if the host board running the critical MTP Layer 3 functionality fails, the standby CPU board can take over the CPCI bus, the associated I/O cards and the MTP Layer 3 link management services, again keeping the SS7 signaling links in service. The failed host CPU board can then be hot-swapped without interrupting the signaling service. This capability allows calls on the associated voice and data T1 and T3 lines to continue being processed, despite failures associated with the critical signaling channel components.
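The heartbeat-driven takeover can be sketched in a few lines. The Python below is a simplified model, not the LynxOS or CPX8216 interface: the class names, the timeout value and the use of explicit timestamps are all assumptions made for illustration, and it assumes that a missed heartbeat is what signals the failure.

```python
class HeartbeatMonitor:
    """Tracks the most recent heartbeat from the active host (hypothetical)."""

    def __init__(self, timeout_s):
        self.timeout_s = timeout_s
        self.last_beat = None

    def beat(self, now):
        """Record a heartbeat received at time `now` (seconds)."""
        self.last_beat = now

    def is_alive(self, now):
        """The active host counts as alive if it beat within the timeout."""
        if self.last_beat is None:
            return False  # no heartbeat has ever been seen
        return (now - self.last_beat) <= self.timeout_s


class StandbyHost:
    """Standby CPU that takes over Layer 3 duties when heartbeats stop."""

    def __init__(self, monitor):
        self.monitor = monitor
        self.active = False

    def poll(self, now):
        """Check the heartbeat; take over once if the active host is silent."""
        if not self.active and not self.monitor.is_alive(now):
            self.active = True  # in a real system: claim the bus and I/O cards
        return self.active


monitor = HeartbeatMonitor(timeout_s=0.3)
standby = StandbyHost(monitor)

for t in (0.0, 0.1, 0.2):        # active host beats normally, then goes silent
    monitor.beat(t)

print(standby.poll(now=0.25))    # False: last beat was recent enough
print(standby.poll(now=0.60))    # True: heartbeat missed, standby takes over
```

Passing timestamps in explicitly keeps the model deterministic. A real implementation would also need to guard against both hosts becoming active at once, which this sketch does not address.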
Those same concepts can be applied to many other mission-critical applications or systems with high uptime requirements. Historically, HA solutions used proprietary hardware and software that took years to develop and deploy. Today, systems built from COTS CPCI hot-swap hardware, an HA-extended RTOS and application software designed for high availability can be prototyped and deployed with a speed that meets even the most demanding time-to-market requirements.
Vendors such as LynuxWorks are pushing forward standards-based COTS software approaches because they are cost-effective and expedient. If HA GPRS systems are to see the light of day in the foreseeable future, software HA implementation must be adopted by systems suppliers.