The telecommunications industry is in the midst of a massive change. In an effort to keep pace with the unprecedented traffic generated over the Internet, the industry is moving from monolithic, proprietary platforms to a standards-based approach. The result is a growing dependence on external vendors to rapidly develop scalable but highly reliable solutions. This has, in turn, focused intense pressure on the CompactPCI specification as a means for supporting the exceptional high-availability (HA) capabilities required by the telco industry.
Uptime is crucial in the telecommunications industry: When customers pick up the phone, they expect a dial tone anytime, anywhere and under any conditions. "Five-nines," or 99.999 percent, availability translates to just over five minutes of downtime a year, and that budget must cover repairing failures, installing upgrades and general maintenance. Until recently, telecommunications providers developed their own proprietary systems to achieve this incredibly stringent level of availability, designing all aspects of the system from the chips on out to higher-level software. Consequently, as telcos begin to move into the new, open-platform market, it is now common for them to demand five-nines availability from equipment vendors.
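The downtime budget behind a figure such as five-nines follows from simple arithmetic. The short sketch below converts an availability percentage into minutes of allowable downtime per year; the function name is illustrative and a 365-day year is assumed.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600; assumes a non-leap year


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Return the yearly downtime budget, in minutes, for a given availability."""
    unavailable_fraction = 1.0 - availability_percent / 100.0
    return unavailable_fraction * MINUTES_PER_YEAR


# Compare the budgets for three common availability targets.
for label, percent in [("three-nines", 99.9), ("four-nines", 99.99), ("five-nines", 99.999)]:
    print(f"{label}: {downtime_minutes_per_year(percent):.2f} minutes per year")
```

Each additional nine cuts the budget by a factor of ten, which is why five-nines leaves roughly 5.26 minutes a year for every repair, upgrade and maintenance task combined.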
The arrival of the dot-com age has created significant opportunities for carriers, while putting tremendous pressure on their existing infrastructures. By quickly adding services, it is now possible to achieve competitive differentiation. For example, a carrier can rapidly integrate a host of new services, such as call waiting, caller ID, voice mail, Internet services and DSL. But all of these new capabilities must be offered at the same levels of availability as existing services. Recognizing that they cannot possibly support this slew of new services by developing internal, proprietary equipment, carriers now look to communications equipment providers for commercial, off-the-shelf (COTS) platforms that incorporate built-in support for high availability. In this way, carriers can quickly construct and extend high-availability systems, reducing time-to-market and engineering development costs.
Once carriers acknowledge that they need a COTS-based approach, the biggest question then becomes which hardware platform to standardize on to support stringent high-availability requirements. Great care must be taken to select a standard that is highly reliable, yet flexible enough to allow systems to be tailored to specific needs. It is critical that platforms incorporate components based on open industry standards. The sheer amount of development around an open standard ensures greater reliability and makes the components much more cost-effective. For rack-mounted telco systems, the choice quickly narrows down to PCI, VME and CompactPCI.
The PCI bus is already the industry standard for millions of desktop systems. Unfortunately, in its desktop form it provides neither the reliability nor the uptime needed in a high-availability system. There is no easy way to cool this type of board, and its edge connectors are notorious for their low reliability and are easily damaged when a board is replaced. In its favor, however, the PCI standard leverages the tremendous advantages of the enormous PC industry, which is built around commodity PC silicon and device drivers. This means lower cost, more choice and faster access to new technologies. As a result, and thanks to its wealth of robust device drivers and proven, inexpensive silicon, PCI is very cost-effective and extremely flexible, and its silicon and software are dependable even if its packaging is not.
The VME standard, on the other hand, was specifically developed for industrial applications, within which high availability has long been a major concern. Thus, it offers superior reliability, is designed specifically for cooling and can be easily installed or removed. However, as a proprietary industrial approach, VME is not only expensive but also limited in what it offers and supports. VME is backed by a limited set of device drivers and custom silicon, making it costly to purchase and maintain. Moreover, software ported to the VME environment requires customization, increasing the verification and support tasks.
To address the limitations of VME and PCI, a consortium of more than 400 computer suppliers and manufacturers worked closely together to create the CompactPCI specification. The standard deliberately merges the performance, scalability and reliability of VME with the cost efficiency and flexibility of the PCI standard. Network equipment, telecom equipment and service provider manufacturers are embracing the approach.
Specifically, CompactPCI takes the best of the VME world (dense, rugged packaging with excellent cooling properties in large installations) and combines it with the best of the PC world (cheap, fast silicon with access to the latest interconnect and processing technologies). Other benefits include a standard form factor and a bus that is electrically equivalent to desktop PCI, so it supports exactly the same interface chips as those used in desktop PCs and workstations. CompactPCI also delivers high performance, and it is scalable and expandable, with support for up to 256 PCI buses capable of concurrent operation using standard bridging practices. Most important, it is dependable. Built around highly reliable, PC-based silicon and drivers, CompactPCI incorporates a 220-pin, 2-mm "hard metric" connector that ensures adequate shielding and grounding for low ground bounce and reliable operation in noisy environments.
But for all its strengths, the standard as it stands today cannot support the five-nines high-availability requirements that telcos demand. Sun Microsystems is therefore enhancing and extending the basic architecture to meet specific availability requirements within the company's recently announced CP2000 High Availability Program. That program comprises integrated hardware and software components addressing device-level, switch-level and background-level functions, combined with training, documentation and consulting.
Essentially, the CP2000 program uses the unique features of CompactPCI as its foundation. It leverages capabilities such as hot swap, along with the Intelligent Platform Management Interface (IPMI) standard and its signaling support.
Recognizing that failures are bound to occur, high-availability systems incorporate redundant resources for key hardware parts. To ensure continued service, those backup resources are immediately switched in when a failure occurs. Leveraging the high-availability capabilities of its extended CompactPCI hardware, Sun has built redundancy into CP2000 at multiple levels.
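The value of redundancy can be estimated with a standard reliability formula. The sketch below is illustrative only: it assumes that units fail independently and that failover is instantaneous and flawless, neither of which holds perfectly in a real system, and the 99.9 percent figure is a hypothetical value rather than a measured one.

```python
def parallel_availability(unit_availability: float, units: int) -> float:
    """Availability of redundant units when any single one can carry the load.

    Assumes independent failures and instantaneous, flawless failover.
    """
    if not 0.0 <= unit_availability <= 1.0:
        raise ValueError("unit_availability must be between 0 and 1")
    if units < 1:
        raise ValueError("units must be at least 1")
    # The group is down only when every unit is down at the same time.
    return 1.0 - (1.0 - unit_availability) ** units


single = 0.999  # hypothetical availability of one controller card
print(f"one unit:  {parallel_availability(single, 1):.6f}")
print(f"two units: {parallel_availability(single, 2):.6f}")
```

Under these idealized assumptions, pairing two 99.9 percent units yields 99.9999 percent. In practice, the time needed to detect a failure and switch over eats into that figure, which is why the failover mechanism matters as much as the spare hardware.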
The most important features are the hot-swappable I/O and controller cards. It is relatively simple to implement a traditional hardware-based hot swap technique, and existing sub-specifications define how to address the hardware issues involved. It is, however, much more challenging to create a mechanism for handling the failure of an I/O controller or a system controller. One possibility is to create a mirror system, keep it standing by, and then have everything fail over to it. But that is an expensive option.
A second option is to use a redundant controller. This can be complicated, since the system controller arbitrates the bus. It is important to establish how the redundant controller will take over all of the failed controller's operations without disrupting system operation.
Sun's solution builds on the CP2000's dual-bus architecture by providing a way to make applications hot swap-aware. This ensures that failures are handled seamlessly in software and that failed hardware can be simply "swapped out" while the system remains online. The mechanism is called alternate pathing.
With this method, disk and network operations can be automatically redirected to a predefined alternate path should a failure occur, ensuring that I/O cards can be serviced without any disruption to the system. Each I/O device connects to two I/O controllers, so there are two separate electrical pathways to the I/O device. If one controller fails, the application is informed and can direct the system to switch to the alternate controller. The key is that the switch occurs automatically whenever a path failure is detected.
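The failover behavior described above can be sketched in a few lines. This is a minimal illustration of the general idea, not Sun's implementation: the class names, the `write` method and the `healthy` flag are all hypothetical, and a real system would detect failures in the driver layer rather than through an exception.

```python
class PathFailure(Exception):
    """Raised by a path when its controller cannot complete an operation."""


class ControllerPath:
    """Hypothetical stand-in for one electrical path through an I/O controller."""

    def __init__(self, name: str):
        self.name = name
        self.healthy = True

    def write(self, data: bytes) -> str:
        if not self.healthy:
            raise PathFailure(f"{self.name} is not responding")
        return f"{len(data)} bytes written via {self.name}"


class AlternatePathedDevice:
    """Presents one logical device backed by a primary and an alternate path."""

    def __init__(self, primary: ControllerPath, alternate: ControllerPath):
        self.active = primary
        self.standby = alternate

    def write(self, data: bytes) -> str:
        try:
            return self.active.write(data)
        except PathFailure:
            # Swap roles so the failed controller can be pulled and serviced,
            # then retry the same operation on the surviving path.
            # If that path has failed too, PathFailure propagates to the caller.
            self.active, self.standby = self.standby, self.active
            return self.active.write(data)


disk = AlternatePathedDevice(ControllerPath("controller-A"), ControllerPath("controller-B"))
print(disk.write(b"block 0"))
disk.active.healthy = False  # simulate a controller failure
print(disk.write(b"block 1"))
```

The caller issues the same `write` call before and after the failure, which is the property that lets software unaware of hot swap keep running while the failed card is replaced.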
Sun's architecture allows applications that are not hot swap-aware to take advantage of hot swap hardware by use of alternate-pathing technology. Applications can also register with the framework to be notified about specific events. Thus, it is possible to develop enhanced applications that are fully involved with any hot swap events. This is useful for capacity upgrades. For example, it is possible to add extra LAN cards to a running system and advise the application that they are ready for use.
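Event registration of this kind is commonly built as a publish-and-subscribe registry. The sketch below shows the pattern using the LAN card example; the `HotSwapFramework` class, its method names and the event names are hypothetical and do not represent Sun's actual interface.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List


class HotSwapFramework:
    """Hypothetical event registry; not Sun's actual interface."""

    def __init__(self) -> None:
        self._handlers: DefaultDict[str, List[Callable[[str], None]]] = defaultdict(list)

    def register(self, event: str, handler: Callable[[str], None]) -> None:
        """Ask to be called whenever `event` occurs."""
        self._handlers[event].append(handler)

    def notify(self, event: str, slot: str) -> int:
        """Deliver `event` for `slot` to every registered handler; return the count."""
        handlers = self._handlers.get(event, [])
        for handler in handlers:
            handler(slot)
        return len(handlers)


# The application tracks which LAN cards it may use.
available_lan_cards: List[str] = []

framework = HotSwapFramework()
framework.register("card_inserted", available_lan_cards.append)
framework.register("card_removed", available_lan_cards.remove)

# Simulate two cards being added to a running system, then one being pulled.
framework.notify("card_inserted", "slot-3")
framework.notify("card_inserted", "slot-5")
framework.notify("card_removed", "slot-3")
print(available_lan_cards)
```

An application that registers no handlers is simply never called, so the same framework can serve both hot swap-aware software and software that relies on alternate pathing alone.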
Another important benefit of alternate pathing is that it supports dynamic reconfiguration, which allows the operating system to react to system hardware changes. The combination of alternate pathing and dynamic reconfiguration enables administrators to perform online repair and reconfiguration of servers, increasing application, or service-level, availability. The dynamic reconfiguration software allows the operating system to notify the application of a change in hardware resources.
The overall benefit of these extensions is the creation of a standards-based platform with the level of high availability that telcos demand.