While interest in high-availability systems has been around for a long time, the means to achieve them - memory and processing speed - are finally becoming available and affordable. High-availability real-time operating system (RTOS) solutions that enable uptimes of five 9s have always been required for mission-critical applications such as telephone switches, avionics, nuclear power station control, emergency response systems, and defense applications. Now that high-availability RTOS solutions can meet critical cost targets, they are finding their way into more consumer and networking applications.
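To put "five 9s" in concrete terms, the short sketch below converts a number of nines into the downtime it permits per year. The function name and the 365-day year are assumptions made for illustration, not figures from any vendor.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # assumption: a non-leap, 365-day year


def downtime_minutes_per_year(nines: int) -> float:
    """Return the yearly downtime budget for an availability of `nines` 9s."""
    if nines < 1:
        raise ValueError("nines must be at least 1")
    unavailability = 10.0 ** -nines  # five 9s (99.999%) -> 0.00001
    return MINUTES_PER_YEAR * unavailability


if __name__ == "__main__":
    for n in (3, 4, 5, 6):
        print(f"{n} nines: {downtime_minutes_per_year(n):.2f} minutes per year")
```

Under these assumptions, five 9s leaves a budget of a little over five minutes of downtime per year, and each additional 9 cuts that budget by a factor of ten.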
The move towards using high-availability commercial RTOS solutions has been supported by the telecommunication and data communication industries' transition from mainframes to more compact systems like VME, PCI, and CompactPCI (CPCI). "In addition, intense time-to-market pressures meant less time for proprietary solutions, and manufacturers turned to their hardware and software suppliers for assistance," observes Bob Monkman, director of product marketing, Enea OSE Systems, Inc. (Taby, Sweden).
"Today, high availability is an application requirement for telephony system and networking solutions. We need to ensure that even if there is a hardware failure, there is a way to recover from it," observes Michael Tiemann, chief technical officer, Red Hat. His company's Linux high availability effort began about three years ago.
QNX Software Systems Ltd. (Kanata, Ontario) reports that increased demand for high availability is coming from the networking equipment and consumer appliance sectors. "Users now expect their data networks to have as much uptime, and to behave as predictably, as traditional telephone networks," says Paul Leroux, technology analyst at QNX.
Since many large businesses now rely on data networks for all day-to-day operations and, in some cases, all their revenue, network downtime has become unaffordable. Additionally, networking software is evolving rapidly and requires frequent upgrades. So, the RTOS solution must enable software to be upgraded "on the fly," as well as tolerate software faults.
The demand for high availability is also moving into the consumer space, with consumers expecting their communication devices, such as PDAs, to be as reliable and maintenance-free as a telephone. "Since consumer appliance manufacturers can't address this demand with industrial-strength high-availability schemes, such as redundant CPUs, they are instead taking advantage of the memory management unit (MMU) protection of a high-availability RTOS," notes Leroux.
Building the RTOS
"One of the biggest challenges to building the high-availability RTOS is emerging hardware standards," observes Greg Rose, director of product management at LynuxWorks (San Jose, CA). It is important, therefore, for the OS vendors to form partnerships with hardware manufacturers, especially as applications become more complex.
Many vendors in the RTOS market, such as Enea OSE and LynuxWorks, built their RTOS products from the ground up with high availability in mind. Some other vendors, such as WindRiver Systems, Inc. (Alameda, CA), introduced new products, such as the WindRiver VxWorks AE, to serve this market. It is clear that, to be successful, high availability must be designed into the RTOS, not layered on as an afterthought.
"It was high-availability systems that got us into the RTOS business," explains Dave Kleidermacher, engineering manager at Green Hills Software. "In 1996, we saw a need for OSes that more fully take advantage of the capabilities of the latest 32- and 64-bit microprocessors. At the time, there were no OSes available that would enable applications to achieve high availability." This was when the company began developing its Integrity product.
OnCore Systems (Half Moon Bay, CA) was founded on a similar sentiment. Phil Parker, the vice president of marketing for OnCore Systems, notes, "Our founders all came out of the RTOS market and felt that previous, and most current, embedded OSes that could deliver real-time response were all designed for processors that were based on flat memory models and limited amounts of memory."
Achieving high availability
In today's systems, testing and debugging are not enough. Realistically, there will be bugs in the code, so designers need to build fault tolerance and high availability into the system.
"One of the most important factors to understand is that there is no single entity within the system that solely guarantees a level of availability. It is achieved only by a conscious combination of system software, specially designed hardware, system configuration choices, and the techniques employed by application software to leverage the underlying platform. Furthermore, there must be an overall system management strategy that drives the policies by which the whole network keeps running," explains Ted Hartnell, product marketing manager, WindRiver.
With this in mind, the RTOS vendors offering high-availability products are providing as much assistance as possible for OEMs to achieve high availability. Some key considerations are memory protection, CPU time protection, hardware redundancy, hot-swapping capabilities for hardware switch outs, and overall system management of critical resources.
Achieving these goals is no small task. "Not only do high-availability techniques require a considerable amount of work to employ, but the cost in terms of real-time responsiveness and practical impact on the system is still being evaluated as vendors are still largely designing early solutions," observes Monkman. He adds that one of the biggest challenges is that most existing OSes were designed with little or no consideration for the issues surrounding high availability.
"These are designs that have no concept of multiple CPUs in the core kernel and are procedure-based models for making system calls," Monkman says. "It is considerably more difficult to achieve the required service transparency, service migration, and dynamic updates and reconfiguration that are key to high availability-enabled OS design when this model is employed."
The RTOS vendors are taking different routes to ensuring memory protection in their high-availability products. A traditional RTOS, one not optimized for high availability, typically uses a flat architecture: one that places all of the software modules in the same address space as the OS kernel.
Because this RTOS scheme offers no memory protection, any malfunctioning module can overwrite memory used by the kernel, causing a fatal kernel fault. Another disadvantage of this configuration is the difficulty involved in upgrading the OS to a new version because modules running in the kernel address space cannot easily be stopped.
One approach is to separate the critical applications from the kernel space, but this still allows the lower priority modules, such as drivers and protocol stacks, to run unprotected in the kernel space.
Another approach is to run the modules as separate memory-protected applications. The QNX microkernel implements a small set of services within the kernel itself (scheduling, IPC, and timers), while all other services, including drivers, protocols, and file systems, run as separate applications.
"This boosts availability in several ways," explains Leroux. "First, the OS kernel contains very little code that could go wrong, and second, it is very difficult for any module - even a poorly written driver - to corrupt either the kernel or any other module. Faulty drivers and protocols can now be stopped and restarted, before they cause other services to fail." This separation also allows most modules to be upgraded dynamically while the system continues running.
The engineers at Green Hills Software use the hardware MMU present in most 32- and 64-bit microprocessors to isolate tasks from one another, so a malfunction in one task will not affect another task or the kernel. "Our goal is to isolate the failures so they do not impact the system. Otherwise, your system is only as reliable as your least debugged task," observes John Carbone, vice president of marketing. In Green Hills' approach, each application is assigned a quota of memory and a quota of CPU processing time for critical tasks. With this protection in both the memory and time domains, Green Hills' Integrity product offers "guaranteed resource availability" because the "low criticality application cannot, no matter what it does, even if it has bugs throughout it, affect another application in another address space," asserts Kleidermacher.
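The effect of a CPU time quota can be shown with a small simulation. This is a minimal sketch of budget-based partitioning in general, not a description of how Integrity is implemented; the `Partition` class, the tick units, and the partition names are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Partition:
    name: str
    budget_ticks: int  # CPU ticks guaranteed to this application per period
    demand_ticks: int  # ticks the application tries to consume per period


def run_period(partitions, period_ticks):
    """Return the ticks actually granted to each partition in one period."""
    if sum(p.budget_ticks for p in partitions) > period_ticks:
        raise ValueError("budgets exceed the period; guarantees are impossible")
    # Each partition receives at most its own budget, whatever others demand.
    return {p.name: min(p.demand_ticks, p.budget_ticks) for p in partitions}


if __name__ == "__main__":
    parts = [
        Partition("critical", budget_ticks=60, demand_ticks=50),
        Partition("low_criticality", budget_ticks=20, demand_ticks=10_000),
    ]
    print(run_period(parts, period_ticks=100))
```

In this model, a runaway low-criticality application is capped at its own budget, so the critical application still receives everything it asks for within its quota.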
The OnCore RTOS product also works with the MMU, and its architecture is based on a virtual memory scheme which allows for separate environments for different applications. OnCore's virtual memory approach uses a memory mapper to dynamically allocate memory pages on the fly. According to Chip Downing, president and CEO, one advantage to this approach is more elegant software management. "Since every page of memory is dynamically mapped on demand, there is no wasted space that cannot be automatically recovered by normal operations," he says. "So, a fundamental cleanup will never be required."
Originally developed for NASA, LynxOS from LynuxWorks uses the POSIX process model. Designers at LynuxWorks have focused on MMU protection, support of hot-swapping, as well as infrastructure support and management. The company plans to roll out high-availability versions of its BlueCat Linux product this year. A typical mixed implementation could include BlueCat in the control unit and LynxOS in the I/O cards.
Engineers working with embedded Linux at MontaVista Software and Red Hat are focusing on the need to hot-swap components in high-availability applications, which allows CPCI boards to be replaced for repair or upgrade without interrupting the system's operation.
MontaVista's approach is to support systems using hot-swap, redundant system slots (a second board that is in a standby state), backplane messaging, and overall system management. "We realize that creating five-nines reliability is a system level issue," reports William Weinberg, director of product marketing at MontaVista Software, Inc. (Sunnyvale, CA).
Backplane messaging provides a means of communicating between cards in a CPCI system, utilizing the bandwidth of the PCI bus. Backplane communications can be used for both data transfer and management functions. Finally, a high-availability system management product manages the overall logistics of high availability, such as providing an API for system management and checkpointing.
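How a standby board and checkpointing work together can be sketched in a few lines. The example below is a simplified model, not MontaVista's API: the `Board` and `RedundantPair` classes are hypothetical, a dictionary copy stands in for backplane messaging, and a flag stands in for a real health check.

```python
class Board:
    def __init__(self, name):
        self.name = name
        self.alive = True
        self.state = {}  # application state held on this board


class RedundantPair:
    """Active/standby pair; the standby resumes from the last checkpoint."""

    def __init__(self, active, standby):
        self.active = active
        self.standby = standby

    def checkpoint(self):
        # Copy the active board's state across (stands in for the backplane).
        self.standby.state = dict(self.active.state)

    def heartbeat(self):
        """Poll the active board; fail over if it has stopped responding."""
        if not self.active.alive:
            self.active, self.standby = self.standby, self.active
            return True  # a failover took place
        return False


if __name__ == "__main__":
    pair = RedundantPair(Board("slot-1"), Board("slot-2"))
    pair.active.state["calls_in_progress"] = 42
    pair.checkpoint()
    pair.active.state["calls_in_progress"] = 43  # not yet checkpointed
    pair.active.alive = False  # simulated board failure
    pair.heartbeat()
    print(pair.active.name, pair.active.state)
```

The sketch also shows the trade-off behind checkpoint frequency: anything that changed after the last checkpoint is lost when the standby takes over.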
Most RTOS vendors now offer high-availability products. For instance, the Integrity product from Green Hills Software was introduced in 1998. It was recently chosen by Boeing and Lockheed Martin for their avionics systems in a fly-off for the Joint Strike Fighter aircraft program. QNX offers a high-availability RTOS architecture as well as its Qnet Networking product, which is designed to enable network-distributed interprocess communication (IPC) and fault-tolerant networking, and which automatically supports links between the processors.
OnCore released its first product in August 1999, and reports that it's shipping product to telecommunication and data communication OEMs with strict high-availability requirements. Designed for high-availability applications, the Enea OSE RTOS has been delivered to multiple OEMs for more than four years, mostly in the global communication market.
What about Linux?
While not truly a real-time OS, "embedded Linux is likely to sound the death knell for the traditional RTOS," observes Kleidermacher, "because it has given people an easy, generally more affordable alternative to the more expensive RTOS solutions that have been out there in the field." While embedded Linux can offer memory protection for high-availability applications, the code size has a tendency to be large. The product is not managed by a single vendor, which may prove to be an advantage or disadvantage.
What makes this issue somewhat clouded is the variance in what is considered "real time." One reasonable definition is: responding to external events quickly enough to ensure that no information is missed and that action is taken within the required time. Of course, the concept of required time varies by application. Thus, some applications may be able to use embedded solutions, such as Linux, for soft real-time applications that can get by with the speeds provided by the latest microprocessors.
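The definition above can be expressed as a check against a deadline. The sketch below assumes one common reading of the hard versus soft distinction, namely that a hard real-time application tolerates no missed deadlines while a soft real-time application tolerates a small fraction; the function names, the microsecond units, and the `allowed_miss_ratio` parameter are illustrative assumptions.

```python
def deadline_misses(response_times_us, deadline_us):
    """Count responses that arrived later than the required time."""
    return sum(1 for t in response_times_us if t > deadline_us)


def meets_requirement(response_times_us, deadline_us, allowed_miss_ratio=0.0):
    """Check measured response times against an application's deadline.

    allowed_miss_ratio=0.0 models a hard requirement (no misses allowed);
    a value above zero models a soft requirement (assumption, see lead-in).
    """
    if not response_times_us:
        return True
    misses = deadline_misses(response_times_us, deadline_us)
    return misses / len(response_times_us) <= allowed_miss_ratio


if __name__ == "__main__":
    samples_us = [120, 80, 950, 110]  # made-up measurements for the demo
    print("hard:", meets_requirement(samples_us, deadline_us=500))
    print("soft:", meets_requirement(samples_us, 500, allowed_miss_ratio=0.25))
```

The same set of measurements can therefore pass for one application and fail for another, which is why the required time has to be stated per application.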
Embedded Linux is suitable for applications that require response times on the order of seconds or microseconds per cycle, as opposed to the nanosecond and picosecond per cycle scale of most real-time systems.
Perhaps Linux's greatest impact on the RTOS scene is its suggestion of a royalty-free OS, which is influencing how other vendors determine their pricing structures.
The makers of high-availability-enabled hardware systems have largely focused over the last few years on CPCI systems, which still have standards in development. Many of these OEMs use Linux to develop the conceptual software drivers for their prototype hardware designs. "While Linux will prove useful for software development by hardware vendors, the success of Linux in the market is largely expected to be limited to proof of concept," observes Hartnell.
While the industry is arguing whether most applications can use a "soft" real-time embedded OS, Linux is gaining acceptance in embedded applications. The software seems well suited for use as the overall system manager operating system. Last August, Red Hat announced that Motorola will be bundling Red Hat Linux 6.2 with its Advanced High-Availability Software for Linux, which will be shipped with Motorola's carrier-grade high-availability five-nines embedded computing platforms. According to Tiemann, "Motorola is looking to get the product to six nines and beyond."
A recent entrant to the market, MontaVista made its first product offering in the high-availability space in January 2001, with the introduction of its embedded Linux High Availability services, including Hot Swap and Redundant System Slot, on Ziatech's ZT5083 high-availability solution. In May 2001, MontaVista will introduce additional high-availability embedded Linux offerings that will also support Force Computer, Motorola Computer Group, and Ziatech platforms.
It's clear that the use of commercial high-availability RTOS solutions and their development tools will allow infrastructure manufacturers to shorten their time-to-market by reducing the need for debugging. This can give OEMs a notable edge in a highly competitive marketplace.
P.O. Box 232, Nytorpsvagen 5B
SE-183 Taby, Sweden
Phone: 46 (0)8 507 140 00
Green Hills Software, Inc.
30 W Sola Street
Santa Barbara, CA 93101
390 S 400 W
Lindon, Utah 84042
2239 Samaritan Drive
San Jose, CA 95124
Mentor Graphics Corp.
8005 SW Boeckman Rd.
Wilsonville, OR 97070
Microware Systems Corp.
1500 NW 118th Street
Des Moines, IA 50325
MontaVista Software, Inc.
1237 E Arques Avenue
Sunnyvale, CA 94085
795 Main Street
Half Moon Bay, CA 94109
Moodie Drive, Suite 308
Nepean, Ontario, Canada K2H 9C4
QNX Software Systems Ltd.
175 Terence Matthews Crescent
Kanata, Ontario Canada, K2M 1W8
2600 Meridian Pkwy.
Durham, NC 27713
US Software Corp.
7175 NW Evergreen Pkwy., Suite 100
Hillsboro, OR 97124
Five Cambridge Center
Cambridge, MA 02142
Wind River Systems
500 Wind River Way
Alameda, CA 94501