Rumors of the demise of the venerable Unix operating system have been greatly exaggerated. In the last 18 months, Unix-type systems have enjoyed a renaissance in the embedded, desktop and server worlds as the platform of choice for mission- and business-critical applications. Much of the resurgence of Unix arises from the adoption of Linux and the success of the open-source movement.
A number of important technical factors conspire to make Unix-type systems attractive to today's systems designers: reliability and fault-avoidance afforded by Unix process-based memory protection; a modular, hierarchical application model; open "tinker toy" application building blocks; numerous options for distributed computing and high availability; standard networking built upon Internet Protocol; and broadly ported operating system and application code targeting off-the-shelf CPU architectures.
Both regulatory-dictated standards and customer expectations demand high software quality and at least the intention to ship bug-free code. Much of the appeal of the open-source movement arises in direct response to the cavalier attitude of desktop software vendors toward software quality, and the higher-quality software that is crafted on top of Linux and other open Unix platforms.
Most large programs originate with teams of software engineers working together rather than lone hackers working away into the night, even in the world of open source. Large projects, especially in telecommunications and networking, can involve scores of programmers generating tens of millions of lines of code.
While software engineering methodologies concentrate on managing such large projects, and the open-source bazaar builds in quality by focusing great numbers of "eyeballs" on program code, experience reveals that competence varies greatly among engineers. Moreover, even exceptional hackers are hard-pressed to produce perfect programs. In short: bugs happen.
Popular notions of software bugs usually focus on program logic miscalculations, overwritten data, missing steps, etc. Among the hardest bugs to find, however, are the ones that violate the structure of programs and their execution environments. Operating system run-time constructs and programming practices slice available memory up into regions containing program code and different types of data.
When those divisions are soft, enforced only by how tools lay out memory and by how programmers follow the layout discipline, applications are vulnerable to bugs wherein a program itself violates the boundaries among those regions. Such runaway programs can unintentionally modify their own code, accidentally corrupt key data structures, or modify the code and data regions of other programs, including the underlying operating system. Worse yet, such violations are often silent: the modified code or data may not be exercised for hours, days, or even years, distancing the cause and the effect of software faults in time and space.
The Unix programming architecture provides formidable protection against such structural bugs by leveraging the memory management unit (MMU) available on modern 32-bit and 64-bit CPUs. Whereas many simple programming executives, especially embedded operating systems, bypass the MMU or only enable it as part of cache memory configuration, Unix embraces the MMU and utilizes it to the fullest to enhance software quality.
The Unix programming model starts by giving every program (process) the illusion of "owning" the entire CPU address space (including the Unix kernel itself). In the case of CPUs like the Intel Pentium and Motorola PowerPC, a process exists in a 4-Gbyte virtual universe, without knowledge of how and where other processes' code is laid out. Each process receives an equivalent virtual address space and complete isolation and protection from other processes in the system; programs cannot accidentally modify what they cannot "see."
Next, program code and constant data are protected against accidental overwriting. Should a stray pointer or out-of-control loop walk over those read-only regions, the fault is detected and the program halted, optionally producing a snapshot (core) of the violation for later analysis and debugging.
On processors that support a privileged mode of operation, only system code (the OS and device drivers) runs in that mode, relegating application code to non-privileged status and further reducing the access of potential errant code to system-critical resources. Such protection not only inhibits accidental violations, but also serves to guard against malicious programs like viruses and Trojan horses.
Lastly, some or all of the Unix kernel can occupy part of each process's address space, making system calls and data sharing more efficient, while the privileged/non-privileged dichotomy still protects the kernel against accidental illegal accesses.
The Unix process model regards system memory as a mostly uniform pool of available blocks or pages. The operating system constructs the illusion of contiguous process address spaces by translating or mapping physical memory addresses into the logical process address spaces where programs run and store their data. Logical-to-physical address translation occurs with every memory access: each address the CPU issues passes through the MMU before arriving on the system bus. The MMU hardware detects whether an address lies within the bounds of a known logical address space, and whether the memory operation (write or read) is permitted for the logical address in question. If the address has been mapped and the operation is allowed, the translation is applied and the memory access cycle proceeds. If not, or if the operation is not permitted (e.g., writing into a read-only space), the MMU interrupts the processor with an address exception.
Two strategies have traditionally been employed for virtual addressing: segmented and paged. Segmented addressing divides memory into fairly gross chunks, usually collecting a program's code and constants into one segment, and data into one or more additional segments. While easier to implement, segmented memory management suffers from its low-resolution approach. As a system runs, and programs and data come and go, unusable dead spaces accumulate, consuming available memory and preventing new programs from loading.
Paged memory management is more efficient and also more complex than segmented models. As with segmented schemes, the MMU intervenes between the CPU and system memory, translating and filtering. The granularity of translation, however, instead of being an arbitrarily large block of code or data, is a manageable 4 kbytes.
On 32-bit machines, translation proceeds for a typical 4-kbyte page size as follows: The low 12 address bits represent an offset into the page in question (12 bits address 2^12, or 4,096, locations: 4 kbytes). The high 20 bits divide into two groups of 10 bits. The most significant 10 bits address up to 1k (1,024) page tables; the least significant 10 bits address up to 1,024 entries in each page table, for a total of 4 Gbytes.
A version of this translation scheme exists for each process in a system, and when the Unix kernel switches from one process context to another, it plugs in an appropriate set of translations (page tables) for that process.
By apportioning memory in 4-kbyte chunks, a paged scheme's higher granularity limits the size of potential dead spaces in memory and results in higher utilization of this valuable resource.
Not only do Unix systems protect and make efficient use of system memory, they allocate and recover memory from applications more reliably, limiting and in some cases eliminating the memory leaks that limit the up-time of many desktop systems.
Unix implements a hierarchical process creation model, wherein new processes are created as clones or children of existing processes. This cloning, or forking, allows running programs to scale to meet loading needs and is the method by which the OS runs new programs.
To ensure system integrity, Unix forking provides some checks and balances on process creation and deletion. When a new process is created, its parent, either another application or the system itself, can keep tabs on the new entity, checking on its status and health. Should the child process terminate prematurely, the parent is signaled and can spawn another copy in its place. Should the child process hang or become unstable, the parent can forcibly terminate the errant program and proceed as required.
In any case, when a program terminates, its memory and other system objects (open files, queues, etc.) are recovered by the OS, enhancing uptime by conserving resources.
Some Unix implementations, including Linux, extend this dynamic model to overall configuration, allowing for not only the termination and restarting of user-level applications, but also the dynamic installation and removal of operating system kernel modules and device drivers, all without rebooting.
This neat division of programs into restartable processes gives rise to a very modular style of software design. While object-oriented programming preaches aggressive code reuse, Unix systems have been practicing it through the development and use of small, well-understood, reliable, single-purpose program modules. These modules run at a process level and can be plugged together with standard interprocess communication mechanisms, like pipes and queues.
To foster interoperability among these modules and to make modular programming easier, Unix implementations have come to rely upon standard application programming interfaces (APIs). These APIs are embodied in the popular System V and Berkeley versions, codified by the Posix (IEEE 1003) family of standards, and readily found in proprietary and open-source Unix systems like Linux.
Traditional fault-tolerant (2N and 3N) systems employ passive redundant components to ensure extended up-time (availability): Multiple CPUs run the same code, operating in lock-step, such that should any one of the redundant systems fail, at least one backup is ready to take its place almost instantaneously. While offering fault resistance, these systems have the disadvantages of being expensive and slow, with limited or no upgrade paths due to special-purpose hardware.
They also underutilize the available hardware since redundant components only replicate their counterparts. Moreover, such systems generally limit their scope of fault isolation, regardless of whether the fault lies in hardware or software, to the granularity of the entire machine.
Instead of duplicating identical resources (2N/3N) to improve availability, enhanced up-time (MTBF) and reduced time-to-repair can be achieved through distributing functionality across a network of heterogeneous CPUs. Spare hardware, and indeed spare software, can reside locally, within the bounds of a single system, or on multiple systems in a network (N+1 or N+M redundancy).
As failures are detected, failover policies can dictate the restart/reinstall of an application or driver wherever in the distributed system known-functional hardware and software resources reside.
Resource utilization and available bandwidth are higher in such a system, because spares, rather than running idle, can operate at 50 percent or greater utilization at all times.