The success of Linux as a desktop operating system, and the appeal of its open source technology, might tempt one to dream of applying it to embedded systems. After all, embedded systems run on the same sort of 32-bit microprocessors used in desktop systems, they require an operating system just as desktop systems do, and wouldn't it be great to avoid those nasty royalty payments? Linux, after all, is free. So, with all that going for it, why aren't all embedded systems developers jumping onto the Linux bandwagon?
The reason lies in the significant differences between desktop systems and embedded systems. These differences include interrupt latency, thread response time, scheduling, device drivers, and memory footprint.
Along with context switch time, interrupt latency is the most often analyzed and benchmarked measurement for embedded real-time systems. Operating system architecture is the most significant factor in determining interrupt latency and thread response time in an embedded system. Interrupt latency is defined as the elapsed time between an interrupt and the execution of the first instruction in the corresponding interrupt service routine. Interrupts are typically prioritized (by hardware) and can nest; therefore, the latency of the highest priority interrupt is usually the one examined. By definition, a real-time system is one in which an event (interrupt) must be handled within a bounded (and typically short) amount of time; failure to respond within that time constitutes a failure of the embedded system.
How can software increase interrupt latency? By deferring interrupt processing during certain "critical" operating system operations. The operating system does this by "disabling" interrupts while it executes such critical instruction sequences. The major component of worst case interrupt latency is therefore the number and duration (the time needed to execute them) of the instruction sequences during which the operating system keeps interrupts disabled.
If an interrupt occurs while software has disabled interrupts, the interrupt remains pending until software re-enables them. The longest period for which interrupts are disabled is therefore the dominant term in the worst case interrupt latency.
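To make that relationship concrete, here is a minimal C sketch that models interrupts-disabled regions as intervals measured in CPU cycles. The struct and function names are hypothetical; the model assumes regions neither overlap nor abut, and it ignores the fixed hardware cost of entering an interrupt service routine. An interrupt that arrives inside a region waits for the remainder of that region, so the worst possible arrival time is the first cycle of the longest region.

```c
#include <assert.h>
#include <stddef.h>

/* One interrupts-disabled region: interrupts are off for start <= t < end
   (times in CPU cycles from an arbitrary origin). Hypothetical model type. */
struct irq_off_region {
    unsigned long start;
    unsigned long end;
};

/* Cycles an interrupt arriving at time t stays pending because software
   has interrupts disabled: zero if interrupts are enabled at t, otherwise
   the rest of the region containing t. Assumes non-overlapping regions. */
unsigned long pending_delay(const struct irq_off_region *r, size_t n,
                            unsigned long t)
{
    for (size_t i = 0; i < n; i++) {
        assert(r[i].start <= r[i].end);
        if (t >= r[i].start && t < r[i].end)
            return r[i].end - t;
    }
    return 0;
}

/* The worst arrival is at the first cycle of the longest region, so the
   software contribution to worst case latency is that region's length. */
unsigned long worst_case_delay(const struct irq_off_region *r, size_t n)
{
    unsigned long worst = 0;
    for (size_t i = 0; i < n; i++) {
        assert(r[i].start <= r[i].end);
        if (r[i].end - r[i].start > worst)
            worst = r[i].end - r[i].start;
    }
    return worst;
}
```

Note that a single unidentified region longer than all the known ones changes the answer, which is why finding every region matters.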
The importance of understanding the worst case interrupt disabling sequence cannot be overstated. A real-time system utterly depends upon guaranteeing that the critical events in the system are handled within the required time frame. The reality, however, is that operating system vendors generally publish an average, typical, or best case interrupt latency, measured in a lab environment. Is it possible to statically compute the worst case disabling region? A team of researchers recently attempted to answer this question for one commercial real-time operating system in use today.
The case study employed advanced program flow analysis techniques in an attempt to locate all the interrupt disabling regions and determine their structure. The researchers used cycle-accurate models to estimate the cycle counts of the selected regions. The case study took five months to complete.
The results are not encouraging. Because the instruction that disables and enables interrupts takes its source value from a register, it was often impossible to determine statically whether a given instance of that instruction enabled or disabled interrupts. Other program flow problems, such as nested disabling/enabling sequences, also hampered the study. In the end, after five months, the research team estimated that it had identified only about half of the disabling regions. The identified regions numbered six hundred and twelve; in other words, roughly another six hundred regions lurk in the system with an unknown impact on worst case response time.
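The common save/restore idiom shows why such regions resist static analysis. The minimal C sketch below simulates a processor status word with a hypothetical interrupt-enable bit (real processors differ), and every name in it is illustrative. The same irq_restore() call re-enables interrupts on one call path but must leave them disabled when it runs nested inside another critical region; which case applies depends only on a run-time value.

```c
#include <assert.h>
#include <stdbool.h>

/* Simulated status word; bit 0 is a hypothetical "interrupts enabled" flag. */
#define IE_BIT 0x1u
static unsigned status_reg = IE_BIT;   /* start with interrupts enabled */

/* Disable interrupts and return the previous status word. */
unsigned irq_save(void)
{
    unsigned old = status_reg;
    status_reg &= ~IE_BIT;
    return old;
}

/* Write a saved status word back. Whether this re-enables interrupts is
   decided by the run-time contents of 'saved', not by the instruction. */
void irq_restore(unsigned saved)
{
    status_reg = saved;
}

bool irqs_enabled(void)
{
    return (status_reg & IE_BIT) != 0;
}

/* Inner critical section, callable on its own or from another one. */
void update_queue(void)
{
    unsigned s = irq_save();
    /* ... manipulate a shared kernel structure ... */
    irq_restore(s);          /* enables, or leaves disabled, depending on s */
}

/* Outer critical section that nests the inner one. */
void update_queue_and_timer(void)
{
    unsigned s = irq_save();
    update_queue();          /* nested region: its restore must not enable */
    assert(!irqs_enabled());
    /* ... more work with interrupts still disabled ... */
    irq_restore(s);
}
```

A static analyzer looking at the restore inside update_queue() cannot tell, without tracking every caller, whether that point ends a disabled region or sits in the middle of a longer one.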
The researchers also estimated the execution time of the identified regions. Some of these regions contained calls to out-of-line functions, a few even contained triply nested loops, and some loops in critical regions had variable bounds. The cycle count estimate for one of these nested loop regions was 26,729. On a 100 MHz microprocessor, that translates into approximately 267 microseconds for that one region alone.
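Converting a cycle estimate into time is simple arithmetic: at a clock rate of f MHz the processor executes f cycles per microsecond. A minimal C helper (the name is hypothetical), assuming one cycle per clock tick and ignoring cache misses and memory wait states:

```c
#include <assert.h>

/* Microseconds needed to execute 'cycles' at 'clock_mhz' MHz,
   assuming one cycle per clock tick and no wait states. */
double cycles_to_us(unsigned long cycles, double clock_mhz)
{
    assert(clock_mhz > 0.0);
    return (double)cycles / clock_mhz;
}
```

For the nested loop region in the case study, cycles_to_us(26729, 100.0) gives about 267.3 microseconds; wait states on real hardware would only make it longer.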
Rest assured that no real-time operating system vendor would claim an interrupt latency of this magnitude. In desktop applications, where real-time reaction is not required, worst case interrupt latency is largely irrelevant. Linux was designed for desktop use, not for real-time performance, so the results of a similar experiment on it would likely be even more frightening.
Measuring real-time
Thread response time is defined as the elapsed time from an interrupt to the execution of the first instruction in a thread awakened to service that interrupt. This is also an important measurement in real-time systems, since designers often prefer to place device manipulation code in threads, where it is easier to debug.
A significant problem involves the interaction between the thread responding to a high priority interrupt and other, lower priority interrupts. Since interrupts remain enabled while the high priority thread is executing, an unbounded number of low priority interrupts can occur, and each interrupt service routine that executes adds to the thread response time. This is a form of "priority inversion," since lower priority work delays the execution of a higher priority thread. In desktop systems, such inversions are unimportant, since the delays are usually not noticeable to a human user, and in a worst case lock-up, one can simply reboot.
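A toy model makes the problem visible. In the minimal C sketch below (the function name and the additive cost model are assumptions, all times in microseconds), thread response time is the interrupt latency plus the waking ISR plus dispatch, plus the cost of every low priority ISR that runs in the meantime. Because nothing bounds the number of low priority interrupts, nothing bounds the result.

```c
#include <assert.h>

/* Toy additive model of thread response time (microseconds): each of the
   n_low low priority ISRs that runs before the awakened thread adds its
   full cost, because those interrupts are never masked. */
unsigned long thread_response_us(unsigned long irq_latency,
                                 unsigned long wake_isr,
                                 unsigned long dispatch,
                                 unsigned long n_low,
                                 unsigned long low_isr)
{
    unsigned long base = irq_latency + wake_isr + dispatch;
    /* guard against unsigned overflow in the toy arithmetic */
    assert(low_isr == 0 || n_low <= (~0UL - base) / low_isr);
    return base + n_low * low_isr;
}
```

With 5 us of latency, a 10 us waking ISR, and 5 us of dispatch, the response is 20 us when no low priority interrupts arrive, but ten 20 us low priority ISRs push it to 220 us.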
A real-time operating system, though, should provide a way to prevent this kind of priority inversion. One solution is to let device driver designers assign interrupts priorities below those of critical interrupt handling threads. When a thread is scheduled, the kernel disables the interrupts assigned a lower priority than the thread. When no higher priority thread remains to run, the lower priority interrupts are re-enabled: a simple yet effective solution.
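The masking rule can be sketched in a few lines of C. This assumes a hypothetical scheme in which interrupt sources and threads share one priority scale (higher number means more urgent) and there are at most 32 sources; the function name is illustrative. Each time the kernel schedules a thread, it computes which sources may still interrupt it.

```c
#include <assert.h>
#include <stdint.h>

/* Bit i of the result is set if interrupt source i stays enabled while a
   thread of priority thread_prio runs; sources assigned a lower priority
   than the thread are masked. Hypothetical shared priority scale. */
uint32_t irq_enable_mask(const unsigned irq_prio[], unsigned n_irqs,
                         unsigned thread_prio)
{
    assert(n_irqs <= 32);
    uint32_t mask = 0;
    for (unsigned i = 0; i < n_irqs; i++) {
        if (irq_prio[i] >= thread_prio)     /* not lower than the thread */
            mask |= (uint32_t)1u << i;
    }
    return mask;
}
```

When only the idle thread (priority 0 in this scheme) is left, every source is enabled again, matching the re-enabling step described above.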
Linux has no such provision for prioritizing threads relative to interrupts, and the actual situation is even worse. Real-time operating systems provide priority-based scheduling because it must be possible to guarantee that the most critical threads in the system can run immediately in response to an event. Heuristics, or any other kernel constructs that might make this response nondeterministic, are forbidden.
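Strict priority selection is deterministic by construction. The minimal C sketch below assumes a hypothetical ready bitmap with one thread per priority level, 0 through 31, where 31 is the most urgent. The pick depends only on which priorities are ready, never on history or time-slice accounting.

```c
#include <assert.h>
#include <stdint.h>

/* Return the highest ready priority (bit p set = a thread at priority p is
   ready), or -1 if nothing is ready and the idle loop should run. */
int pick_highest_ready(uint32_t ready)
{
    if (ready == 0)
        return -1;
    int p = 31;
    while (!(ready & ((uint32_t)1u << p))) {
        p--;
        assert(p >= 0);   /* unreachable because ready != 0 */
    }
    return p;
}
```

Production kernels typically replace the loop with a count-leading-zeros instruction so the selection also takes constant time.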
The Linux scheduler is a fairness-based heuristic scheduler, a consequence of Linux's UNIX heritage as a time sharing, interactive operating system. Thus, it is not possible for the designer to specify an absolute "highest" priority thread. When an interrupt handler makes a thread ready to run in order to process an event, the Linux scheduler is quite likely to choose some other thread to run first. It simply isn't possible to determine the worst case thread response time.
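For contrast, here is a toy model of a fairness heuristic in C. It is not Linux's scheduler code; the weighting formula (static priority plus unused time-slice ticks) and all names are assumptions chosen only to show how such a heuristic can pass over the most important thread when that thread has recently used up its quantum.

```c
#include <assert.h>
#include <stddef.h>

/* Toy runnable thread: a static priority plus the unused ticks left in its
   time slice. Illustrative only; not any real kernel's data structure. */
struct toy_thread {
    const char *name;
    int priority;     /* static weight, higher = more important */
    int ticks_left;   /* unused quantum, refilled periodically */
};

/* Pick the thread with the largest (priority + ticks_left); ties keep the
   earlier thread. Threads that ran recently lose ground to the others. */
size_t pick_fair(const struct toy_thread *t, size_t n)
{
    assert(n > 0);
    size_t best = 0;
    for (size_t i = 1; i < n; i++) {
        int wi = t[i].priority + t[i].ticks_left;
        int wb = t[best].priority + t[best].ticks_left;
        if (wi > wb)
            best = i;
    }
    return best;
}
```

With an urgent I/O thread at priority 20 and no ticks left, and a batch thread at priority 10 with 15 ticks left, the toy scheduler picks the batch thread (weight 25 versus 20), even though the urgent thread was just awakened to service an interrupt.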
In addition to long interrupt latency, Linux disables preemption or dispatching for very long periods of time. Empirical testing of standard Linux kernels yields response times ranging from many milliseconds up to a second, clearly nowhere near the requirements of real-time systems. A kernel patch is now available for Linux that uses Linux's SMP hooks to add preemption points, thereby reducing thread response time. The measured response times, however, are still in the hundreds of microseconds, orders of magnitude higher than those achieved by real-time operating systems. Moreover, the measured times do not reflect the actual worst case times; no one truly has any idea what the worst case response times are for Linux. Clearly, this is not an acceptable solution for real-time systems.
Device driver code adds less risk to the system if it runs in its own protected address space. An operating system architecture designed to support such virtual device drivers is preferable to the traditional approach, which requires device drivers to run in physical memory alongside the kernel. This approach requires a flexible, powerful, and efficient API that gives the virtual device driver secure access to the physical device resources it needs.
Linux lacks this architecture. In fact, despite offering protected virtual memory for applications, Linux encourages adding complicated device drivers to the kernel address space, where they add the most risk. The traditional way to add a new Linux device driver is to compile the driver code into object files and then either link them into the kernel or load them dynamically via the insmod kernel module loader. These drivers have unfettered access to physical memory and are difficult to debug.
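One piece of such an API is a check that a driver's request for device access falls inside the resources the kernel granted it. The minimal C sketch below assumes a hypothetical grant record describing one physical register window; the struct, field, and function names are illustrative and do not correspond to any particular operating system's interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of the one register window a user-space driver
   has been granted. */
struct dev_grant {
    uintptr_t phys_base;   /* start of the device's register window */
    size_t    length;      /* window size in bytes */
    bool      writable;    /* whether writes are permitted */
};

/* Return true only if [addr, addr + len) lies inside the granted window and
   the access type is permitted; anything else is refused rather than
   allowed to touch arbitrary physical memory. */
bool grant_allows(const struct dev_grant *g, uintptr_t addr, size_t len,
                  bool write)
{
    assert(g != NULL);
    if (write && !g->writable)
        return false;
    if (addr < g->phys_base)
        return false;
    uintptr_t off = addr - g->phys_base;
    /* compare offsets instead of computing addr + len, which could overflow */
    return len <= g->length && off <= g->length - len;
}
```

The kernel would apply a check like this once, when it maps the window into the driver's address space, so the driver's ordinary register accesses afterward cost nothing extra.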
Footprint issues
A major problem with embedding Linux is its large footprint. Linux developers say that two megabytes is the minimum "useful" size of Linux. Microsoft claimed that "Lineo cites a minimum footprint size of 2 Mbytes ROM..." and "Red Hat, for its new version of embedded Linux, recommends 8 Mbytes RAM and 4 Mbytes Flash as minimum system requirements". For some types of embedded systems (such as aircraft avionics), footprint may not be a significant concern, since the cost of the memory is very small relative to the cost of the overall system. Desktop systems, too, typically have ample memory to devote a few megabytes to the operating system. Most embedded systems, however, are memory constrained.
Ironically, much of the interest in embedding Linux seems to come from high volume consumer electronics products, such as set top boxes, PDAs, and DSL modems. High volume consumer devices usually have thin profit margins, where the cost of a few extra megabytes of Flash might mean the difference between profit and loss. In fact, we have talked to customers who began designs with embedded Linux only to throw it away when it became apparent that the memory requirements were too high to achieve a cost-effective solution.
In effect, the extra memory requirement is equivalent to a royalty, since every single unit of the embedded product must carry the extra memory cost. The difference between a 4 Mbyte and an 8 Mbyte Flash memory configuration in an embedded system can be as much as $10 per unit. Few proprietary RTOS companies charge that much in run-time royalties for their operating systems.
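The royalty analogy is easy to quantify: the extra memory cost is paid once per unit shipped, exactly like a per-unit run-time license fee. A minimal C helper (the name is hypothetical):

```c
#include <assert.h>

/* Total "effective royalty" paid through extra memory over a production
   run: the per-unit memory cost multiplied by the number of units. */
double effective_royalty_total(double extra_memory_cost_per_unit,
                               unsigned long units_shipped)
{
    assert(extra_memory_cost_per_unit >= 0.0);
    return extra_memory_cost_per_unit * (double)units_shipped;
}
```

For example, at the $10 per-unit difference and a hypothetical production run of 100,000 units, effective_royalty_total(10.0, 100000) comes to $1,000,000.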