The past decade has seen a shift from large, pc board-based systems in which computing is done in one location to an environment of many distributed small-footprint systems and subsystems, each of which in many cases fits on a tiny piece of silicon. At the same time, applications are becoming more distributed and more complex, requiring more support from the underlying multitasking environment.
During this period, while software companies have developed or adapted real-time operating systems to handle these new kinds of distributed applications, RTOSes still reflect their heritage in traditional, board-based systems where device footprints were not a big deal. But how well do these RTOSes fit into small embedded systems such as the SoCs used in consumer devices, wireless communications and computing devices and within the embedded-networking infrastructure, especially as it relates to multitasking?
Most embedded systems in today's connected environment will require a high degree of multitasking capability. In a typical application, four to five tasks or more run at the same time. At regular intervals, a clock-tick interrupt hits the CPU. During the interrupt, the operating system performs operations to decide which task to run. Even if a task should continue to run for more than one period, the CPU will be interrupted by the clock-tick interrupt to let the operating system perform its operations. This will, of course, eat valuable CPU time.
But there is a more effective way to handle such multitasking: move some of the operations that the operating system software performs to hardware, just as math operations are commonly moved from software to hardware as the performance of the system requires. And this shift does not require that the system developer modify any of the techniques or change any of the tools that are normally used in building systems that require multitasking. Indeed, it often significantly simplifies the programming requirements.
The same four or five tasks will still operate at the same time, but now without the clock-tick interrupt and the sophisticated interrupt structure it requires in most software-based operating systems. The clock-tick interrupt is replaced with a task-switch interrupt that interrupts the CPU only when a running task is to be swapped out and replaced with another task.
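To make the difference concrete, here is a minimal sketch in C that counts how often the CPU is interrupted under each scheme. It assumes a hypothetical schedule array, task_at_tick, that records which task should be running in each tick period; the function names are illustrative and not part of any real kernel.

```c
#include <stddef.h>

/* Tick-driven kernel: the CPU is interrupted once per tick period,
   whether or not the running task changes. */
size_t count_clock_tick_interrupts(size_t n_periods)
{
    return n_periods;
}

/* Task-switch-driven kernel: the CPU is interrupted only when the task
   that should run differs from the one that ran in the previous period. */
size_t count_task_switch_interrupts(const int *task_at_tick, size_t n_periods)
{
    size_t switches = 0;
    for (size_t i = 1; i < n_periods; i++) {
        if (task_at_tick[i] != task_at_tick[i - 1]) {
            switches++;
        }
    }
    return switches;
}
```

For a schedule such as {1, 1, 1, 2, 2, 1, 1, 1}, the tick-driven count is eight while the task-switch count is two, which is the saving the paragraph above describes.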
The task-switch interrupt is actually the only required CPU interrupt in a system where parts of the operating system are implemented in hardware. In addition, in such a hardware-based OS, interrupts can be handled without interrupting the CPU if it is performing operations that are of a higher priority than an incoming interrupt.
There are some operations that are done in an operating system but that are independent of the characteristics of the particular implementation. Regardless of the architecture or application, there is a common set of kernel functions that could be shifted into hardware and accelerated without any substantial perturbations to the surrounding software applications layer. Such OS-independent operations include task handling (creation, deletion and scheduling of tasks), synchronization (semaphores and flags) and timers (delay, periodic start, watchdogs and interrupt).
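As an illustration of how these three groups of functions might look from the software side, the sketch below packs a service request into a single 32-bit command word. The bit layout, the group numbering and every name here are assumptions made for the example; a real hardware kernel would define its own encoding.

```c
#include <stdint.h>

/* The three OS-independent function groups named in the text. */
typedef enum {
    HWK_GROUP_TASK  = 0,  /* creation, deletion, scheduling of tasks */
    HWK_GROUP_SYNC  = 1,  /* semaphores and flags */
    HWK_GROUP_TIMER = 2   /* delay, periodic start, watchdogs */
} hwk_group;

/* Hypothetical layout: bits 31..28 group, bits 27..16 operation,
   bits 15..0 object id (task, semaphore or timer number). */
uint32_t hwk_encode(hwk_group group, uint16_t op, uint16_t object_id)
{
    return ((uint32_t)group << 28)
         | (((uint32_t)op & 0x0FFFu) << 16)
         | (uint32_t)object_id;
}

/* Recover the function group from a command word. */
hwk_group hwk_decode_group(uint32_t word)
{
    return (hwk_group)(word >> 28);
}
```

Because the request is just a word written to the kernel, the application layer above the driver does not need to change when the functions move from software to hardware.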
By excluding the clock-tick interrupt, the shift to hardware can make up to 20 percent more CPU time available to the application layer since the time spent in the operating system kernel is excluded. Inside an OS kernel, a lot of searching and sorting of queues often takes place during the clock-tick interrupt. So replacing parts of the kernel with a hardware implementation of the OS can gain valuable CPU time. The higher the clock-tick interrupt frequency, the more the system will gain by putting OS functions in hardware.
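The relationship between tick frequency and lost CPU time can be sketched with simple arithmetic: the fraction of CPU time consumed is the tick rate multiplied by the cycles spent per tick, divided by the CPU clock rate. The function below is a rough model only; the parameter names and the example figures are assumptions chosen for illustration, not measurements.

```c
#include <stdint.h>

/* Fraction of CPU time (0.0 to 1.0) spent servicing the clock tick.
   cpu_hz:          CPU clock rate in cycles per second
   tick_hz:         clock-tick interrupts per second
   cycles_per_tick: cycles the kernel spends in each tick (assumed constant) */
double tick_overhead_fraction(uint32_t cpu_hz, uint32_t tick_hz,
                              uint32_t cycles_per_tick)
{
    if (cpu_hz == 0) {
        return 0.0;  /* avoid division by zero on bad input */
    }
    return ((double)tick_hz * (double)cycles_per_tick) / (double)cpu_hz;
}
```

With an assumed 50-MHz CPU, a 1-kHz tick and 10,000 cycles of queue handling per tick, the model gives 0.20, or 20 percent of CPU time. Doubling the tick rate to 2 kHz doubles the loss, which is why higher tick frequencies benefit most from moving the work into hardware.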
Another important factor is that a hardware kernel can have much better granularity than a software kernel. Best case, a software-based kernel can handle clock-tick interrupts down to 1 ms; below that figure, the overhead becomes too big. In a hardware kernel, by contrast, the internal timers can be in the microsecond domain without any negative effects on the rest of the system.
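The effect of granularity on a requested delay can be sketched as follows. The function is a simplified model that assumes a timer can expire only on whole multiples of its resolution and that the values are small enough not to overflow 32 bits; it ignores the jitter a real tick-based delay also carries.

```c
#include <stdint.h>

/* Round a requested delay up to the next multiple of the timer
   resolution, both in microseconds. */
uint32_t effective_delay_us(uint32_t requested_us, uint32_t resolution_us)
{
    if (resolution_us == 0) {
        return requested_us;  /* treat zero as "no quantization" */
    }
    uint32_t periods = (requested_us + resolution_us - 1) / resolution_us;
    return periods * resolution_us;
}
```

Under these assumptions, a 250-microsecond delay requested from a kernel with a 1-ms tick becomes 1,000 microseconds, while the same request against a 1-microsecond hardware timer stays at 250.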
Faster response time
But exclusion of the clock-tick interrupts is not the only way in which a hardware RTOS kernel helps system performance. In addition, the system will gain performance because the response time for system calls is much shorter when functions are moved to hardware, often 10 to 25 times faster than software solutions in single-CPU systems. The only software needed when using a hardware kernel is a driver for communication between it and the CPU. The resultant overhead footprint required for the OS kernel operations is minimal, usually no bigger than that required for the driver.
The response time for interrupts must be kept short in all kinds of systems to obtain successful interaction with the external environment. Normally, external interrupts are connected to the CPU through some kind of interrupt controller chip, which means that when an external interrupt occurs, the task that is running will be interrupted. That affects the predictability of the system, since an interrupt service routine (ISR) is able to interrupt a scheduled task.
Using an intelligent interrupt handler implemented in a hardware-based kernel improves system performance and behavior, since each ISR can be treated as a normal task and scheduled according to given priority. And from a programmer's point of view, the handling of an external interrupt is almost the same as for an event flag, except that all external interrupts are physically routed to the hardware kernel.
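The decision such an interrupt handler makes can be modeled in a few lines of C. This is a software model of the idea only, not a description of any actual hardware: the structure, the fixed queue size and the convention that a larger number means a higher priority are all assumptions.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_PENDING 8

/* Software model of the state a hardware kernel might keep. */
typedef struct {
    uint8_t running_prio;          /* priority of the task on the CPU */
    uint8_t pending[MAX_PENDING];  /* priorities of tasks waiting to run */
    uint8_t n_pending;
} hw_kernel_model;

/* Park a task as pending; the model silently drops it if the queue is full. */
static void model_queue(hw_kernel_model *k, uint8_t prio)
{
    if (k->n_pending < MAX_PENDING) {
        k->pending[k->n_pending++] = prio;
    }
}

/* An external interrupt arrives and its ISR is treated as a task.
   Returns true if the CPU receives a task-switch interrupt, false if
   the ISR task is queued and the CPU keeps running undisturbed. */
bool model_on_external_irq(hw_kernel_model *k, uint8_t isr_prio)
{
    if (isr_prio > k->running_prio) {
        model_queue(k, k->running_prio);  /* preempted task waits */
        k->running_prio = isr_prio;
        return true;
    }
    model_queue(k, isr_prio);             /* lower priority: defer */
    return false;
}
```

In this model, a priority-3 interrupt arriving while a priority-5 task runs is simply queued, whereas a priority-7 interrupt causes a task switch. That is the behavior that keeps scheduled tasks predictable.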
We have done kernel implementations of an RTOS in a variety of multiprocessor applications and have found that the performance gain is even higher than the 10x to 25x estimate for single-CPU systems. According to experiments we have done, the speedup can be as high as 1,000 times compared with software kernels for certain operations.
It turns out that synchronization, a critical element in most multiprocessor designs, is more efficient when done in hardware. For example, spinlocks, which are widely used to protect shared memory and other areas in multiprocessor systems, can be eliminated. Rather than have the CPU spin on a lock, it can execute task code and let the hardware kernel handle the synchronization.
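For contrast, this is roughly what a conventional software spinlock looks like, sketched here with the standard C11 atomic_flag type. The spinlock type and function names are illustrative. The point to notice is the busy-wait loop in spin_lock: while one CPU holds the lock, every other CPU that wants it burns cycles doing no useful work.

```c
#include <stdatomic.h>
#include <stdbool.h>

typedef struct {
    atomic_flag held;  /* set while some CPU owns the lock */
} spinlock;

/* Try once to take the lock; returns true on success. */
bool spin_try_lock(spinlock *l)
{
    return !atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire);
}

/* Take the lock, busy-waiting until it becomes free. */
void spin_lock(spinlock *l)
{
    while (!spin_try_lock(l)) {
        /* spinning: the CPU executes this loop instead of task code */
    }
}

/* Release the lock so another CPU can take it. */
void spin_unlock(spinlock *l)
{
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}
```

A lock is declared as spinlock l = { ATOMIC_FLAG_INIT }. With a hardware kernel, the waiting CPU would instead issue a synchronization request and be switched to another ready task, so the loop above disappears from the software.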
Moreover, one hardware kernel can serve several CPUs, not only in homogeneous environments such as symmetric multiprocessor systems but also in the more heterogeneous configurations found in many embedded networking environments.
Also, by implementing all task handling in hardware, it is possible to monitor a system at a system level, without insertion of software probes in the code.
In our work, we developed an Internet Protocol-based module that can be connected to the hardware kernel and monitor all internal parts of it. The data is sent to a dedicated port that is not connected to the CPU and then to a database on a separate computer on the network. The events stored in the database can be searched and presented on a graphical display. It is also possible to filter and select desired events to be sent to the database.
We have found that to implement a hardware RTOS that is independent of the architecture and that can be used with virtually any CPU/DSP, all that is necessary is that the interface to the hardware kernel be split into two parts: a technology-dependent, bus-specific interface and a generic bus interface.
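One way to mirror that two-part split on the software side is sketched below, assuming the driver reaches the kernel through 32-bit registers. The register offsets, the hwk_ names and the in-memory fake bus are hypothetical and exist only so the example can run on a host machine. The generic part knows only about register reads and writes; the bus-specific part supplies the functions that perform them.

```c
#include <stdint.h>

/* Bus-specific part: how a 32-bit register is actually accessed. */
typedef struct {
    uint32_t (*read32)(void *ctx, uint32_t offset);
    void     (*write32)(void *ctx, uint32_t offset, uint32_t value);
    void      *ctx;  /* backend state, e.g. a base address */
} hwk_bus_ops;

/* Generic part: register map shared by every backend (hypothetical). */
enum {
    HWK_REG_COMMAND  = 0x0,
    HWK_REG_ARGUMENT = 0x4,
    HWK_REG_RESULT   = 0x8
};

/* Issue one request to the kernel and return its result register. */
uint32_t hwk_request(const hwk_bus_ops *bus, uint32_t command, uint32_t argument)
{
    bus->write32(bus->ctx, HWK_REG_ARGUMENT, argument);
    bus->write32(bus->ctx, HWK_REG_COMMAND, command);
    return bus->read32(bus->ctx, HWK_REG_RESULT);
}

/* Example backend: a plain array standing in for the registers. */
typedef struct {
    uint32_t regs[4];
} fake_bus;

uint32_t fake_read32(void *ctx, uint32_t offset)
{
    return ((fake_bus *)ctx)->regs[offset / 4];
}

void fake_write32(void *ctx, uint32_t offset, uint32_t value)
{
    ((fake_bus *)ctx)->regs[offset / 4] = value;
}
```

Porting to a different CPU or bus would then mean writing a new pair of read and write functions while hwk_request and everything above it stay unchanged.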
The footprints for a hardware kernel and software driver vary depending on the complexity needed. In small embedded systems, the hardware kernel is typically about 15k to 20k ASIC gates and the software driver about 2 kbytes of code.