In recent years, Linux has become an increasingly popular operating
system choice not only in the PC and server markets but also in the
development of embedded devices, particularly consumer products,
telecommunications routers and switches, Internet appliances, and
industrial and automotive applications.
The advantage of Embedded Linux is that it is a royalty-free, open
source, compact solution that provides a strong foundation for an
ever-growing base of applications to run on. Linux is a fully
functional operating system (OS), with support for a variety of network
and file-handling protocols - a very important requirement in embedded
systems because of the need to "connect and compute anywhere at
anytime."
Modular in nature, Linux is easy to slim down by removing utility
programs, tools, and other system services that are not needed in the
targeted embedded environment. The advantages for companies using Linux
in embedded markets are faster time to market, flexibility and
reliability.

For those developers, the combination of converged architectures such
as the Blackfin Processor and uClinux may be of particular interest.
Blackfin processors [1] combine the computing power of a DSP with the
functionality of a microcontroller, fulfilling the requirements of
digital audio and communication applications. The combination of a DSP
core with traditional microcontroller architecture on a single chip
avoids the restrictions, complexity, and higher costs of traditional
heterogeneous multiprocessor systems.
All Blackfin Processors combine a state-of-the-art signal processing
engine with the advantages of a clean, orthogonal RISC-like
microprocessor instruction set and Single-Instruction Multiple-Data
(SIMD) multimedia capabilities into a single instruction set
architecture. The Micro Signal Architecture (MSA) core is a dual-MAC
(multiply-accumulate unit) modified Harvard Architecture that has been
designed for high performance on typical signal processing
algorithms, as well as on the standard program flow and arbitrary bit
manipulation operations mainly used by an OS. Both MACs can be used in
the same operation and in a single cycle to double the MAC throughput,
as in the dual-MAC Blackfin assembly instruction below:
R3 = (A1 += R7.H * R6.H), R2 = (A0 += R7.L * R6.L);
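To make the data flow of this instruction easier to follow, here is a minimal C sketch of what the two multiply-accumulate operations compute. It is only a model under stated assumptions: the type dual_acc and the helper names are hypothetical, plain integer arithmetic is used, and the fractional shift, saturation and accumulator width of the real hardware are deliberately ignored.

```c
#include <stdint.h>

/* Hypothetical software model of the two accumulators A0 and A1.
 * The real accumulators are narrower and can saturate; that is
 * ignored in this sketch. */
typedef struct {
    int64_t a0;
    int64_t a1;
} dual_acc;

/* Upper 16 bits of a 32-bit register, read as a signed value
 * (relies on the usual two's-complement conversion). */
static int16_t high_half(uint32_t r)
{
    return (int16_t)(r >> 16);
}

/* Lower 16 bits of a 32-bit register, read as a signed value. */
static int16_t low_half(uint32_t r)
{
    return (int16_t)(r & 0xFFFFu);
}

/* Models A1 += R7.H * R6.H and A0 += R7.L * R6.L.
 * On the processor both products are formed in the same cycle;
 * here they are simply computed one after the other. */
static void dual_mac(dual_acc *acc, uint32_t r7, uint32_t r6)
{
    acc->a1 += (int64_t)high_half(r7) * high_half(r6);
    acc->a0 += (int64_t)low_half(r7) * low_half(r6);
}
```

In the assembly instruction, the updated accumulator values are additionally copied to R3 and R2; in this model they would be read back from acc->a1 and acc->a0.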
As shown in Figure 1 below, the single core Blackfin Processors have
two large blocks of on-chip memory providing high-bandwidth access to
the core. These memory blocks are accessed at full processor core speed
(up to 756 MHz). The two memory blocks sitting next to the core,
referred to as L1 memory, can be configured either as data or
instruction SRAM or cache.
When configured as cache, the speed of executing external code from
SDRAM is nearly on par with running the code from internal memory. This
feature is especially well suited for running the uClinux kernel, which
doesn't fit into internal memory. Also, when programming in C, the
memory access optimization can be left up to the core by using
cache.
Figure 1: Single core Blackfin processor
There are countless commercial and non-commercial Linux
kernel trees and distributions. One of the special trees is the uClinux
kernel tree, at www.uclinux.org [2]. This is a port of the Linux kernel
designed for hardware without a Memory Management Unit (MMU).
While the uClinux kernel patch has been included in the official
Linux 2.6.x kernel [3], the most up-to-date development activity and
projects can be found on the uClinux Project Page [2] and the Blackfin/uClinux
Project Page [4] (www.blackfin.uclinux.org). Patches such as these are
used by commercial Linux vendors in conjunction with their additional
enhancements, development tools and documentation to provide their
customers an easy-to-use development environment for rapidly creating
powerful applications on uClinux.
Additionally, www.uclinux.org provides developers with a uClinux
distribution that includes three different kernels (2.0.x, 2.4.x,
2.6.x) along with required libraries; basic Linux shells and tools; and
a wide range of additional programs such as a web server, an audio
player, programming languages, and a graphical configuration tool. There are
also programs specially designed with size and efficiency as their
primary considerations.
One example is busybox [5], a multicall binary: a single program that
includes the functionality of many smaller programs and acts like any
one of them when it is called by the appropriate name. If a link named
ls points to busybox and busybox contains the ls code, invoking ls
makes busybox act like the ls command.
The benefit of this is that busybox avoids the overhead of separate
binaries, and the small modules can share common code. In general,
the uClinux distribution is more than adequate to compile a
Linux image for a communication device, like a router, without writing
a single line of code.
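The multicall idea can be sketched in a few lines of C. This is not busybox's actual source; the applet table, the three toy applets and the function run_applet() are hypothetical names chosen for illustration. The sketch only shows the principle: the program looks at the name it was invoked under and dispatches to the matching built-in function.

```c
#include <stdio.h>
#include <string.h>

typedef int (*applet_fn)(int argc, char **argv);

/* Three toy applets (hypothetical stand-ins for real tools). */
static int applet_true(int argc, char **argv)
{
    (void)argc; (void)argv;
    return 0;
}

static int applet_false(int argc, char **argv)
{
    (void)argc; (void)argv;
    return 1;
}

static int applet_echo(int argc, char **argv)
{
    for (int i = 1; i < argc; i++)
        printf("%s%s", argv[i], i + 1 < argc ? " " : "");
    putchar('\n');
    return 0;
}

/* Table mapping invocation names to the code that implements them. */
static const struct {
    const char *name;
    applet_fn fn;
} applets[] = {
    { "true",  applet_true  },
    { "false", applet_false },
    { "echo",  applet_echo  },
};

/* Strip any directory part, so "/bin/echo" becomes "echo". */
static const char *base_name(const char *path)
{
    const char *slash = strrchr(path, '/');
    return slash ? slash + 1 : path;
}

/* Select the applet from the name the binary was called by (argv[0]). */
static int run_applet(int argc, char **argv)
{
    const char *name = base_name(argv[0]);

    for (size_t i = 0; i < sizeof applets / sizeof applets[0]; i++)
        if (strcmp(name, applets[i].name) == 0)
            return applets[i].fn(argc, argv);

    fprintf(stderr, "%s: applet not found\n", name);
    return 127;
}
```

In a complete program, main() would simply return run_applet(argc, argv), and links named true, false and echo would all point to the one binary.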
Despite the fact that Linux was not originally designed for use in
embedded systems, it has found
its way into a lot of embedded devices. Since the release of kernel
version 2.0.x and the appearance of commercial support for Linux on
embedded processors, there has been a real explosion of new embedded
devices that feature the OS.
Almost every day there seems to be a new device or gadget that uses
Linux as its operating system, in most cases going completely unnoticed
by the end users. Today a large number of the available broadband
routers, firewalls, access points, and even some DVD players utilize
Linux; for more examples, see Linux devices [6]. uClinux, like Linux,
offers a huge number of drivers for all sorts of hardware and protocols.
Combine that with the fact that Linux does not have run-time royalties,
and it quickly becomes clear why there are so many developers using
Linux for their devices.
Linux on a DSP-like processor
In the past, DSPs have been used in a lot of applications, including
sound cards, modems, telecommunication devices, medical devices, and
all sorts of military and other appliances that perform pure signal
processing. Those DSP systems were generally designed specifically for
those applications and had only basic capabilities in order to meet
their tight cost and size constraints.
As DSPs have become more powerful and flexible, thereby servicing
the more advanced requirements of military, medical, and communication
users, they have still lacked the capabilities needed to run advanced
operating systems. Those traditional DSPs are very powerful and
flexible, but can be rather expensive.
They are often found clustered on special signal processing hardware
where there is no need to have an operating system like Linux running
on the DSP itself. This is generally due to the fact that in those
systems the DSP gets its data from some type of additional central
processing unit. Therefore only "basic" system software had to be
written for such DSPs.
With the quickly advancing multimedia convergence and the
proliferation of multimedia and communication-enabled gadgets, there is
now a big market for a new type of DSP. In the past, the most widely
used design for servicing these markets was the combination of a
general-purpose processor and a traditional DSP serving as a
coprocessor. In this scenario, the operating system runs on the host
processor, and the signal processing is done on the DSP. This type of
dual-processor design is suboptimal due to inefficiencies incurred in
maintainability, cost, power, and size. A different approach is to
redesign the traditional DSP to meet the demands of an advanced
operating system while preserving the advanced DSP architecture.
This approach has been taken by the Blackfin Processor designers, who
built a processor with advanced DSP features around the well-proven
Harvard Architecture with a RISC-like, orthogonal, enhanced instruction
set, along with advanced addressing, stack control and privileged
operation modes. Such a device is no longer a simple DSP, but rather a
powerful processor that will meet the intensive demands of a wide range
of industrial, communication and multimedia applications.
Combined with the capabilities and the power of an operating system
like Linux, such a processor opens up a wide range of possibilities.
Nevertheless, on the general-purpose processor side, vendors are not
sleeping and are in turn designing their new processors to compete in
the same market. So it comes down to the point where, for processors,
it's just the 5 P's rule: price, performance, power consumption,
peripherals, and penguins.
Differences between Linux and uClinux
Since Linux and uClinux are similar to UNIX in that they are multiuser,
multitasking operating systems, the kernel has to take special
precautions to assure the proper and safe operation of up to thousands
of processes from different users on the same system at once. The UNIX
security model, after which Linux is designed, protects every process in
its own environment with its own private address space. Every process is
also protected from processes invoked by different users.
In addition, a Virtual Memory (VM) system places further requirements
on the Memory Management Unit (MMU), such as dynamic allocation of
memory and mapping of arbitrary memory regions into the private process
memory.
Some processors, like Blackfin, do not provide a full-fledged MMU.
These processors are more power efficient and significantly cheaper
than the alternatives, while sometimes having higher performance. Even
on processors featuring Virtual Memory, some system developers target
their application to run on uClinux, because uClinux can be
significantly faster than Linux on the same processor. MMU operation
can represent a significant time overhead.
Even when an MMU is available, it is sometimes not used in systems
with tight real-time constraints. Context switching and Inter Process
Communication (IPC) can also be several times faster on uClinux, as
shown by a benchmark on an ARM9 processor by H.S. Choi and H.C. Yun
[7].
To support Linux on these MMU-less devices, a few trade-offs have to
be made:
1. No real memory protection (a faulty process can bring the complete
system down)
2. No fork system call
3. Only simple memory allocation
4. Some other minor differences
Memory protection is not a real problem for most embedded devices.
Linux is a very stable platform, particularly in embedded devices,
where software crashes are rarely observed. Even on an MMU-based system
running Linux, software bugs in the kernel space can crash the whole
system. Since Blackfin has memory protection, but not Virtual Memory,
Blackfin/uClinux has better protection than other no-MMU systems, and
will not crash as "often" as uClinux running on different processors.
The two most common causes of uClinux crashes are stack overflow and
null pointer references.
Stack overflow
When Linux is running on an architecture where a full MMU exists, the
MMU provides Linux programs basically unlimited stack and heap space.
This is done by the virtualization of physical memory. However, most
embedded Linux systems have a fixed amount of SDRAM and no swap, so the
space is not really "unlimited".
A program with a memory leak can still crash the entire system on
embedded Linux with an MMU. Because uClinux can't support VM, it allocates
stack space at compile time at the end of the data for the
executable. If the stack grows too large on uClinux, it will overwrite
the static data and code areas. This means that the developer, who
previously was oblivious to stack usage within the application, must
now be aware of the stack requirements.
On Blackfin/uClinux, there is a compiler option to enable stack
checking. If the option -fstack-limit-symbol=_stack_start
is set, the compiler adds extra code that checks that the stack limit
is not exceeded. This prevents random crashes due to stack
corruption or overflow on Blackfin/uClinux: an application compiled
with this option that exceeds its stack limit dies gracefully. The
developer can then increase the stack size at compile time or with the
flthdr utility program at runtime. On production systems, stack checking
can either be removed (to increase performance and reduce code size) or
left in for the added robustness.
Null pointer reference
The Blackfin MMU does provide partial memory protection, and can
segment user space from kernel (supervisor) space. On Blackfin/uClinux,
the first 4k of memory starting at NULL is reserved as a buffer for bad
pointer dereferences. If an application uses an uninitialized pointer
that reads or writes into the first 4k of memory, the application will
halt. This makes random crashes due to uninitialized pointers less
likely. Other implementations of uClinux will start writing over the
kernel.
The second point, the missing fork system call, can be a little more
problematic. In software written for UNIX or Linux, developers sometimes
use the fork system call when they want to do things in parallel. The
fork() call makes an exact copy of the original process and executes it
simultaneously. To do that efficiently, it uses the MMU to map the
memory of the parent process into the child and copies only those parts
of memory that the child writes to.
Without an MMU this is not possible, so uClinux cannot provide the
fork() system call. It does, however, provide vfork(), a special version
of fork() in which the parent is halted while the child executes.
Therefore, software that uses the fork() system call has to be modified
to use either vfork() or the POSIX threads that uClinux supports. The
modification is needed because, in both cases, the tasks share the same
memory space; a vfork() child even runs on the parent's stack.
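A common way to port fork()-based code is to start the parallel work as a separate program with vfork() followed immediately by an exec call. The following C sketch shows that pattern; the function name spawn_and_wait() is hypothetical, and the sketch assumes a C library that declares vfork() in unistd.h (as glibc and uClibc do). Because the child borrows the parent's memory, it does nothing except call execvp() or _exit().

```c
/* vfork() is no longer part of current POSIX; on glibc it is
 * declared when _DEFAULT_SOURCE is defined. */
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE
#endif

#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run the program named by argv[0] and wait for it to finish.
 * Returns the child's exit status, or -1 on error. */
static int spawn_and_wait(char *const argv[])
{
    int status;
    pid_t pid = vfork();

    if (pid < 0)
        return -1;              /* no child was created */

    if (pid == 0) {
        /* Child: runs in the parent's memory while the parent is
         * halted, so it must only exec or _exit and must not
         * return from this function. */
        execvp(argv[0], argv);
        _exit(127);             /* reached only if exec failed */
    }

    /* Parent: resumes once the child has called exec or _exit. */
    if (waitpid(pid, &status, 0) < 0)
        return -1;

    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Code that relied on fork() to continue running the same program in the child cannot be converted this mechanically; it usually has to be restructured around threads or a separate helper executable.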
As for point number three, there usually is no problem with the malloc
support uClinux provides, but sometimes minor modifications may have to
be made. Memory allocation on uClinux can be very fast, but on the other
hand a process can allocate all available memory. Since memory can
only be allocated in contiguous chunks, memory fragmentation can
sometimes be an issue.
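One widely used way to sidestep fragmentation, which the paper itself does not prescribe, is to reserve a pool of fixed-size blocks once and recycle them instead of calling malloc and free repeatedly. The C sketch below illustrates the idea; the names pool_init(), pool_alloc() and pool_free() and the sizes are hypothetical and would need to be tuned for a real application.

```c
#include <stddef.h>

#define POOL_BLOCKS 8      /* hypothetical number of blocks */
#define BLOCK_SIZE  256    /* hypothetical block size in bytes */

/* A free block stores the link to the next free block in its own
 * storage, so the pool needs no extra bookkeeping memory. */
typedef union block {
    union block *next;
    unsigned char payload[BLOCK_SIZE];
} block;

static block pool[POOL_BLOCKS];   /* reserved once, never resized */
static block *free_list;

/* Put every block of the pool on the free list. */
static void pool_init(void)
{
    free_list = NULL;
    for (size_t i = 0; i < POOL_BLOCKS; i++) {
        pool[i].next = free_list;
        free_list = &pool[i];
    }
}

/* Take one block from the pool; returns NULL when it is exhausted. */
static void *pool_alloc(void)
{
    block *b = free_list;

    if (b != NULL)
        free_list = b->next;
    return b;
}

/* Return a block obtained from pool_alloc() to the pool. */
static void pool_free(void *p)
{
    block *b = p;

    b->next = free_list;
    free_list = b;
}
```

Because all blocks have the same size and live in one static array, allocating and freeing them in any order can never split memory into unusable fragments. The trade-off is that requests larger than BLOCK_SIZE cannot be served from the pool.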
Most of the software available for Linux or UNIX (a collection of
software can be found on http://freshmeat.net) can be directly compiled on
uClinux. For the rest there is usually only some minor porting or
tweaking to do. There are only very few applications that do not work
on uClinux, with most of those being irrelevant for embedded
applications.
In Part 2 of this three-part series, the author surveys the
development tools, environments and libraries available for
DSP-oriented applications, including VoIP, audio compression, and image
capture and processing, and discusses how to use them most effectively
and how to avoid problems.
Since obtaining his MSc (Computer Based Engineering) and Dipl-Ing. (FH)
(Electronics and Information Technologies) degrees from Reutlingen
University, Michael Hennerich has worked as a design engineer on a
variety of DSP-based applications. Michael now works as a DSP
Applications and Systems Engineer at Analog Devices Inc. in Munich,
Germany.
This article is excerpted from a
paper of the same name presented at the Embedded Systems Conference
Silicon Valley 2006. Used with permission of the Embedded Systems
Conference. For more information, please visit www.embedded.com/esc/sv.