This tutorial presents the author's practical experience with writing
Linux device drivers to control custom-designed hardware. The tutorial
starts by providing an overview of the driver writing process, and
describes several example drivers provided with this tutorial [4]. The
reader is encouraged to experiment with those example drivers on their
own x86 system, as it provides the best learning experience.
The ability of a user-space process to transfer data from multiple
PCI boards is contingent on the implementation of both the hardware and
driver. The requirements of both the hardware and software are
presented.
The drivers in this tutorial are written for the Linux 2.6 kernel.
The drivers have been built against kernels 2.6.9-11 (CentOS 4.1), 2.6.13, and
2.6.14 for x86 and PowerPC targets. Details that are clearly described
in the book 'Linux Device Drivers' [1], by Corbet, Rubini, and
Kroah-Hartman are not repeated in this tutorial, so the reader is
encouraged to obtain a copy.

The Linux 2.6 kernel presents a number of generalized interfaces that
the driver writer must first understand, and then implement for their
specific driver. The best way to understand the interfaces is to write
simple drivers that exercise a subset of the kernel driver interfaces.
The following sections describe the interfaces used to implement
character device drivers.
Kernel modules
The file simple_module.c implements a very basic kernel module. A
device driver is a kernel module, but kernel modules are also used to
add features to the kernel that have nothing to do with device drivers.
Welcome to your first generalized kernel interface.
The basic requirement of a kernel module is that it implements an
initialization function and an exit function. Those two functions are
identified by the macros module_init() and module_exit(). The example
also shows how to pass load-time parameters to the module, and how to
set up logging in a module.
The code sets up two logging macros: LOG_ERROR() and LOG_DEBUG(). The
debug macro can be removed from the code at compile time (by not
defining DEBUG), or can be compiled into the code and then enabled or
disabled via the load-time parameter simple_debug. This method of
adding log messages to code is easier to maintain (e.g., to disable)
than a series of printk() calls littered throughout the code.
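The two levels of gating described above (compile time via DEBUG, run time via simple_debug) can be tried out in user space. The following is a minimal sketch, not the code from simple_module.c: log_write(), log_count, and log_last are hypothetical names, log_write() stands in for printk(), and simple_debug is an ordinary variable rather than a load-time module parameter.

```c
#include <stdarg.h>
#include <stdio.h>

#define DEBUG 1                 /* remove to compile LOG_DEBUG() away */

static int  simple_debug = 0;   /* run-time switch (a module parameter in a driver) */
static int  log_count = 0;      /* number of messages emitted so far */
static char log_last[128];      /* most recent message, for inspection */

/* Hypothetical stand-in for printk(): format, record, and print a message. */
static void log_write(const char *fmt, ...)
{
    va_list ap;

    va_start(ap, fmt);
    vsnprintf(log_last, sizeof(log_last), fmt, ap);
    va_end(ap);
    fputs(log_last, stderr);
    log_count++;
}

/* Always emitted. The first macro argument must be a string literal. */
#define LOG_ERROR(...) log_write("simple: " __VA_ARGS__)

#ifdef DEBUG
/* Compiled in, but only emitted when simple_debug is non-zero. */
#define LOG_DEBUG(...) \
    do { if (simple_debug) log_write("simple: " __VA_ARGS__); } while (0)
#else
/* Compiled out entirely. */
#define LOG_DEBUG(...) do { } while (0)
#endif

/* Mimics a module initialization function that logs at both levels. */
static int simple_init(void)
{
    LOG_ERROR("module loaded\n");
    LOG_DEBUG("debug messages are enabled\n");
    return 0;
}
```

Calling simple_init() with simple_debug set to zero emits only the load message; setting simple_debug to one emits both.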
The following shows the driver usage; the // marks are comments, while
the $ (user) and # (root) prompts show the commands you enter (bash
shell syntax).

So with the load-time parameter simple_debug set to zero, the
LOG_DEBUG() message does not appear in the output. The module load and
unload messages are generated using the LOG_ERROR() macro so that they
are always generated.
Device drivers
The file simple_driver.c implements a simple device driver. What makes
it a device driver, and not just a kernel module? In simple_init the
driver requests a range of major and minor numbers (the numbers used to
represent device nodes in /dev), allocates memory for an array of
device-specific simple_device_t structures, and then registers the
character device member, cdev, of each structure in the array with the
kernel.
Registration of the character device requires a set of file
operations, i.e., a kernel-level implementation of the functions that
get called when user-space makes system calls, e.g., open(), read(),
write(), ioctl(), lseek(), select(), and mmap(). The file operations
are stored as function pointers in a struct file_operations; if this
code were written in C++, then this structure would be the base class,
and your implementation of its functions would be a derived class.
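The base-class analogy can be illustrated with a table of function pointers in plain C. The following is a user-space sketch only: struct simple_file_ops is a hypothetical, much-simplified stand-in for the kernel's struct file_operations (the real structure has different members and signatures), and struct simple_dev and call_ioctl() are invented for the illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for struct file_operations. */
struct simple_file_ops {
    int  (*open)(void *dev);
    long (*read)(void *dev, char *buf, size_t count);
    long (*write)(void *dev, const char *buf, size_t count);
    long (*ioctl)(void *dev, unsigned int cmd, unsigned long arg);
};

/* Hypothetical per-device state. */
struct simple_dev {
    char   data[32];
    size_t len;
};

static int simple_open(void *dev)
{
    (void)dev;                      /* nothing to set up in this sketch */
    return 0;
}

static long simple_write(void *dev, const char *buf, size_t count)
{
    struct simple_dev *d = dev;

    if (count > sizeof(d->data))
        count = sizeof(d->data);    /* truncate to the device buffer */
    memcpy(d->data, buf, count);
    d->len = count;
    return (long)count;
}

static long simple_read(void *dev, char *buf, size_t count)
{
    struct simple_dev *d = dev;

    if (count > d->len)
        count = d->len;
    memcpy(buf, d->data, count);
    return (long)count;
}

/* The "derived class": fill in the operations this driver supports.
 * ioctl is left NULL, i.e., not implemented. */
static const struct simple_file_ops simple_fops = {
    .open  = simple_open,
    .read  = simple_read,
    .write = simple_write,
};

/* The "caller" side: dispatch through the table, returning -1 when the
 * driver did not supply the operation. */
static long call_ioctl(const struct simple_file_ops *ops, void *dev,
                       unsigned int cmd, unsigned long arg)
{
    return ops->ioctl ? ops->ioctl(dev, cmd, arg) : -1;
}
```

The caller never needs to know which driver it is talking to; it only follows the pointers in the table, which is what makes the interface generic.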
The file simple_driver_test.c is a user-space application that tests
the functions of the driver. Install the module, type ls /dev/simple*
and, once you see device nodes there, run the test. After the test
finishes, type dmesg to see the kernel-level messages triggered by the
user-space test. Remove the driver, and reinstall it with load-time
parameters, e.g.,
# insmod simple_driver.ko simple_device_count=3 simple_minor_count=2
This creates three devices, each responsible for two minor numbers
(functions on the device). ls -al /dev/simple* will show the multiple
devices created (and their major/minor numbers).
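How a device count and a minor count combine into device numbers can be sketched in user space with the glibc makedev(), major(), and minor() macros. This is an illustration under an assumed layout (each device owns a contiguous block of minor numbers); simple_devno() is a hypothetical helper, and the actual numbering used by simple_driver.c should be checked in its source.

```c
#include <assert.h>
#include <sys/sysmacros.h>   /* makedev(), major(), minor() (glibc on Linux) */
#include <sys/types.h>       /* dev_t */

/* Assumed layout: device d owns the minor numbers
 *   first_minor + d * minor_count ... first_minor + (d + 1) * minor_count - 1
 * and "function" selects one minor number within that block. */
static dev_t simple_devno(unsigned int major_no, unsigned int first_minor,
                          unsigned int minor_count,
                          unsigned int device, unsigned int function)
{
    assert(function < minor_count);
    return makedev(major_no, first_minor + device * minor_count + function);
}
```

With simple_device_count=3 and simple_minor_count=2, this layout would give device 0 the minors 0 and 1, device 1 the minors 2 and 3, and device 2 the minors 4 and 5.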
How did the device nodes magically appear in /dev? That's next.
Hotplug, sysfs, and udev
The simple driver initialization code, simple_init, also performs
another step: it creates a kernel object (class_simple or class,
depending on the kernel version) that creates entries in the sys file
system, sysfs, in the directory /sys/class. Creation of the class
object in the initialization code creates the entry
/sys/class/simple_driver. Devices managed by the driver are then added
to the class object (see the code), creating the device nodes under
/sys/class/simple_driver; e.g., if no load-time parameters are
specified, the driver creates one device, and the node
/sys/class/simple_driver/simple_a0 is created.
Why create these class and device 'objects'? The Linux 2.6 kernel
supports the concept of hot-pluggable devices, i.e., devices that can
be plugged in while the system is turned on, e.g., a USB camera. In older
Linux systems, if you plugged in a camera, you'd have to look at the
output of dmesg to see what the camera was detected as (if at all), and
then try and figure out how to get images off the camera!
The Linux 2.6 system generates 'hot-plug' events every time a kernel
object is created or destroyed, and these hotplug events trigger the
execution of scripts in user-space. The (appropriately written) scripts
then automatically populate the /dev entries for a device. A nice
feature of these scripts is that you can decide what name to give the
device; e.g., a camera detected as a USB mass-storage device might
appear as /dev/sda1 in a non-hotplug system, but with hotplug you can
set up the camera name to be /dev/camera, which is much nicer!
The automatic creation of /dev entries relies on three related pieces
of infrastructure: hotplug, sysfs, and udev. The man page, man udev,
gives details on how the scripts can be set up to create the /dev
entries with specific permissions, and how to map a kernel name (e.g.,
that used when the device was added to the class object in simple_init)
to a user-space defined name.
On CentOS 4.1, the udev configuration files are kept in /etc/udev/.
The line udev_log=no in /etc/udev/udev.conf can be changed to
udev_log=yes, and hotplug events will then be written to the system
log. For example, as root type tail -f /var/log/messages, and then from
another terminal install simple_driver.ko, and you will see the logging
of the hotplug events.
The default name given to a single device created by the simple
driver is /dev/simple_a0. With no udev scripts in place, the device node
is created for use by root only, and is named identically to the string
used in simple_init. The permissions on the device node can be changed
by creating a udev script containing a single line:
# /etc/udev/permissions.d/20-simple.permissions
simple_*:dwh:mm:0660
This changes the ownership of all nodes matching the pattern
simple_* to the owner dwh and group mm, with permissions 0660. The name
of the device entry can be changed, or a symbolic link to a device
entry can be created, by adding another script; e.g., the following
creates a symbolic link to the first device entry:
# /etc/udev/rules.d/20-simple.rules
KERNEL="simple_a0", SYMLINK="simple_00"
The udev man page gives more details on the options for device naming
(e.g., a user-supplied program can be run to generate the device name).
The automatic creation of /dev entries helps reduce the contents of
/dev to just those devices installed. It also provides flexibility to
user-space in the naming of device nodes.
For example, in the case of PCI devices it allows the PCI location,
e.g., bus:dev.fn, to be remapped into a meaningful slot number; instead
of, say, a device named /dev/board_00:0c.0, the user-space name can be
mapped to /dev/board2.
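The remapping from a PCI location to a slot number could be done by a small user-supplied naming helper. The following sketch shows only the parsing and lookup step. It assumes the location string has the form bus:dev.fn with hexadecimal fields; the slot_map table and the function name pci_slot_number() are hypothetical, and the device-to-slot assignments are invented to match the /dev/board2 example (a real table depends on the backplane wiring).

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical table: PCI device number -> physical slot number. */
static const struct {
    unsigned int dev;
    int          slot;
} slot_map[] = {
    { 0x0a, 0 }, { 0x0b, 1 }, { 0x0c, 2 }, { 0x0d, 3 },
};

/* Parse "bus:dev.fn" (hex fields, e.g. "00:0c.0") and return the slot
 * number, or -1 if the string is malformed or the device is unknown. */
static int pci_slot_number(const char *location)
{
    unsigned int bus, dev, fn;
    char extra;
    size_t i;

    /* Exactly three conversions must succeed; a fourth means trailing junk. */
    if (sscanf(location, "%x:%x.%x%c", &bus, &dev, &fn, &extra) != 3)
        return -1;

    for (i = 0; i < sizeof(slot_map) / sizeof(slot_map[0]); i++)
        if (slot_map[i].dev == dev)
            return slot_map[i].slot;
    return -1;
}
```

A helper built around this function would print a name such as board2 for the location 00:0c.0, which the naming rule could then use for the device node.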
The class_simple interface, as described in the Linux Device Drivers
book [1], was removed from the kernel (according to the ChangeLog for
that kernel), and the API changed again slightly. The parallel port
user-space driver, ppdev.c, is a nice small (easily understandable)
driver that uses the class interface. A diff of different kernel
versions of this driver can be used to determine how to handle any API
changes (e.g., whether a new argument can be assigned NULL).
Kernel timers
The driver simple_timer.c implements a single device that uses two
different kernel mechanisms for delaying the calls read(), write(), and
select(). The test program simple_timer_test.c tests the driver. The
driver demonstrates the usage of timers and events.
Interrupts
The driver simple_irq.c implements a single device that uses the
parallel port on an x86 PC. To test this driver, you might need to
first remove the printer driver and parallel port driver, i.e.,
modprobe -r lp, modprobe -r parport_pc. The driver creates a kernel timer
that fires every second.
The timer handler writes a low and then high to all the data lines
on the parallel port. If a data line, one of pins 2 through 9, is
jumpered to the interrupt line, pin 10, then an IRQ will be generated
every second. The IRQ handler unblocks a blocked read(), write(), or
select().
If a data line is not jumpered to the IRQ line, then the blocked
calls will time out (2s) and continue anyway. The test program
simple_irq_test.c tests the driver. The driver demonstrates the usage
of timers, IRQs, and events with timeouts.
Data buffering
The driver simple_buffer.c implements a single device that also uses
the parallel port on an x86 PC (so you will need to remove simple_irq
to test it). This driver is similar to simple_irq.c with the change
that IRQs write a time-stamp to an internal buffer, user-space
write() writes to that buffer, and read() reads from the buffer. The
following are some tests that can be performed using standard
command-line tools:
1) Connect the parallel port IRQ to a data line. Install the driver
using insmod simple_buffer.ko. Once the /dev/simple node is valid, type
cat /dev/simple. A UTC timestamp will be printed every second.
2) Remove the parallel port jumper. Remove the driver. Install the
driver and disable the timer and timeout as follows:
insmod simple_buffer.ko simple_timer_enable=0 simple_timeout_enable=0
On one terminal type cat /dev/simple, and on another type
echo "Hello" > /dev/simple. (You can also leave the timer enabled, and
it will just write messages to the log file.)
3) Combine the first two tests (remove and re-install the driver
without any load-time parameters); the IRQ will add a complete
timestamp message every second, while write will add a complete string
(whenever the user triggers a write). No messages will be interrupted,
since each procedure locks the internal buffer.
The test shows that the driver works as one would expect; however,
take a look at the source for the details. The internal buffer is a
resource that is shared between read() (e.g., one process), write()
(e.g., another process), and the IRQ handler (interrupt context).
The driver uses a spin-lock to protect access to the buffer (and its
associated buffer count and pointers). Without this protection, an IRQ
could interrupt a write, and insert a timestamp into the middle of the
string echoed into the driver. Of course in a real driver, the results
could be more disastrous.
If the resource (buffer) being protected by the driver were only ever
accessed by processes, then a semaphore could be used to protect it.
Semaphores can be used to block a process, causing it to sleep while
waiting for a resource. Spin-locks are not quite so forgiving.
You are not allowed to sleep, or call a function that might sleep,
while holding a spin-lock. Make sure to build your driver development
kernel with CONFIG_DEBUG_SPINLOCK and CONFIG_DEBUG_SPINLOCK_SLEEP
enabled, and the kernel will give you a nice reminder if you try to do
something bad (e.g., calling kmalloc() with GFP_KERNEL while holding a
lock).
The write() and read() operations of the driver need to copy data
from (or to) user-space to (or from) a kernel buffer. However,
copy_from_user() and copy_to_user() can sleep, so there is no way to
copy directly to the spin-lock protected buffer!
There's also the following write sequencing issue: to write data
into the buffer, you first need to check whether there is space.
However, the spin-lock needs to be held to check the buffer state, so
ideally you would hold the lock, check for space, release the lock, and
then copy a matching amount of user-data to the kernel. But, since you
are no longer holding the lock, an IRQ can come along and use up your
space!
The solution, shown in the driver code, is to first copy all the
user data into a kernel buffer, and then hold the lock while checking
for space. This allows the (sleepable) copy and allocation calls to be
performed before holding the lock. Of course, in the case of a full
buffer and a non-blocking write, the allocation and the copy from
user-space were a waste of time.
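The copy-first, lock-second ordering can be simulated in user space. The sketch below is not taken from simple_buffer.c; it assumes a pthread mutex in place of the spin-lock, and malloc() plus memcpy() in place of the kernel allocation and copy_from_user(). The names struct shared_buf and buf_write() are hypothetical, and the write is all-or-nothing so that a message is never split.

```c
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

#define BUF_SIZE 64

struct shared_buf {
    pthread_mutex_t lock;    /* stands in for the driver's spin-lock */
    char            data[BUF_SIZE];
    size_t          count;   /* bytes currently stored */
};

/* Non-blocking write: returns the number of bytes stored, -EAGAIN if the
 * message does not fit right now, or -ENOMEM if staging allocation fails. */
static long buf_write(struct shared_buf *b, const char *user, size_t len)
{
    char *tmp;
    long ret;

    if (len > BUF_SIZE)
        len = BUF_SIZE;

    /* Step 1: everything that could sleep in a kernel happens here,
     * before the lock is taken (allocation and copy from "user-space"). */
    tmp = malloc(len ? len : 1);
    if (!tmp)
        return -ENOMEM;
    memcpy(tmp, user, len);

    /* Step 2: check for space and store the data while holding the lock,
     * so nothing else can use up the space in between. */
    pthread_mutex_lock(&b->lock);
    if (BUF_SIZE - b->count < len) {
        ret = -EAGAIN;                   /* full: the staging copy was wasted */
    } else {
        memcpy(b->data + b->count, tmp, len);
        b->count += len;
        ret = (long)len;
    }
    pthread_mutex_unlock(&b->lock);

    free(tmp);
    return ret;
}
```

A buffer is set up with struct shared_buf b = { .lock = PTHREAD_MUTEX_INITIALIZER }; the -EAGAIN path shows the cost mentioned above, where the staging work is thrown away when the buffer turns out to be full.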
The code that holds the spin-lock, checks for a condition, and then
goes to sleep on a wait-queue if the condition is not met, should look
eerily familiar to anyone who has programmed with Pthreads; it is the
same pattern of code as used with a mutex and condition variable.
A mutex is used to protect a resource, while a condition variable is
used to put a thread to sleep while waiting for some other thread to
signal it that the condition has changed. The nice thing about this
analogy is that you can write pthreads code to simulate driver
buffering operations to 'figure it out' outside of the kernel.
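As a starting point for that kind of experiment, here is a minimal pthreads sketch of the lock, check, and sleep pattern. It is a simulation under stated assumptions rather than driver code: the mutex plays the role of the spin-lock, the condition variable plays the role of the wait-queue, and struct msg_queue, queue_put(), and queue_get() are hypothetical names.

```c
#include <pthread.h>
#include <stddef.h>

#define QUEUE_SIZE 8

struct msg_queue {
    pthread_mutex_t lock;       /* plays the role of the spin-lock  */
    pthread_cond_t  nonempty;   /* plays the role of the wait-queue */
    int             items[QUEUE_SIZE];
    size_t          head;       /* index of the oldest item */
    size_t          count;      /* number of stored items   */
};

#define MSG_QUEUE_INIT { .lock = PTHREAD_MUTEX_INITIALIZER, \
                         .nonempty = PTHREAD_COND_INITIALIZER }

/* Producer (the "IRQ handler"): never sleeps. Returns 0 on success,
 * or -1 if the queue is full and the item was dropped. */
static int queue_put(struct msg_queue *q, int item)
{
    int ret = -1;

    pthread_mutex_lock(&q->lock);
    if (q->count < QUEUE_SIZE) {
        q->items[(q->head + q->count) % QUEUE_SIZE] = item;
        q->count++;
        ret = 0;
        pthread_cond_signal(&q->nonempty);   /* wake a sleeping reader */
    }
    pthread_mutex_unlock(&q->lock);
    return ret;
}

/* Consumer (a blocking "read"): holds the lock, checks the condition, and
 * sleeps until the producer signals that data has arrived. */
static int queue_get(struct msg_queue *q)
{
    int item;

    pthread_mutex_lock(&q->lock);
    while (q->count == 0)                    /* re-check after every wake-up */
        pthread_cond_wait(&q->nonempty, &q->lock);
    item = q->items[q->head];
    q->head = (q->head + 1) % QUEUE_SIZE;
    q->count--;
    pthread_mutex_unlock(&q->lock);
    return item;
}
```

Running queue_put() from one thread and queue_get() from another reproduces the producer and consumer roles; pthread_cond_wait() releases the mutex while sleeping and re-acquires it before returning, which is why the condition is checked in a loop.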
The buffering used in the simple buffer driver is a bit contrived in
that there are two 'producers' writing to the buffer, and one
'consumer'. A more likely scenario for a driver would be to have a
buffer contended for by a single producer (say the receive IRQ), and a
single consumer (say read), and another separate buffer for a single
producer (write) and consumer (transmit IRQ).
But even in this situation, you can run into problems if the read
from the buffer takes an excessive amount of time, blocking new data
from the receive IRQ. One solution to this issue is to use two buffers
for each producer-consumer pair; e.g., the receive IRQ is initialized
to point to an empty buffer, and receive IRQs fill that buffer until a
read is issued. At that point the IRQ buffer is passed to read, and the
IRQ gets the second, empty buffer.
Once read has consumed the contents of the first buffer, if the
second buffer in use by the IRQ has new data, then the buffers are
swapped again. In this scheme, the lock only needs to be held to swap
the buffers, and since read does not hold the lock once it has a valid
buffer, a copy to user-space from the kernel buffer is allowed,
removing the need to use an intermediate buffer as shown in the simple
buffer driver. The kernel tty layer uses this form of buffering scheme
and refers to it as flip-buffering (see linux/tty.h).
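The two-buffer scheme can also be tried in user space. The following is a simplified sketch and not the tty layer's implementation: it assumes a single producer and a single consumer, uses a pthread mutex in place of the spin-lock, and the names struct flip_buf, flip_put(), and flip_read() are hypothetical. Data beyond the caller's buffer size is discarded to keep the example short.

```c
#include <pthread.h>
#include <stddef.h>
#include <string.h>

#define FLIP_SIZE 32

struct flip_buf {
    pthread_mutex_t lock;               /* held only briefly, never during the copy out */
    char            bufs[2][FLIP_SIZE];
    size_t          fill_count;         /* bytes in the buffer being filled */
    int             fill_idx;           /* which buffer the producer is filling */
};

/* Producer (the "receive IRQ"): append one byte to the fill buffer.
 * Returns 0 on success, or -1 if the fill buffer is full (byte dropped). */
static int flip_put(struct flip_buf *f, char c)
{
    int ret = -1;

    pthread_mutex_lock(&f->lock);
    if (f->fill_count < FLIP_SIZE) {
        f->bufs[f->fill_idx][f->fill_count++] = c;
        ret = 0;
    }
    pthread_mutex_unlock(&f->lock);
    return ret;
}

/* Consumer ("read"): swap the buffers under the lock, then copy out of the
 * buffer that was just handed over without holding the lock. */
static size_t flip_read(struct flip_buf *f, char *out, size_t max)
{
    const char *full;
    size_t n;

    pthread_mutex_lock(&f->lock);
    full = f->bufs[f->fill_idx];        /* buffer the producer was filling */
    n = f->fill_count;
    f->fill_idx ^= 1;                   /* producer continues in the other one */
    f->fill_count = 0;
    pthread_mutex_unlock(&f->lock);

    if (n > max)
        n = max;
    memcpy(out, full, n);               /* the slow copy happens unlocked */
    return n;
}
```

Because the consumer owns the handed-over buffer until its next swap, the slow copy no longer blocks the producer, which is the point of the scheme.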
The simple buffer driver has (at least) two practical applications.
First, if you install it and cat the timer-generated timestamps into a
file, a plot of the difference between consecutive timestamps, minus 1
second, will show the error in the kernel's ability to generate a 1
second delay.
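The per-interval error is a one-line calculation once the timestamps have been parsed. The sketch below assumes the timestamps have already been converted to integer microseconds; the text format written by the driver is not shown here, so the parsing step is left out, and interval_errors_us() is a hypothetical helper name.

```c
#include <stddef.h>

/* For n timestamps (in microseconds), store the n-1 interval errors
 *   err[i] = (ts[i+1] - ts[i]) - 1 second
 * in err_us and return how many errors were written. A negative error
 * means the timer fired early. err_us must have room for n-1 values. */
static size_t interval_errors_us(const long long *ts_us, size_t n,
                                 long long *err_us)
{
    size_t i;

    if (n < 2)
        return 0;
    for (i = 1; i < n; i++)
        err_us[i - 1] = (ts_us[i] - ts_us[i - 1]) - 1000000LL;
    return n - 1;
}
```

For example, timestamps spaced 999870 microseconds apart give an error of -130 microseconds per interval.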
Running some tests
In a test on an HP Omnibook 6100 PIII 1GHz laptop, the error was
approximately -130µs (i.e., slightly less than 1 second). The
test was started on a 1 second boundary, and over the space of 10
minutes, the timer was firing 100ms earlier than a 1 second boundary.
The second test determines how well NTP operates. Install the driver
with the timer and timeout disabled. Connect the 1pps tick from your
NTP server's GPS unit to the parallel port interrupt of your PC, make
sure your PC NTP daemon is running, and cat the IRQ-generated
timestamps.
The observed error of the measured timestamp relative to that same
timestamp rounded to the nearest second was about ±0.5ms. If the
test PC (laptop) had its ethernet cable disconnected, or the NTP daemon
was stopped, the error of the logged timestamps relative to the GPS
1pps tick would gradually increase (100 to 200µs over 10
minutes). If you had a method of generating a higher-frequency
square-wave that was also locked to GPS, then you could determine the
interrupt latency, and interrupt handling overhead, of the kernel by
hammering the IRQ pin at a few kilohertz.
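The error of a timestamp relative to the nearest second can be computed as follows. This is a sketch that assumes timestamps already parsed into non-negative integer microseconds; offset_from_second_us() is a hypothetical helper name.

```c
#include <assert.h>

/* Signed offset, in microseconds, of a timestamp from the nearest whole
 * second: positive if the timestamp is late, negative if it is early. */
static long long offset_from_second_us(long long ts_us)
{
    const long long sec = 1000000LL;
    long long rem;

    assert(ts_us >= 0);             /* keeps the % result non-negative */
    rem = ts_us % sec;
    return (rem >= sec / 2) ? rem - sec : rem;
}
```

A timestamp of 5.000300 s yields +300, and 4.999600 s yields -400; logging these offsets over time shows both the scatter and any drift relative to the 1pps tick.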
A 'real-world' PCI driver
The experience presented in this document was gained during the
development of the Caltech-OVRO Broadband Reconfigurable Array (COBRA)
Correlator System. The hardware developed is documented at www.ovro.caltech.edu/~dwh/correlator.
The hardware is currently in use on several radio astronomy
projects, e.g., the SZ Array (http://astro.uchicago.edu/sza/)
and the CARMA array (http://www.mmarray.org).
The cPCI digitizer and correlator boards used in the correlator system
contain a PLX9054 PCI interface, a Texas Instruments DSP, Altera
FLEX10K FPGAs, and on the digitizer, 1GHz analog-to-digital converters.
The digitizer output routes to the FPGAs on the digitizer board,
where data is digitally filtered, delayed, and routed to front-panel
high-speed connectors. The data travels over LVDS cabling (Ultra-SCSI
cables) to the correlator boards, where FPGAs cross-correlate and
average the data.
The on-board DSPs retrieve auto- and cross-correlation results from
the FPGAs, perform FFTs, further corrections, and average the data for
100ms to 500ms. Data is then transferred to a Linux host.
The system uses a GPS based NTP server with a 1pps output. The 1pps
signal is used to derive a hardware heartbeat, so that the 100ms and
500ms transfers are aligned with real-time. The Linux hosts run NTP
pointing to the NTP server, and check that data from boards arrives
within a 50ms window relative to a 100ms or 500ms boundary.
The Linux driver used in the COBRA system is shown graphically in
Figure 1, below [2]. The driver implements several character device
interfaces to the board: a terminal-like interface with standard input,
output, and error; a read/write control interface; a read-only data
interface; and a read-only monitoring interface.
The use of multiple devices, rather than a complex scheme of I/O
control, was determined by the usage of the driver. For example, one
objective was to enable the use of standard command-line tools like
cat, od (octal dump), echo, and dd. These tools know nothing of I/O
control calls, so they need to be directed to a device node of a
specific 'personality'.
Figure 1: COBRA device driver block diagram. The block diagram shows
the relationship between the /dev nodes accessed by user-space
applications and the files that implement the driver.
The COBRA control system code controls up to 20 boards in a single
sub-system, and data must be collected from each board at about the
same time. The standard method for dealing with multiple sources of
data is to use the select() call, which uses file-descriptors. So by
separating out the data device and monitor device functionality at the
driver-level, a user-space server can run a thread containing a
select() call that collects all the data from all boards, and serves
that data up to clients. Then another thread, or another process even,
can run a monitor server containing a thread calling select() on all
the monitor file descriptors.
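The select()-based collection loop can be sketched as a helper that waits on a set of descriptors and reports which ones have data. This is an illustration rather than COBRA code: wait_for_boards() is a hypothetical name, the descriptors are assumed to be already open (for example, one data device node per board), and each descriptor must be smaller than FD_SETSIZE.

```c
#include <stddef.h>
#include <sys/select.h>
#include <unistd.h>      /* read(), used by callers once a descriptor is ready */

/* Wait up to timeout_ms for any of the nfds descriptors to become readable.
 * On return, ready[i] is 1 if fds[i] has data and 0 otherwise.
 * Returns the number of ready descriptors, 0 on timeout, or -1 on error. */
static int wait_for_boards(const int *fds, int nfds, int *ready, int timeout_ms)
{
    fd_set rfds;
    struct timeval tv;
    int i, n, maxfd = -1;

    FD_ZERO(&rfds);
    for (i = 0; i < nfds; i++) {
        FD_SET(fds[i], &rfds);
        if (fds[i] > maxfd)
            maxfd = fds[i];
        ready[i] = 0;
    }

    tv.tv_sec  = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    n = select(maxfd + 1, &rfds, NULL, NULL, &tv);
    if (n <= 0)
        return n;                    /* timeout (0) or error (-1) */

    for (i = 0; i < nfds; i++)
        ready[i] = FD_ISSET(fds[i], &rfds) ? 1 : 0;
    return n;
}
```

A data server thread would call this in a loop and read() from every descriptor flagged as ready, while a separate thread or process does the same with the monitor descriptors.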
Dr. David Hawkins, Senior Scientist at the California Institute of
Technology, is currently involved with the design and development of
high-speed digital correlator systems for Caltech, U. Chicago, and the
CARMA (Caltech, Berkeley, U. Illinois, and U. Maryland) radio
observatories.
This article is excerpted from a paper of the same name presented at
the Embedded Systems Conference Silicon Valley 2006. Used with
permission of the Embedded Systems Conference. For more information,
please visit www.embedded.com/esc/sv.
References
[1] J. Corbet, A. Rubini, and G. Kroah-Hartman. Linux Device Drivers.
O'Reilly, 3rd edition, 2005.
[2] D. Hawkins. COBRA device driver. Caltech-OVRO documentation, 2004.
(www.ovro.caltech.edu/~dwh/correlator/pdf/cobra driver.pdf).
[3] D. Hawkins. PLX-9054 PCI Performance Tests. Caltech-OVRO
documentation, 2004.
(www.ovro.caltech.edu/~dwh/correlator/pdf/pci performance.pdf).
[4] D. Hawkins. Linux driver design source code. Caltech-OVRO
documentation, 2005.
(www.ovro.caltech.edu/~dwh/correlator/software/driver design.tar.gz).