The ability to store and manage large amounts of data has become a critical requirement for a variety of embedded devices. Car infotainment units, industrial automation systems, medical devices, media servers, and portable music players are all using hard drives and other mass storage technologies in ways that no one thought possible only a few years ago.
In many cases, these systems must survive years of constant use, even when handling massive numbers of file reads and writes. Users never expect to lose data or to endure long data-recovery times.
The problem is, many embedded systems operate in hostile environments, such as the automobile, where power can fluctuate or fail unexpectedly. Such events can easily corrupt data stored on hard drives and other storage media, resulting in loss of critical information. Consequently, the file system software that manages the data on the storage device must do more than provide fast read and write performance; it must also prevent data corruption caused by power failures. Moreover, it must eliminate the time-consuming file-system integrity checks typically required after a power failure since, in most cases, embedded systems must be fully operational immediately after rebooting.
Unfortunately, traditional block-based file systems for hard drives or solid-state disks (SSDs) were never designed to ensure file integrity in the event of power failures and other sudden shutdowns. Some file systems (e.g., ZFS) for high-availability corporate servers provide such protection, but they consume too many system resources to be used in embedded devices. The QNX power-safe file system addresses these problems by implementing advanced server-level techniques in an efficient, embeddable solution, thereby providing the data integrity essential to today's storage-hungry embedded applications.
Reliable hardware isn't enough
In many cases, embedded systems use solid-state NAND or NOR flash. Many must rely on hard drives, however, because of the large storage capacities and low price per bit that hard drives offer. Unfortunately, few embedded systems can adopt the approaches to maintaining hard-drive data used in the corporate IT world, such as replicating data in multiple locations, making frequent backups, or using an Uninterruptible Power Supply (UPS). It's critical, then, that embedded file systems implement special techniques to prevent data loss or corruption.
Reliable data storage isn't simply a matter of using reliable storage hardware. It also depends heavily on file system integrity. And the biggest challenge to maintaining file system integrity is preventing data corruption caused by power failures. These failures can result from lightning strikes, faulty power supplies, battery problems, incorrect wiring, or a number of other causes.
Although many existing disk file systems are reliable, they can still lose data when a power failure occurs. For instance, consider these two common scenarios:
- A power failure occurs while the file system is writing to a block — When a hard drive loses power, it automatically moves the heads to a safe "landing zone" or removes the heads to prevent them from crashing into the disk surface. If the file system driver is writing a data block when this head removal occurs, the write operation will be incomplete. The error-correction code (ECC) for the sector being written will become inconsistent, resulting in a loss of all data in that sector.
- A power failure occurs when the file system is writing to multiple blocks — When writing a file, a file system typically writes to multiple blocks on the disk. If a power failure occurs after the file system has written to some, but not all, of the blocks, data loss can occur. To minimize the risk, file systems can synchronously write updates to directories, inodes, extent blocks, the bitmap, and other critical data in a carefully chosen order. However, these techniques still cannot eliminate data loss.
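The two scenarios above can be reproduced in a small simulation. The following is a minimal sketch, not a model of any real drive or file system driver: `SimulatedDisk`, `PowerFailure`, `write_file`, and the 512-byte block size are all hypothetical names and values chosen for illustration, and a CRC32 per block stands in for the sector ECC. Power is cut during the second of three block writes, which leaves one block with a check mismatch and one block still holding its old contents.

```python
import zlib

BLOCK_SIZE = 512  # assumed block/sector size for this toy model


class PowerFailure(Exception):
    """Raised by the simulated disk when power is lost mid-write."""


class SimulatedDisk:
    """Toy disk: each block carries a CRC32 that stands in for sector ECC."""

    def __init__(self, num_blocks, fail_on_write=None):
        self.blocks = [bytes(BLOCK_SIZE) for _ in range(num_blocks)]
        self.checks = [zlib.crc32(block) for block in self.blocks]
        self.fail_on_write = fail_on_write  # 0-based index of the interrupted write
        self.writes_started = 0

    def write_block(self, index, data):
        assert len(data) == BLOCK_SIZE
        interrupted = self.writes_started == self.fail_on_write
        self.writes_started += 1
        if interrupted:
            # Torn write: only half of the new data lands, and the stored
            # check value is never updated to match it.
            half = BLOCK_SIZE // 2
            self.blocks[index] = data[:half] + self.blocks[index][half:]
            raise PowerFailure(f"power lost while writing block {index}")
        self.blocks[index] = data
        self.checks[index] = zlib.crc32(data)

    def is_readable(self, index):
        """A block is readable only if its contents match its check value."""
        return zlib.crc32(self.blocks[index]) == self.checks[index]


def write_file(disk, first_block, chunks):
    """Write a file as consecutive blocks, with no protection of any kind."""
    for offset, chunk in enumerate(chunks):
        disk.write_block(first_block + offset, chunk)


# Write a three-block file and lose power during the second block write.
disk = SimulatedDisk(num_blocks=4, fail_on_write=1)
new_contents = [bytes([value]) * BLOCK_SIZE for value in b"ABC"]
try:
    write_file(disk, 0, new_contents)
except PowerFailure as failure:
    print(failure)

for i in range(3):
    state = "readable" if disk.is_readable(i) else "unreadable (check mismatch)"
    print(f"block {i}: {state}")
```

After the simulated failure, block 0 holds new data, block 1 fails its check, and block 2 still holds the old contents, so the file is neither the old version nor the new one. Changing the order of the writes alters which blocks are affected, but some part of the update is still lost.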
Traditional recovery solutions
When files or directories become corrupted, the traditional solution is to use a file system integrity check-and-repair utility, such as chkdsk for Windows and fsck for Unix and Linux. However, such utilities have two serious limitations: 1) they check only the file system structure and the metadata, not the file data; and 2) they are time-consuming and can be used only when the file system isn't in service, typically just after boot time.
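The first limitation can be illustrated with a toy example. This is a hedged sketch of the general idea only, not how chkdsk or fsck are implemented: the bitmap and inode layout are simplified, and `structure_is_consistent` is a hypothetical helper. The check compares the allocation bitmap against the blocks that files claim to own, and it never reads the contents of those blocks.

```python
def structure_is_consistent(bitmap, inodes):
    """Metadata-only check: every used block belongs to exactly one file."""
    referenced = [block for blocks in inodes.values() for block in blocks]
    in_use = {index for index, used in enumerate(bitmap) if used}
    no_shared_blocks = len(referenced) == len(set(referenced))
    return no_shared_blocks and set(referenced) == in_use


# Toy volume: four blocks, two files, one free block.
bitmap = [True, True, True, False]
inodes = {"log.txt": [0, 1], "config.txt": [2]}
data_blocks = [b"good data", b"good data", b"settings", b""]

ok_before = structure_is_consistent(bitmap, inodes)

# Corrupt the contents of a block owned by log.txt; the metadata is untouched.
data_blocks[1] = b"\xff\x00 torn garbage"
ok_after = structure_is_consistent(bitmap, inodes)

print("structure consistent before corruption:", ok_before)
print("structure consistent after corruption: ", ok_after)
```

Because the check looks only at the bitmap and the inode table, it reports the same result before and after the block contents are damaged.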
If the root, bitmap, or inode file becomes corrupted, it may be possible to repair the file system. However, users must do this manually, using a time-consuming process that requires extensive knowledge of the file system structure. And if the root block or root directory gets corrupted, the user can no longer mount the file system, and the entire content may be lost.
Such file-recovery scenarios are impractical for embedded systems, which often have to run for months or years without human intervention. Therefore, it's critical that an embedded file system prevent such corruption from occurring in the first place. Once corruption occurs, it's often too late.