Reconfigurable computing, in the abstract sense, refers to any information-processing system in which blocks of hardware can be reorganized or repurposed to adapt to changing data flows or algorithms. But the phrase has been overused to the point of near-meaninglessness; in fact, there is no commonly understood definition of reconfigurability.
The most useful distinction, then, is when reconfiguration happens: at design time, at deployment time, between execution phases, or during execution. Each of these time frames defines a distinct category of reconfigurable systems.
The earliest reconfigurable computing systems predate even digital computers. Before digital logic, scientific and engineering computations were done on programmable analog computers: big banks of op amps, comparators, multipliers and passive components interconnected via a plug board and patch cords. By connecting components together, the very clever user could implement a network whose node voltages obeyed a set of differential equations. Hence the analog computer was a differential-equation solver with deployment-time reconfigurability. Toward the end of its era, the analog computer was combined with relay banks, and later with digital computers, to form hybrids. These machines could reconfigure themselves between execution sequences, an early form of yet another category: reconfiguration between execution phases. Some hybrid-computer programmers became experts at juggling configurations while holding data in sample-and-hold circuits to extend the range of these systems.
The first moves toward really fluid reconfigurability came with the advent of embeddable digital computers. With the behavior of a system defined by software in RAM, nothing could be simpler: changing the operation of the system at installation, in response to changing data, or even on the fly is just a matter of loading a different application. Variants on this theme included tightly coupled networks of computers in which the network topology could adapt to changing data flows, and even computers that could change their instruction sets in response to changing application demands.
But the first explorations into what most people today mean by the term reconfigurable computing came after the development of large SRAM-based FPGAs. The devices provided a fabric of logic cells and interconnects that could be altered, albeit with some difficulty, to create just about any logic netlist that would fit into the chip. Researchers quickly seized upon the parts and began experimenting with deployment-time reconfiguration: creating a hardwired digital network designed for a specific algorithm.
Experiments with reconfigurability in FPGAs identified two promising advantages: reduction of size or power consumption of the hardware, and increases in performance. Often the two types of advantages came together, rather than separately.
The advantages, it turned out, came from only a few quite specific techniques. One of these was simple: reuse of hardware. If it is possible to organize a system in such a way that it has several distinct, nonoverlapping operating modes, then you can save hardware by configuring a programmable fabric to execute in one mode, stopping, then reconfiguring it to operate in another mode.
An example given by the marketing team at QuickSilver Technology Inc. (San Jose, Calif.) is a cellular phone handset. When the handset is first turned on, it enters a search mode in which it examines a fairly wide spectrum of frequencies looking for a basestation. Once it has identified a basestation, the handset enters a quite distinct mode in which it establishes its identity and presence within the cell. If the phone sends or receives a call, it enters yet a third mode. The important thing about these modes from the hardware designer's point of view is that in each case the functions being performed in the symbol-rate logic are quite different. In a conventional system-on-chip (SoC) design, each mode would have its own big chunk of logic, and two of the three chunks would be mostly quiescent at any one time. With a reconfigurable approach, all three chunks can instead be implemented in a single programmable fabric and paged in as the mode changes.
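To make the reuse concrete, here is a minimal sketch in C of the control code such a handset might run. The fabric-driver call, the configuration images and the mode names are all hypothetical, invented for illustration; the point is only that one fabric gets loaded with a different configuration each time the phone changes modes.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical pre-synthesized configuration images, one per mode.
 * Real bitstreams would come from the vendor tool chain; these are stubs. */
static const unsigned char cfg_search[]   = { 0x01 };
static const unsigned char cfg_register[] = { 0x02 };
static const unsigned char cfg_traffic[]  = { 0x03 };

/* Stub standing in for the real reconfiguration driver: halt the fabric,
 * stream in the new image, restart. Reconfiguration time is paid here. */
static int fabric_load(const unsigned char *image, size_t len)
{
    printf("loading %zu-byte configuration (id 0x%02x)\n", len, image[0]);
    return 0;
}

enum phone_mode { MODE_SEARCH, MODE_REGISTER, MODE_TRAFFIC };

/* Page the single shared fabric with whichever configuration the current
 * mode needs; the three chunks of logic never coexist in silicon. */
static int enter_mode(enum phone_mode mode)
{
    switch (mode) {
    case MODE_SEARCH:   return fabric_load(cfg_search,   sizeof cfg_search);
    case MODE_REGISTER: return fabric_load(cfg_register, sizeof cfg_register);
    case MODE_TRAFFIC:  return fabric_load(cfg_traffic,  sizeof cfg_traffic);
    }
    return -1;
}

int main(void)
{
    enter_mode(MODE_SEARCH);   /* power-on: scan for a basestation */
    enter_mode(MODE_REGISTER); /* found one: establish identity    */
    enter_mode(MODE_TRAFFIC);  /* call placed: handle traffic      */
    return 0;
}
```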
Such techniques generally reduce the amount of silicon necessary to implement a system. But they don't necessarily improve performance. In fact, if the time required to change configurations in the programmable logic is large, or if the functions required are not ones easily implemented in the particular logic fabric, the result can be an overall loss in speed and increase in power consumption. In many cases, the gains from overlapping hardware functions are insufficient to compensate for the huge die area and power penalties of FPGAs relative to fixed logic.
The other major category of gain from reconfigurability comes from simplification. Often, if logic can be optimized not just for a particular algorithm but also for a particular set of data, the gains in both size and performance can be startling.
One of the earliest examples of this principle came from the field of audio signal processing. Audio systems use finite impulse response (FIR) filters for such functions as equalization. FIR filters are usually implemented as convolutions, in which a small, unchanging set of coefficients is multiplied by a moving set of incoming data points and the products are summed: a standard multiply-accumulate operation. That's the main reason DSP chips have multiply-accumulate hardware.
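As a point of reference, a direct-form FIR filter is exactly that multiply-accumulate loop. The sketch below is a plain software model with made-up coefficients, not any particular product's implementation.

```c
#include <stdio.h>

#define TAPS 4

/* Hypothetical equalizer coefficients; a real design gets these from
 * the filter designer. */
static const int coeff[TAPS] = { 3, -5, 7, 2 };

/* Direct-form FIR: multiply each coefficient by the corresponding delayed
 * sample and accumulate the products, the classic MAC loop that DSP chips
 * implement in dedicated hardware. */
static int fir(const int *x)   /* x[0] is the newest sample */
{
    int acc = 0;
    for (int k = 0; k < TAPS; k++)
        acc += coeff[k] * x[k];   /* one multiply-accumulate per tap */
    return acc;
}

int main(void)
{
    int delay_line[TAPS] = { 10, 20, 30, 40 };
    printf("y = %d\n", fir(delay_line));  /* 3*10 - 5*20 + 7*30 + 2*40 = 220 */
    return 0;
}
```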
If you know the filter coefficients ahead of time, though, you don't need full multipliers at all. You can implement the function of a variable multiplied by a known constant with a single level of combinatorial logic, much less complex and much faster than a full multiplier.
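A rough software analogue of that simplification, with the constant 7 chosen arbitrarily: once the coefficient is known, the multiply collapses into the few shifts and adds its bit pattern calls for. (FPGA tools typically reduce a constant multiplier to lookup-table logic rather than an explicit shift-add chain; the code only illustrates the principle that knowing the constant removes the need for a general multiplier.)

```c
#include <stdio.h>

/* General case: multiplying by a variable needs a full multiplier or MAC unit. */
static int mul_general(int x, int c) { return x * c; }

/* Multiply by a *known* constant, here 7 = 4 + 2 + 1, reduced at "synthesis"
 * time to the shifts and adds implied by the constant's bit pattern. */
static int mul_by_7(int x)
{
    return (x << 2) + (x << 1) + x;
}

int main(void)
{
    int x = 13;
    printf("%d %d\n", mul_general(x, 7), mul_by_7(x));  /* both print 91 */
    return 0;
}
```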
The problem comes when someone moves the tone control knob on the system. That changes all the filter coefficients. But if the FIR is implemented in an FPGA, that only means that you have to synthesize new combinatorial logic nets and reconfigure the logic. Hence a filter array implemented in reconfigurable logic can be both much faster and much lower in power than a similar function in fixed logic or in software on a DSP chip.
A similar savings can come from simple pruning. The obvious example would be in multiplying or inverting sparse matrices. If the operation is critical enough to require parallel hardware implementation in the first place, huge amounts of hardware can be saved by implementing only the necessary data paths and by finding ways to reuse a minimal number of complex logic elements. Run-time reconfigurability gives the system the opportunity to eliminate paths that are unnecessary for a particular set of data, and to trade additional execution cycles for reuse of hardware and intermediate results when the data and constraints permit.
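The software counterpart of that pruning is a sparse matrix-vector multiply that stores and touches only the nonzero entries. The sketch below uses compressed-sparse-row storage with an arbitrary example matrix; in reconfigurable hardware, the analogous step is generating data paths only for the entries that actually exist.

```c
#include <stdio.h>

/* A 3x4 matrix with only four nonzeros, in compressed-sparse-row form:
 *   [ 5 0 0 2 ]
 *   [ 0 0 3 0 ]
 *   [ 0 8 0 0 ]
 * Only the nonzero "data paths" are represented; the zeros cost nothing. */
static const int val[]     = { 5, 2, 3, 8 };   /* nonzero values       */
static const int col[]     = { 0, 3, 2, 1 };   /* their column indices */
static const int row_ptr[] = { 0, 2, 3, 4 };   /* start of each row    */

static void spmv(int rows, const int *x, int *y)
{
    for (int r = 0; r < rows; r++) {
        int acc = 0;
        /* Multiply-accumulate only along the paths that actually exist. */
        for (int i = row_ptr[r]; i < row_ptr[r + 1]; i++)
            acc += val[i] * x[col[i]];
        y[r] = acc;
    }
}

int main(void)
{
    int x[4] = { 1, 2, 3, 4 };
    int y[3];
    spmv(3, x, y);
    printf("%d %d %d\n", y[0], y[1], y[2]);  /* prints 13 9 16 */
    return 0;
}
```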
Experience has shown that when it is possible to compile a logic configuration for particular data sets on the fly, the gains in power and performance can be huge. But implementing such a system requires considerable forethought, a very different approach to system design and a not inconsiderable amount of run-time control software.
Despite these very promising results, the vast majority of reconfigurable hardware designs that have reached market have used an entirely different aspect of the art: the ability to configure the hardware either just before shipping or upon initialization to accommodate different functionality.
Most often, this takes the form of the ubiquitous FPGA somewhere in a design that is there primarily to handle engineering change orders. Given today's fast design cycles and increasingly complex and imprecise specs, it is almost assumed in many applications that the initial design won't be quite right. Being able to change the design slightly at the last minute, or even after the last minute, when the product is already deployed in the field, can be valuable, so valuable that in some markets an entire critical portion of the logic may be committed to FPGA despite obvious cost, power and speed disadvantages.
The most egregious example of this thinking appeared late in the dot-com bubble, when networking companies rushed new features to market in less time than it took to fabricate the necessary ASICs. The companies shipped systems implemented heavily in FPGAs, and thus reconfigurable in theory but not in practice, simply to reduce time-to-market.
Implementation in silicon
Approaches to implementing reconfigurable hardware have been manifold. But they fall into a few distinct categories: moving functions to software, implementing functions in FPGAs, creating logic fabric specifically for execution-time reconfigurability and blending reconfigurability into traditional SoC methodology.
Moving functions to software is perhaps the most obvious approach to making them reconfigurable. It can achieve substantial reductions in hardware, can accommodate a certain amount of run-time optimization and permits very flexible engineering change implementation or feature creep. It also tends to increase energy consumption for a given function and to offer the lowest performance of the plausible implementations.
The approach might also seem trivial: given those advantages, the prudent architect already puts everything in software that isn't forced into hardware by speed requirements. But recent changes in the way embedded CPU cores and DSP cores are developed have made reduction to software a much more powerful technique than it formerly was.
These changes rest on the realization, actually some 40 years old, that sometimes small changes to an instruction set architecture can make huge changes to application performance on a given processor. Such CPU IP vendors as ARC Cores and Tensilica have built both architectures and tool sets around this notion of extensibility. The full impact of tuning an instruction set and data path design to an application can be illustrated by recent EEMBC benchmark results. On the telecom suite, a benchmark made up of autocorrelation, convolution, FFT, Viterbi and other tasks, the ARC and Tensilica cores were each able to increase their performance by a factor of approximately 40 by using processor optimizations. Full information on the benchmarks and results is available at http://www.eembc.org.
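The kind of kernel those benchmarks stress is easy to picture. The autocorrelation sketched below, a generic illustration rather than a vendor example, spends essentially all of its time in one multiply-accumulate statement; an extensible processor can fuse that statement into a custom instruction, and widen it to several MACs per cycle, which is where speedups of this magnitude come from.

```c
#include <stdio.h>

#define N 8

/* Autocorrelation at lag k: the inner statement is one multiply-accumulate.
 * An extensible processor would map exactly this statement onto a custom
 * (possibly multi-MAC) instruction. */
static long autocorr(const int *x, int n, int k)
{
    long acc = 0;
    for (int i = 0; i + k < n; i++)
        acc += (long)x[i] * x[i + k];   /* candidate for a fused MAC op */
    return acc;
}

int main(void)
{
    int x[N] = { 1, 2, 3, 4, 4, 3, 2, 1 };
    for (int k = 0; k < 3; k++)
        printf("R[%d] = %ld\n", k, autocorr(x, N, k));
    return 0;
}
```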
Nor is the effect limited to general-purpose CPUs. Analog Devices Inc. (Norwood, Mass.), in developing DSP cores for the cellular basestation market, noted that while DSP cores were universally used for the symbol-rate processing that demodulated incoming data, chip-rate functions such as rake filtering were typically regarded as too demanding for a programmable DSP and were implemented in fixed hardware. However, the addition of a small number of multiple-MAC instructions to the SHARC instruction set gave such a performance increase on these algorithms that the company was able to pull the chip-rate processing into the programmable DSP, extending the range of software-based functionality upstream by a major step and vastly increasing the potential flexibility of the chip-rate processing.
FPGAs and beyond
The earliest experiments in reconfigurable hardware were all done, of necessity, in commercial FPGAs. SRAM-programmed FPGAs had the basic requirements: lots of logic and interconnect, some memory and the ability to reconfigure the part after installation. But the devices were designed on the assumption that they would be programmed only at power-up, and reconfiguring them on the fly proved slow and painful. Usually the chips had to be completely reset and reprogrammed; there was no provision for latching data during the reprogramming process and no way to hot-switch between configurations.
Xilinx became interested enough in reconfigurability in the mid-1990s to develop a new FPGA family that specifically addressed these issues. The part had deep configuration memory, the ability to partially reconfigure the chip on the fly and numerous other features. But all these features added considerable silicon overhead, already the clay feet of FPGAs, and the company judged the commercial opportunity too small to make the parts generally available. On the positive side, many of the important features have remained in subsequent generations of Xilinx parts, including the Virtex-II family, as undocumented capabilities.
The unfulfilled promise of these experiments led some Xilinx engineers to feel the idea deserved a chance in the market. One such group left the company to form a relatively stealth-mode startup, QuickSilver Technology, with the intent of developing a commercial on-the-fly reconfigurable FPGA. But after running up against the inherent limitations of FPGAs, the company decided that a clean sheet of paper was a better idea than a big eraser. The company developed what it claims is an entirely new kind of programmable logic fabric that avoids the notorious overhead of SRAM-programmable interconnect. QuickSilver's technology is described on page 124 in the "Comment" titled, "A look into QuickSilver's ACM architecture".
An emerging compromise
Meanwhile, researchers in both universities and industry are looking at a compromise approach to reconfigurable hardware. They point out that even if you partition a system into nonoverlapping modes and share hardware, many of the critical hardware modules will be common to almost all the modes. It makes little sense to pay the overhead of implementing those functions in programmable logic. Conversely, the functions that could benefit most from reconfigurable techniques tend to be relatively small and isolated.
Hence these researchers conclude that the best approach may be a conventional SoC, using modern processors and DSP cores to shift as much functionality as possible into software, and then deploying small, localized blocks of reprogrammable logic fabric only at those points in the design where reconfigurability will have the greatest advantages.
This approach makes the configurable logic fabric just another hard IP block in the tool kit of the SoC designer. Such blocks are intermediate between pure hardware and processor cores, in that they require supporting software and configuration data in order to work, but don't actually execute code at run-time.
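In practice, the supporting software for such a block can be as simple as a boot-time routine that streams a configuration image into the fabric and then enables it. The register map and names below are purely hypothetical, sketched only to show the kind of glue code the SoC designer would own; the exact interface depends entirely on the fabric vendor.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical memory-mapped registers of an embedded configurable-fabric
 * IP block; the addresses and bit definitions are invented for illustration. */
#define FABRIC_BASE        0x40010000u
#define FABRIC_CTRL        (*(volatile uint32_t *)(FABRIC_BASE + 0x00))
#define FABRIC_CFG_DATA    (*(volatile uint32_t *)(FABRIC_BASE + 0x04))
#define FABRIC_CTRL_RESET  (1u << 0)
#define FABRIC_CTRL_ENABLE (1u << 1)

/* Load a configuration image into the fabric at boot, then enable it.
 * The block executes no code at run-time; once configured, it simply
 * behaves as the logic described by the image. */
static void fabric_configure(const uint32_t *image, size_t words)
{
    FABRIC_CTRL = FABRIC_CTRL_RESET;      /* hold the fabric in reset    */
    for (size_t i = 0; i < words; i++)
        FABRIC_CFG_DATA = image[i];       /* stream in the configuration */
    FABRIC_CTRL = FABRIC_CTRL_ENABLE;     /* release and start the block */
}
```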
This may be the most plausible view of the future. If so, it bodes well for vendors who have invested in the concept of an embeddable FPGA-like core. The suffix "-like" is important here, because researchers also report that the requirements for an embeddable logic fabric are quite different from those for a stand-alone FPGA. Possibly the most thought on this subject in the U.S. has gone on at Leopard Logic (Cupertino, Calif.), which has produced a line of cores specifically for embedding. But Actel Corp. (Sunnyvale, Calif.) and eASIC Corp. (San Jose, Calif.) have also been active in the area. Significantly, almost-unknown M2000 (a privately held concern in Paris, France) may have the most design wins for embedded configurable fabric, dating back to last year.
The company has been quietly providing embedded fabric for both time-to-market reduction in SoCs and for more adventuresome true reconfigurable computing applications within Europe. And recently it was announced that IBM Microelectronics has been working for some time with Xilinx to produce an embeddable fabric of as yet undisclosed specifications.
So that is the landscape. Given the difficulties of engineering an application to take advantage of either hardware reuse or hardware simplification, it is likely that the main applications of reconfigurable hardware will continue to be providing for engineering changes, reducing development time and permitting product customization at shipment time. Reconfigurability in software, however, will continue to expand its reach as ever more powerful processors, accelerated by ever more algorithm-specific instructions, extend software solutions into previously hardware-only areas.