FORT COLLINS, Colo. - A C compiler has been developed for systems that pair a microprocessor with a field-programmable-gate-array (FPGA) coprocessor. The Single-Assignment C compiler not only speeds up computationally intensive algorithms but does so without requiring the software developer to know the specifics of the FPGA architecture.
"The goal of our project is to make FPGAs available to applications programmers by raising the abstraction level from hardware circuits to software algorithms," said Wim Bhm, a computer science researcher at Colorado State University, where the new approach was developed. "Our compiler for a variant of the C programming language maps high-level programs directly onto FPGAs for image processing, pattern recognition and similar applications." Fellow Colorado State computer scientists Ross Beveridge and Bruce Draper also contributed to the project.
The researchers recently demonstrated the speed increases their smart compiler can impart to image-processing applications, including:
- Probing algorithms for silhouette recognition of objects from laser-radar data;
- Prescreening algorithms for surveillance satellites; and
- Data compression algorithms using wavelets.
"What these image-processing applications have in common is that a tremendous amount of data flows through them. But their algorithms consist of very simple operations that, when performed by an FPGA, can accelerate the application from twentyfold to eight hundredfold," said Bhm. The probing application was accelerated 800x, the prescreening algorithm 40x and the wavelet data compression by 20x, he said.
Böhm views the smart compiler as the next step in the natural evolution of software. From specialized assembly language in the early days of computing came a step to high-level languages like C, which let variables stand in for low-level resources and whose compilers automatically translate commands into the machine language of whatever computer hardware is available.
High-level languages like C enabled application developers themselves to write their own low-level software. Likewise, Böhm's "smart" C compiler eliminates the need for a hardware expert to design a circuit inside the FPGA. Instead, the compiler itself arranges to execute the application developer's algorithm inside the FPGA. Thus, the FPGA's hardware can be reconfigured by application developers, merely by writing in Böhm's high-level language.
"Applications developers are usually not hardware experts, but they know a lot about the application domain so they prefer to write in a high-level language that compiles into a specification for their hardware," said Bhm.
Since entire programs cannot be run on an FPGA, the compiler scans the high-level C code to identify critical "inner-loop" functions that can be cast into the reconfigurable hardware. The compiler then partitions the program into an outer shell, which runs on a host microprocessor, and an inner shell that runs on the FPGAs, plus the interface code that glues those two together. Böhm calls this "one-step compilation," because the compiler automatically partitions the program to use an FPGA coprocessor.
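The split between outer shell and inner shell can be pictured with a small sketch in plain C (not SA-C). The names `inner_threshold` and `outer_count_bright`, and the choice of a thresholding kernel, are illustrative assumptions and do not come from the SA-C project. Here both halves simply run on the CPU; a compiler of the kind described would instead map the inner loop onto the FPGA and generate the interface code itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Inner shell (hypothetical): a simple per-pixel loop with no dependence
 * between iterations, the kind of kernel that is a candidate for hardware. */
void inner_threshold(const uint8_t *in, uint8_t *out, size_t n, uint8_t cutoff)
{
    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {
        out[i] = (in[i] > cutoff) ? 255 : 0;
    }
}

/* Outer shell (hypothetical): host-side code that owns the buffers,
 * invokes the inner shell, and post-processes its result. */
size_t outer_count_bright(const uint8_t *image, uint8_t *scratch,
                          size_t n, uint8_t cutoff)
{
    inner_threshold(image, scratch, n, cutoff);

    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (scratch[i] != 0) {
            count++;
        }
    }
    return count;
}
```

Calling `outer_count_bright` on a five-pixel buffer holding 10, 200, 50, 255 and 128 with a cutoff of 100 flags three pixels. In this sketch the call to `inner_threshold` stands in for the interface between host and coprocessor.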
"FPGAs work best on the first level of data manipulation pixel, matrix, image, signal, FFT, filters and the like. So what we currently have is a host processor and a board inside that PC that houses three FPGAs. The host processor does the file I/O and other outer-shell tasks of the program, while the FPGA acts as a coprocessor executing the inner shell," said Bhm.
As time goes by, Böhm expects software evolution to favor his compiler technology even more. "FPGAs are going to become more complex. Instead of just a sea of identical gates surrounded by programmable wires, Xilinx is going to sprinkle islands of 18-bit integer multipliers among its sea of gates." Such devices, he said, "will become harder to program using the circuit-diagram approach, making our compiler technology more beneficial, because the compiler doesn't care; it just crunches that stuff out. We see it as a natural trend." With FPGAs getting larger and more complex, "the tools must become more abstract, just as in any programming paradigm."
FPGAs execute their algorithms more quickly than DSPs, because their hardware has been reconfigured to precisely match the computational requirements of the application. For critical inner-shell functions, such as prescreening massive streams of raw image data from a satellite, the FPGA can be reconfigured to perform all the necessary bit-banging in one parallel operation per datum. "The more operations you can perform in parallel, the faster the FPGA will execute the inner shell," said Böhm.
Identifying those critical inner-shell functions, and partitioning them into parallel operating circuits inside the FPGA, had been a tedious and time-consuming operation that had to be performed by a hardware specialist either manually or with a circuit-diagram language. Even using a very high-level design language did not remove the manual translation step between the C code written by the application programmer and the VHDL written by the FPGA programmer.
Transparent to users
"Our goal is to make it totally transparent users don't want to know whether they have a coprocessor in their computer," said Bhm. "They just want to see that their graphics draw faster. We take that burden away from the application developer, and instead compile our C-like language directly into the circuit-diagram language."
His group was the first to accomplish this task, Böhm said, because normal C-language operations are incompatible with fixed hardware configurations. For instance, variables in normal C are assumed to map directly onto memory locations that can be changed at will during the execution of the algorithm. But variables in an FPGA correspond to the wires of the circuit, so they can be set only once, when the FPGA is configured for the application. Consequently, the normal C operator allowing reassignment to variables has been removed from the Single-Assignment C compiler (SA-C).
"SA-C is a single-assignment language, implying that the value of a variable can only be set once, because variables in SA-C correspond to wires in the FPGA, and electrical wires can be driven by only one source," said Bhm.
Normal C-language programs also assume the presence of a von Neumann-style "stack," which lets a subroutine call itself by pushing its intermediate results onto the stack and pulling them back off afterward. That is how functions in C can call themselves, or each other, recursively. Since this is not possible in a fixed FPGA circuit, recursion has also been removed from SA-C.
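As a rough illustration in plain C (the function names are made up for this sketch), the recursive version below leans on the call stack to hold one pending addition per element, while the loop version needs only a fixed amount of state. The loop form is the kind of code that remains expressible once recursion is taken away.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Recursive form: every call pushes a frame, so stack use grows with n. */
uint32_t sum_recursive(const uint16_t *v, size_t n)
{
    if (n == 0) {
        return 0;
    }
    return (uint32_t)v[n - 1] + sum_recursive(v, n - 1);
}

/* Loop form: a single accumulator and a counter, no call stack needed. */
uint32_t sum_loop(const uint16_t *v, size_t n)
{
    assert(v != NULL || n == 0);
    uint32_t total = 0;
    for (size_t i = 0; i < n; i++) {
        total += v[i];
    }
    return total;
}
```

Both functions return the same sum for the same input; only the second has a fixed, known amount of storage regardless of `n`.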
Böhm's group also worked to extend the SA-C language for parallel bit-banging. For instance, a new data type was introduced with variable bit precision, rather than being limited to 8-, 16- and 32-bit integers. Data types can be any bit length in SA-C, and can even be of "unspecified" length. Likewise, true dynamic multidimensional arrays can be set up in an FPGA, instead of using fixed-sized pointers to index into fixed-sized arrays in memory locations.
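Standard C has no integer type of arbitrary width, so the effect of a variable-precision type can only be approximated there by masking. The sketch below is an emulation under that assumption, not SA-C syntax; `wrap_to_width` and `add_width` are hypothetical helper names. It shows what an unsigned value of, say, 11 bits would do on overflow.

```c
#include <assert.h>
#include <stdint.h>

/* Keep only the low 'bits' bits of x, emulating an unsigned integer
 * of that width (1 to 32 bits). */
uint32_t wrap_to_width(uint32_t x, unsigned bits)
{
    assert(bits >= 1 && bits <= 32);
    if (bits == 32) {
        return x;  /* shifting by 32 would be undefined, so handle it here */
    }
    return x & ((UINT32_C(1) << bits) - 1u);
}

/* Add two values and wrap the result to the given width. */
uint32_t add_width(uint32_t a, uint32_t b, unsigned bits)
{
    return wrap_to_width(a + b, bits);
}
```

With an 11-bit width the largest representable value is 2047, so `add_width(2047, 1, 11)` wraps around to 0. In hardware, a narrower type means narrower adders and fewer wires, which is presumably why arbitrary widths matter more on an FPGA than on a fixed-width processor.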
"We have really been developing an application-driven technology," said Bhm. "For instance, from our image-processing applications we discovered we needed to extend C with these really nice regular data structures. Now we want to do more complicated applications so that we can expand our language further and make our compiler more sophisticated."
Of the three applications that Böhm's team tried out on its SA-C compiler, the largest speed gain was achieved in a probing application that automatically recognizes objects (say, enemy tanks) and displays the results in less than a tenth of a second. The same algorithm running on an 800-MHz Pentium processor took 65 seconds, resulting in an 800x speedup.
The second-largest speedup was achieved in a prescreening application that scans incoming satellite imagery for areas of interest. When it discovers such an area, it sets a flag that can be read by secondary pattern-recognition programs. In prescreening for military vehicles and facilities, it was 40 times faster than the systems now in use. Moreover, if deployed in a satellite, its FPGA could be reconfigured from the ground for future upgrades.
For the future, Böhm wants to tackle a host of new applications, including digital TV decoding and data streaming, to develop new language extensions.
An audio recording of reporter R. Colin Johnson's full interview with Wim Böhm can be found online at AmpCast.com/RColinJohnson.