Editor's Note: There are a lot of folks who are interested in accelerating their algorithms/programs written in C or C++. Many of these guys and gals are aware that FPGA-based accelerators are available, but they don't know how you actually make these little rascals perform their magic.
In order to address this, I contacted a number of the main players in this arena and asked each of them if they would be interested in penning an article that explained the process of:
- Writing a new application (or modifying a legacy application) in C or C++ in a form suitable for acceleration.
- Partitioning the application such that some portions will be compiled for use on the general-purpose processor and other portions will be implemented in the FPGA.
- Actually getting those portions of the application that are to be accelerated into the FPGA (by one means or another).
- Interfacing the main application running on the general-purpose processor with those portions running on the FPGA.
- Analyzing/profiling (debugging?) the new version of the program, part of which is running on the FPGA.
Some of the companies I contacted declined because they were too busy (which is a good place for them to be), but three stepped up to the plate:
– Part 1: DRC Computer Corporation
– Part 2: SRC Computers
– Part 3: Impulse Accelerated Technologies (this article)
Field Programmable Gate Arrays (FPGAs) are increasingly being used as platforms for embedded and high-performance computing. FPGAs can be used to deploy complete, single-chip accelerated applications, or can be used as coprocessors in larger, multiple-CPU server applications, in areas as diverse as image processing, bioinformatics, and financial computing.
In this article, we'll show how emerging tools for software-to-hardware compilation are speeding the development of high-performance, FPGA-accelerated applications. We'll describe some of the many ways in which software-to-hardware tools can be deployed, and present two examples of performance-critical algorithms that have been implemented in FPGAs using these new tools.
The FPGA as a coprocessor
FPGAs are best known as devices for hardware integration. Hardware designers have for many years used FPGAs for logic applications including state machines, memory controllers, "glue" logic and bus interfaces. More recently, however, embedded and high performance system designers have begun using FPGAs as actual computing elements. This has been made possible in part because of increased device densities, but also by advances in FPGA tool flows.
As dedicated coprocessors, FPGAs have significant performance advantages over traditional processors due to their massively parallel architectures. Hardware-level parallelism allows FPGA-based applications to run at 100× or more the performance of an equivalent application on an embedded processor, and at 10× or more that of a higher-end workstation processor.
When measured as a function of computational power efficiency, the advantages of an FPGA-based computing strategy become even more apparent. Calculated as a function of millions of operations (MOPs) per watt, FPGAs have demonstrated greater than 1,000× power/performance advantages over today's most powerful processors. And that advantage – the processing efficiency gap – continues to grow. For this reason, FPGA accelerators are now being deployed for a wide variety of power-hungry computing applications.
The adoption of FPGAs for high-performance computing applications has been slowed, however, by a historic lack of FPGA software-to-hardware compilers. Embedded systems programmers, financial algorithm developers, domain scientists and other application programmers have been reluctant to use FPGAs due to a lack of familiar programming tools.
Recently, however, a new generation of software-to-hardware tools has emerged. These tools greatly simplify the task of moving software algorithms into FPGA hardware, putting these devices within the reach of software developers. Automated and semi-automated compiler and optimization tools now make it possible for software developers to quickly prototype, optimize and implement hardware accelerators using traditional C programming techniques. One of the leading tools in this area is Impulse C, from Impulse Accelerated Technologies. The Impulse C tools include a C-to-hardware compiler as well as a set of C-compatible API functions that can be used by software programmers to create hardware-accelerated applications. This article describes how Impulse C can be used for a wide variety of such applications.
Considering FPGAs for application acceleration
FPGAs have come a long way since their inception, as illustrated in Fig 1. From their humble beginnings as containers for glue and control logic, FPGAs have evolved into highly capable software coprocessors and platforms for complete, single-chip embedded systems.
1. FPGA devices have evolved to become highly capable computing platforms.
It has long been recognized that many of the computing challenges in embedded and high-performance computing can be addressed using parallel processing techniques. The use of dual- or quad-core processors, multiple computer "blades", or clustered PCs has become commonplace in many different application domains. FPGAs are now being deployed alongside traditional processors in these systems, creating what might be called a hybrid multiprocessing approach to computing.
When FPGAs are added to a multiprocessing environment, opportunities exist for improving both application-level and instruction-level parallelism. Using FPGAs, it is possible to create structures that can greatly accelerate individual operations, such as a simple multiply-accumulate or a more complex sequence of integer or floating-point operations, or that implement higher-level control structures such as loops. Code within the innermost loops of an algorithm can be further accelerated through the use of instruction scheduling, instruction pipelining and other techniques. At a somewhat higher level, these parallel structures can themselves be replicated to create further degrees of parallelism, up to the limits of the target device's capacity.
The programming of software algorithms into FPGA hardware has traditionally required specific knowledge of hardware design methods, including the use of hardware description languages such as VHDL or Verilog. While these methods may be productive for hardware designers, they are typically not suitable for embedded systems programmers, domain scientists and higher level software programmers.
Fortunately, software-to-hardware tools now exist that allow software programmers to describe their algorithms using more familiar methods and standard programming languages. For example, using a C-to-FPGA compiler tool, an application and its key algorithms can be described in standard C with the addition of relatively simple library functions to specify inter-process communications. The critical algorithms can then be compiled automatically into HDL representations which are subsequently synthesized into lower level hardware targeting one or more FPGA devices. While a certain level of FPGA knowledge and in-depth hardware understanding may still be required to optimize the application for the highest possible performance, the formulation of the algorithm, the initial testing and the prototype hardware generation can now be left to a software programmer.
Using standard C for application development has many advantages, not the least of which is the opportunity to use iterative, software-oriented methods of design optimization and debugging. With the Impulse C tools, for example, both hardware and software elements of the complete application can be described, partitioned and debugged using standard C programming tools such as GCC and GDB or environments such as Microsoft Visual Studio. During this process, the application programmer can make use of familiar C-code optimizations to increase performance without having FPGA-specific hardware knowledge.