Compute-bound offloading, System-on-Chip integration, and custom processing are driving software applications into FPGAs.
Video and imaging circuitry only changes one way: higher resolution, higher frame rates, and lower power requirements (particularly for UAV applications) equate to higher capacity and complexity. This is driving some software developers to sample the technique of offloading processor functions to run in parallel in field programmable gate arrays (FPGAs).
While not difficult, the move can raise resource, timing, and other issues that frustrate a first-time user. New platforms and compilation techniques have made it easier (but not easy) to offload and reconfigure software code to run in hardware. This article is written by experienced professionals on the tools, IP, and platform side of FPGAs -- it is aimed at software developers interested in sampling software-to-hardware compilation. It explains the architectural choices that need to be made and the HLL (high-level language) tools approach to software-to-hardware compilation.
Image processing is only going in one direction -- higher resolution, higher frame rate and lower power.
The disruption of microprocessor-only architectures appears to be accelerating. The growing density, speed, and portability of video/imaging systems are taxing the ability of conventional processors to keep up, and GPU, FPGA, and ASSP alternatives are knocking off certain design types -- for good reason. At the "bleeding edge" of frame rates and resolution, the sheer amount of math strains system architectures, and the math itself points to parallel processing as the solution. FPGAs are safe, non-exotic parallel processors, mostly within the budget and skill levels of most teams. Software developers are used to HLLs, but often cannot practically use 100% of an FPGA's features. HLL-to-FPGA cross-compilation can often populate gates faster than hand coding, and it may even do things "smarter". A brilliant hand coder will always beat an average HLL coder in Quality of Results (QoR), but there are thousands of the latter and only a handful of the former.
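To make that concrete, here is a minimal sketch of what the HLL side of cross-compilation can look like, assuming a Vivado/Vitis HLS-class C-to-gates tool. The pragma spelling follows that tool family's conventions and is a tool hint, not standard C; a desktop compiler simply ignores it, so the same file runs as ordinary software.

```c
/* Hypothetical example: a per-pixel threshold written as ordinary C.
 * An HLS-class compiler can pipeline this loop into hardware that
 * handles one pixel per clock; the pragma is a vendor-style hint
 * (assumed here), ignored by a normal C compiler. */
#define LINE_WIDTH 1920   /* illustrative HD scanline */

void threshold_line(const unsigned char in[LINE_WIDTH],
                    unsigned char out[LINE_WIDTH],
                    unsigned char level)
{
    for (int x = 0; x < LINE_WIDTH; x++) {
#pragma HLS PIPELINE II=1   /* request one pixel per clock */
        out[x] = (in[x] > level) ? 255 : 0;
    }
}
```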
Different types of system architecture are emerging, including SoC (System-on-Chip), CPU co-processing, and "line-speed" processing.
Real-time video processing
In this architecture, code attempts to run in real time; i.e., video goes out as fast as it comes in. A typical application is pure image processing -- 4K TV, for example -- where no "judgment" is applied to the image. The architecture is stream oriented: pure video-in to video-out. Performance is achieved by designing optimal filtering processors in FPGA fabric and then instantiating just enough of them to achieve the necessary throughput. Typically, clock speeds (and therefore heat and power) remain relatively low.
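A hedged sketch of "just enough of them": below, a frame is split into horizontal stripes and an identical stand-in filter is replicated per stripe. The stripe count is a made-up tuning knob, and the pragmas again follow HLS-style conventions; a real design would put a proper filter kernel inside each engine.

```c
#define WIDTH      3840              /* illustrative 4K frame */
#define HEIGHT     2160
#define N_ENGINES  4                 /* assumed: tuned to hit throughput */
#define STRIPE_PIX ((HEIGHT / N_ENGINES) * WIDTH)

/* Stand-in for an "optimal filtering processor". */
static void filter_stripe(const unsigned char *in, unsigned char *out)
{
    for (int i = 0; i < STRIPE_PIX; i++) {
#pragma HLS PIPELINE II=1            /* one pixel per clock per engine */
        out[i] = (unsigned char)(in[i] / 2 + 64);
    }
}

void process_frame(const unsigned char *in, unsigned char *out)
{
    for (int e = 0; e < N_ENGINES; e++) {
#pragma HLS UNROLL                   /* replicate the filter hardware */
        filter_stripe(in + e * STRIPE_PIX, out + e * STRIPE_PIX);
    }
}
```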
Real-time video analysis is a variant of the above, but it is typically an offload architecture. Again, the application takes video in and may, or may not, pass video out as close to real time as possible; in the middle of the flow, however, is an application that makes some sense of the image.
Some data sets -- such as those produced by differential filters that capture changes in starfields -- end up being massive but use simple, repetitive math. These applications are typically good candidates for acceleration via parallelization, with minimal "traffic cop" activity relegated to an on-board processor in the form of a microcontroller.
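As a sketch of that simple, repetitive math, assuming frame differencing as the differential filter (the unroll factor and names below are illustrative assumptions):

```c
#include <stdlib.h>   /* abs() */

/* Hypothetical frame-difference kernel: every pixel is independent,
 * so the loop parallelizes trivially. The on-board processor need
 * only sequence frame buffers in and out. */
void frame_diff(const unsigned char *prev, const unsigned char *cur,
                unsigned char *out, int n_pixels)
{
    for (int i = 0; i < n_pixels; i++) {
#pragma HLS UNROLL factor=8   /* assumed: 8 pixels per clock */
        out[i] = (unsigned char)abs((int)cur[i] - (int)prev[i]);
    }
}
```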
In the case of a UAV (unmanned aerial vehicle) or machine-vision inspection, the application may perform object recognition, manipulating the image by means of a transform, convolution, or filter. Pipelining is almost invariably used, so there is some latency. It is also safe to assume that there is a lot of data to handle and no time to store it for later use.
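For a feel of the pipelining and latency involved, here is a minimal line-buffered 3x3 convolution sketch. Border handling is omitted, and the width, kernel, and buffering scheme are illustrative assumptions rather than a recommended architecture; the output naturally lags the input by roughly one video line plus a pixel, which is the kind of latency noted above.

```c
#define W 1920   /* illustrative scanline width */

/* Sketch of a streaming 3x3 convolution: two line buffers hold the
 * previous scanlines, so each pixel is touched exactly once as it
 * streams through. Edge pixels are approximate in this sketch. */
void conv3x3(const unsigned char *in, unsigned char *out,
             int height, const signed char k[3][3])
{
    static unsigned char line0[W], line1[W];   /* two previous scanlines */
    unsigned char win[3][3] = {{0}};           /* sliding 3x3 window */

    for (int y = 0; y < height; y++) {
        for (int x = 0; x < W; x++) {
#pragma HLS PIPELINE II=1                      /* one pixel per clock */
            unsigned char px = in[y * W + x];

            /* shift window left, bring in a new column from the buffers */
            for (int r = 0; r < 3; r++) {
                win[r][0] = win[r][1];
                win[r][1] = win[r][2];
            }
            win[0][2] = line0[x];   /* row y-2 */
            win[1][2] = line1[x];   /* row y-1 */
            win[2][2] = px;         /* row y   */

            /* update line buffers for the next row */
            line0[x] = line1[x];
            line1[x] = px;

            int acc = 0;
            for (int r = 0; r < 3; r++)
                for (int c = 0; c < 3; c++)
                    acc += k[r][c] * win[r][c];

            /* clamp to 8 bits */
            out[y * W + x] =
                (unsigned char)(acc < 0 ? 0 : acc > 255 ? 255 : acc);
        }
    }
}
```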