The folks from Altera have just announced a development program focused on the Open Computing Language (OpenCL) standard for FPGAs and SoC FPGAs (see also OpenCL gets upgrade, Altera tips FPGA tool).
What does this mean? What is OpenCL? Why do we care? Actually, there are so many implications here that this takes a little effort to wrap one’s brain around, but I’ll try to explain it and we’ll see how well I do…
Let’s start off with the fact that we all want more processing power. As an engineer, I want as much processing “horsepower” as I can get. The same applies to the folks performing multimedia processing (HD, VoD, 3D video…); medical imaging (MRI, CT, PET…); high-performance computing (HPC) such as climate, financial, and fluid dynamics modeling; radar systems processing; and the list goes on…
One way to increase processing power is frequency scaling, which basically means increasing the CPU’s clock frequency, but power considerations and physical limitations caused this approach to grind to a halt at around 3 GHz circa 2003.
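The “power considerations” can be made a little more concrete using the standard first-order approximation for the dynamic (switching) power of CMOS logic. This is a textbook relationship, not something taken from Altera’s announcement:

```latex
P_{\text{dyn}} \approx \alpha \, C \, V_{DD}^{2} \, f
```

Here α is the fraction of the circuit that switches each cycle, C is the switched capacitance, V_DD is the supply voltage, and f is the clock frequency. Because running a chip faster generally also requires a higher supply voltage, power climbs much faster than linearly as the clock frequency goes up; in the crude case where V_DD has to scale in proportion to f, P_dyn grows roughly as f³.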
Another way to increase processing power is to increase the number of processor cores, which is why we now see CPUs with two, four, and sometimes more cores.
Now, some algorithms lend themselves extremely well to multi-core processing. In fact, some algorithms can benefit from having access to hundreds of processor cores, but where are we going to find hundreds of processor cores lying around? Well, by some strange quirk of fate, the graphics processing units (GPUs) found on today’s high-end graphics cards do, in fact, contain hundreds of processor cores.
The thing is that, a few years ago, some bright person came up with the idea of accessing the processing cores in the GPU and using them for non-graphical computing. At that time, circa 2006, these cores offered limited numerical precision (fixed-point or, at best, single-precision floating-point values), and accessing them was non-trivial. Circa 2007/2008, folks started providing APIs that made it much easier to access the cores. Also, the GPUs themselves became much more sophisticated; today they contain hundreds of cores, each of which can perform single- or double-precision floating-point calculations.
All of which leads us to OpenCL. Rather than my re-inventing the wheel here, let’s simply see what Wikipedia has to say:
OpenCL (Open Computing Language) is a framework for writing programs that execute across heterogeneous platforms consisting of CPUs, GPUs, and other processors. OpenCL includes a language (based on C99) for writing kernels (functions that execute on OpenCL devices), plus APIs that are used to define and then control the platforms. OpenCL provides parallel computing using task-based and data-based parallelism. It has been adopted by Intel, AMD, Nvidia, and ARM. OpenCL is an open standard defined by the Khronos Group.
OpenCL gives any application access to the graphics processing unit for non-graphical computing. Thus, OpenCL extends the power of the Graphics Processing Unit beyond graphics (general-purpose computing on graphics processing units). Academic researchers have investigated automatically compiling OpenCL programs into application-specific processors running on FPGAs, and commercial FPGA vendors are developing tools to translate OpenCL to run on their FPGA devices.
OpenCL is analogous to the open industry standards OpenGL and OpenAL, for 3D graphics and computer audio, respectively. OpenCL is managed by the non-profit technology consortium Khronos Group.
To put this in a nutshell, the OpenCL standard is a C-based open standard for parallel programming. Note in particular the part that says “…execute across heterogeneous platforms consisting of CPUs, GPUs, and other processors.” The point is that, in addition to running on CPUs and GPUs, OpenCL code can also be compiled to run on FPGAs.

“So what,” you may say, “why not just use CPUs and GPUs?”
Well, the thing is that FPGAs are actually really, REALLY efficient when it comes to running things in parallel, because algorithms can be implemented directly as dedicated hardware acceleration functions. In fact, for suitable algorithms, an FPGA can deliver higher performance than a GPU while consuming only about 1/5 of the power, which is “nothing to sneeze at,” as they say.
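Taken at face value, that claim implies a large advantage in performance per watt. Writing P for performance and W for power consumption, and simply treating the quoted figures as given (they are not measured here):

```latex
\frac{P_{\text{FPGA}} / W_{\text{FPGA}}}{P_{\text{GPU}} / W_{\text{GPU}}}
  = \underbrace{\frac{P_{\text{FPGA}}}{P_{\text{GPU}}}}_{>\,1}
    \cdot
    \underbrace{\frac{W_{\text{GPU}}}{W_{\text{FPGA}}}}_{\approx\,5}
  \gtrsim 5
```

In other words, if the FPGA is faster and draws about a fifth of the power, it delivers more than roughly five times the performance per watt.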
But I’m wandering off into the weeds again… Altera’s OpenCL program combines the parallel performance capability of FPGAs with the OpenCL standard to enable powerful system acceleration. This heterogeneous approach (a CPU plus an FPGA, programmed using the OpenCL standard) also offers a significant time-to-market advantage compared to traditional FPGA development using lower-level hardware description languages (HDLs) such as Verilog or VHDL.
Through its OpenCL program, Altera has engaged with multiple customers, expanded its university program to support OpenCL-based FPGA development in academia, and is actively contributing to the evolution of the OpenCL standard based on customer feedback. Early results of customer evaluations show a 35X performance increase compared to multicore CPU solutions, and a 50 percent reduction in development time compared to HDL-developed FPGA solutions.
Developed by an industry consortium called the Khronos Group, the OpenCL standard is an open, royalty-free standard that supports cross-platform, parallel programming of heterogeneous systems. As a standard parallel programming language, OpenCL allows programmers to use a familiar C-based language to develop code across platforms, from CPUs to GPUs and, now, to FPGAs.
By adopting a heterogeneous architecture with OpenCL, system architects can maximize the performance of the computationally intensive portions of their designs while also achieving fast time-to-market. Target applications range from high-performance computing, including climate and financial modeling, to advanced radar systems, medical imaging, and video encoding and processing; in short, any system that requires fast computations that can be parallelized.
The OpenCL standard offers a natural separation between the “host” code (pure software, written in standard C/C++, that can be executed on any type of microprocessor) and the “kernel” code, written in OpenCL C, that runs on the accelerator. By profiling their algorithms, system architects can choose which functions to accelerate as kernels in the FPGA to improve system performance. Multiple kernels can operate in parallel to further speed up processing. The host communicates with the accelerator device via a set of library routines, while the kernel language adds a minimal set of extensions to C that allow programmers to specify parallelism and memory hierarchy for the most computationally intensive portions of the code.
See Altera’s website for more information on its OpenCL program, including a white paper and online learning materials, and to register for updates. For more information on the OpenCL standard, visit www.khronos.org/opencl.
If you found this article to be of interest, visit Programmable Logic Designline where, in addition to my blogs on all sorts of "stuff" (also check out my Max's Cool Beans blog), you will find the latest and greatest design, technology, product, and news articles with regard to programmable logic devices of every flavor and size (FPGAs, CPLDs, CSSPs, PSoCs...).