Graphics processors (GPUs) have incredible promise as high-performance compute engines, but the programming model is a problem. RapidMind and PGI have come up with two different solutions. Here's what you need to know about them.
Graphics processors (GPUs) have incredible promise as high-performance compute engines, but there's a problem: the things are a pain to program. The primary challenge is the complexity of the hardware: GPUs can contain hundreds of processing engines. That massive parallelism means the programmer has to figure out how to split the workload into hundreds of pieces. Not a small problem, to say the least.
But that's not the only challenge! GPUs also present a portability problem. Take NVIDIA's CUDA technology, for example. CUDA lets you program NVIDIA GPUs using a C-based language. That works fine until you want to use something other than an NVIDIA GPU. Switch to another vendor, and you'll have to redo a lot of coding.
Startup RapidMind has a better solution. RapidMind's platform lets you write code once and port it to a variety of hardware, including multicore CPUs, GPUs, and the Cell BE. This is a smart approach, and one worth looking at. (We have an article that shows how to use their approach for video transcoding. It's a good place to start.)
The only problem with RapidMind's approach is that it still leaves you locked into a proprietary technology. Yes, you are free to use many hardware platforms, but your code will only work with RapidMind tools. The Portland Group (PGI) has come up with a solution that I find more appealing. Its Accelerator compiler takes straight C or Fortran code. All you have to do is add directives telling the compiler where to use the GPU for acceleration. The code remains fully portable, since compilers that don't support these directives will simply ignore them.
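To make the directive idea concrete, here's a minimal sketch of what annotated code looks like. The SAXPY routine and its names are my own illustration, not from PGI's materials; the `#pragma acc region` directive follows PGI's documented Accelerator syntax. As the paragraph above notes, a compiler that doesn't recognize the pragma just ignores it, so this builds unchanged with any ordinary C compiler.

```c
#include <stddef.h>

/* Hypothetical SAXPY routine annotated for the PGI Accelerator
   compiler. On a supporting compiler, the pragma asks for the loop
   to be offloaded to the GPU; other compilers ignore the pragma and
   compile a plain serial loop. */
void saxpy(size_t n, float a, const float *x, float *y)
{
#pragma acc region
    {
        for (size_t i = 0; i < n; ++i)
            y[i] = a * x[i] + y[i];  /* independent iterations */
    }
}
```

The point is that the same source serves both worlds: no separate kernel language, no vendor-specific rewrite when you move to a different target.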
PGI's solution isn't magical—it leaves you plenty of challenges. For example, you still have to figure out where to apply parallelism, and you have to write your code in a way that is parallelization-friendly. These are big challenges, but they are challenges you will have no matter what tools you use. Overall, my take is: I don't know what kind of results PGI gets, but I wholeheartedly endorse its approach.
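A quick illustration of what "parallelization-friendly" means in practice. Both routines below are my own toy examples, not PGI code: the first loop carries a dependency from one iteration to the next, so no tool can spread it across hundreds of processing engines as written; the second has fully independent iterations, which is exactly the shape a parallelizing compiler wants.

```c
#include <stddef.h>

/* NOT parallelization-friendly: each iteration reads the result of
   the previous one, forcing the loop to run serially as written. */
void running_sum(size_t n, const float *x, float *out)
{
    float acc = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        acc += x[i];
        out[i] = acc;
    }
}

/* Parallelization-friendly: every iteration touches only its own
   elements, so the iterations can be distributed across many
   processing engines with no coordination. */
void scale_offset(size_t n, float a, float b, const float *x, float *out)
{
    for (size_t i = 0; i < n; ++i)
        out[i] = a * x[i] + b;
}
```

Spotting dependencies like the one in `running_sum`, and restructuring around them, is the work the programmer still owns regardless of which tool generates the GPU code.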
Interestingly, PGI's approach looks an awful lot like what's happening in DSP tools. For example, we have a great how-to series on C optimization that focuses on getting better performance while maintaining portability. This is clearly the best approach to take if you can.