Real-world apps
Cuda has already been used in commercial applications related to oil and gas exploration, computational finance and other computational modeling projects, as well as for faster, more-powerful hybrid rendering within graphics.
Evolved Machines, for example, is a Stanford University startup that uses neural networks to simulate electrochemical nerve cells for such applications as olfactory sensory processing and visual object recognition. The startup is using the Cuda language running on Nvidia GPUs to provide a more than 100x speedup in computation.
Hanweck Associates (New York), a consulting firm specializing in investment and risk management for financial institutions, has used Nvidia Tesla products and says some of its clients are achieving a 100x speedup of Monte Carlo simulations compared with single-core processing. The improvement gives traders greater visibility into what is happening in the markets, said Kirk.
"In this area, time really is money," he said. "Designs move from research to commercial implementation very quickly. Hanweck has built an engine that does real-time calculations of the volatility of the entire U.S. equity options market and can provide an update in under a second."
Much astrophysics research, meanwhile, is achievable only through computational experiments. General-purpose CPUs, even with gigahertz clock frequencies, have trouble providing the required gigaflops performance, whereas an Nvidia GeForce 8800 can deliver more than 300 Gflops.
"This is faster than the Grape-6AF custom supercomputer that was built in Japan to handle this type of application," said Kirk. "In this field, if you can calculate things 300 times faster, you are going to do all the 'big science,' get in all the publications and learn everything first. Within six months of the first paper being published based on [research that was conducted] using a GPU, there was an international conference so that everyone could learn how to use the technology, and now it has pretty well taken over this area."
Nvidia is working to make Cuda more open by providing code downloads and by producing a tool set built under the Open64 Open Research Compiler. It is in the process of making much of the Cuda language itself open source, said Kirk. "We are doing virtually everything we can to make it a standard."
At present, GPUs from rival company ATI cannot run C and hence cannot run Cuda, said Kirk. "Their hardware cannot share data between different threads, and I would encourage them to implement hardware that can do that. If they did, we could port Cuda to run on their processors."
Meanwhile, the definition of the Cuda language does not preclude its running on more general-purpose multicore processors, said Kirk. Several universities have compiled Cuda source code to run on multicore CPUs.
At present, most multicore CPUs run code much more slowly than GPUs do, because the general-purpose processors lack sufficient arithmetic logic units. But in a result that Kirk said he was not expecting, the Cuda version of code running on CPUs has been found to scale better across multiple processors than a version of the same program hand-coded for multicore.
"Cuda parallelization does a better job at linear scaling than people hand-coding for multicore," said Kirk. "I expect that you will see products that will allow Cuda to run on multicores as well as enable load balancing across the two types of processors, because what you really want to do is use all the processors in your system."
So why wouldn't you merge the GPU and CPU into one piece of silicon?
"A hybrid device would do both kinds of tasks poorly," said Kirk. "A device with a mix of GPU and CPU cores would be a great product for the low end, where you do not care about performance. But most of our customers want more computing power, so I wouldn't want to take any of my silicon and use it for a CPU."