Visualization plays a key role in helping scientists understand large amounts of information. This role can range from the discovery of problems within the data that may not be evident in basic numerical studies to the development of new hypotheses and the presentation of results. While advances in computational power have led to the discovery and understanding of many phenomena, the available computing resources are often unable to process the resulting large data sets in an efficient, interactive manner. Unfortunately, this limitation, coupled with ever-growing data volumes, often leads to situations in which results are inadequately explored.
However, over the last several years, driven primarily by the entertainment industry, commodity graphics hardware has seen rapid enhancements in both performance and programmability. The performance improvements have been significant enough that the graphics processor (GPU) now has roughly an order of magnitude more computing power and memory bandwidth than the CPU. This has led to our study of techniques that leverage the power of the GPU for improving the performance of visualization applications as well as for general-purpose computation.
As part of the Scout project toward a hardware-accelerated system for quantitatively driven visualization and analysis, we have devised a software environment and programming language that lets scientists write simple, expressive data-parallel programs to enable the computation of derived values and direct control of the mapping from data values to the pixels of a final rendered image.
This is all accomplished within an integrated development environment that provides on-the-fly compilation of code and interactive exploration of the rendered results. Scout has achieved computational rates roughly 20 times faster than a 3-GHz Intel Xeon EM64T processor running code without streaming SIMD extensions, and approximately four times faster than SIMD-enabled, fully optimized code. As an example of what can be accomplished in this environment, the rendered results were modeled on two ranges of computed entropy values from a core-collapse supernova simulation produced by the Terascale Supernova Initiative (www.phy.ornl.gov/tsi).
The first entropy range was partially clipped away to reveal the turbulent structure of the supernova's core, and the second (more transparent) entropy range isolated the details of the shock front.
Both ranges of entropy values were colored by the corresponding velocity magnitude values within the simulation. The entropy and velocity magnitude values, which are stored on a 256 x 256 x 256 computational grid, were computed in approximately 0.22 second using an Nvidia Quadro 3400 card.
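To make the idea of a data-parallel mapping from data values to pixels more concrete, here is a minimal sketch written in Python rather than in Scout itself, whose syntax is not shown in this article. The entropy ranges, opacities and blue-to-red color ramp are hypothetical placeholders, not the values used for the supernova rendering. The point is only that one small function is applied independently to every grid cell.

```python
import math

# Hypothetical entropy ranges and opacities, chosen only for illustration.
CORE_RANGE = (0.0, 0.3)    # first range, rendered nearly opaque
SHOCK_RANGE = (0.7, 1.0)   # second range, rendered more transparent
CORE_ALPHA = 0.9
SHOCK_ALPHA = 0.2


def velocity_magnitude(vx, vy, vz):
    """Derived value: the length of the velocity vector in one cell."""
    return math.sqrt(vx * vx + vy * vy + vz * vz)


def shade_cell(entropy, vx, vy, vz, vmax):
    """Map one cell's data values to an (r, g, b, a) color.

    Cells outside both entropy ranges become fully transparent.
    Cells inside a range are colored by velocity magnitude using an
    assumed blue-to-red ramp. vmax must be positive.
    """
    if CORE_RANGE[0] <= entropy <= CORE_RANGE[1]:
        alpha = CORE_ALPHA
    elif SHOCK_RANGE[0] <= entropy <= SHOCK_RANGE[1]:
        alpha = SHOCK_ALPHA
    else:
        return (0.0, 0.0, 0.0, 0.0)
    t = min(velocity_magnitude(vx, vy, vz) / vmax, 1.0)
    return (t, 0.0, 1.0 - t, alpha)


def shade_grid(cells, vmax):
    """Apply shade_cell to every (entropy, vx, vy, vz) tuple.

    No cell depends on any other, which is what makes the work
    data-parallel.
    """
    return [shade_cell(e, vx, vy, vz, vmax) for (e, vx, vy, vz) in cells]
```

Because each cell is shaded independently, the loop in shade_grid is the part a GPU can spread across its parallel pipelines. On a 256 x 256 x 256 grid that amounts to 16,777,216 independent evaluations, so a time of 0.22 second corresponds to roughly 76 million cells per second.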
Although those results are promising, several challenges remain for the successful use of the GPU as a general-purpose computational resource. There are four major disadvantages.
The first is that moving data between the CPU and the GPU can be time-consuming enough to overwhelm the advantages of the GPU's computational power. The introduction of PCI Express has the potential to greatly reduce the impact of this limitation. This will, however, require a commitment from the graphics hardware vendors to fully utilize the capabilities of the new interface.
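The trade-off between transfer time and compute time can be illustrated with a simple additive cost model. This is a sketch under stated assumptions, not a measurement: the function name and its parameters are hypothetical, and the model ignores any overlap of transfer with computation as well as the cost of reading results back.

```python
def offload_pays_off(data_bytes, bus_bytes_per_s, cpu_seconds, gpu_speedup):
    """Return (gpu_total_seconds, worthwhile) under an additive model.

    gpu_total = time to move the data over the bus
              + GPU compute time (CPU time divided by the speedup).
    All four inputs are assumptions supplied by the caller.
    """
    transfer = data_bytes / bus_bytes_per_s
    gpu_total = transfer + cpu_seconds / gpu_speedup
    return gpu_total, gpu_total < cpu_seconds


# Illustrative round numbers only: a 20x speedup still loses when the
# CPU could finish the job in less time than the transfer takes.
slow_bus_long_job = offload_pays_off(1000, 100, 20.0, 20.0)
slow_bus_short_job = offload_pays_off(1000, 100, 5.0, 20.0)
```

In the first call the transfer costs 10 seconds and the GPU computation 1 second, which still beats 20 seconds on the CPU. In the second call the same 10-second transfer exceeds the 5 seconds the CPU would need, so the offload is not worthwhile despite the 20-fold speedup.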
The second disadvantage is that developing software for the GPU can be very complex in comparison to programming the CPU. This is primarily due to a restrictive programming model, a lack of virtualization of hardware resources and the need to map algorithms into the graphics-centric and data-parallel form required by both the hardware and the supporting graphics application programming interface.
The Scout language hides a large portion of these issues from the end user, but the hardware restrictions are still of considerable concern and an area of active research.
The third disadvantage of the GPU is a lack of floating-point precision. Although the latest hardware from Nvidia now supports a partial IEEE 32-bit floating-point format, hardware from ATI is limited to 24 bits of precision. It is likely to be years before double-precision floating-point values are supported in graphics hardware, and it is possible they never will be. This limitation can have a substantial impact on certain calculations.
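The effect of limited precision is easy to reproduce on a CPU. The sketch below uses Python's standard struct module to round values to the IEEE 32-bit format and compares the result with ordinary 64-bit arithmetic. The helper names are hypothetical, and the example models only the IEEE 32-bit case, not the 24-bit ATI format.

```python
import struct


def to_float32(x):
    """Round a Python float (64-bit) to the nearest IEEE 32-bit value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def add_float32(a, b):
    """Add two numbers and round the result to 32-bit precision."""
    return to_float32(to_float32(a) + to_float32(b))


# A 32-bit float carries a 24-bit significand, so consecutive integers
# are exact only up to 2**24 = 16,777,216 (which is also 256 * 256 * 256).
LIMIT = float(2 ** 24)

stuck = add_float32(LIMIT, 1.0)   # the added 1.0 is rounded away
exact = LIMIT + 1.0               # 64-bit arithmetic keeps it
```

Here stuck remains 16777216.0 while exact is 16777217.0. A 32-bit accumulator that adds 1.0 per cell therefore cannot count past the number of cells in a 256 x 256 x 256 grid, which is one example of the kind of calculation that single precision can quietly distort.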
The final disadvantage is the relatively small memory sizes available on the graphics card. The current memory sizes range from 128 Mbytes to 640 Mbytes, clearly not adequate to process large data sets in an interactive fashion.
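A quick back-of-the-envelope calculation shows how fast these memory sizes are used up. The sketch below assumes one 32-bit float per value, takes a Mbyte to be 2**20 bytes, and uses a hypothetical working set of four scalar fields (for instance entropy plus three velocity components). It ignores the memory the card needs for the frame buffer and other resources, so the real limits would be lower.

```python
GRID = (256, 256, 256)
BYTES_PER_VALUE = 4        # assumption: one 32-bit float per value
MBYTE = 1024 * 1024        # assumption: 1 Mbyte = 2**20 bytes


def field_mbytes(grid, bytes_per_value=BYTES_PER_VALUE):
    """Memory needed to hold one scalar field on the grid, in Mbytes."""
    cells = grid[0] * grid[1] * grid[2]
    return cells * bytes_per_value / MBYTE


def max_cubic_edge(card_mbytes, fields, bytes_per_value=BYTES_PER_VALUE):
    """Largest N such that `fields` scalar fields on an N^3 grid fit."""
    budget = card_mbytes * MBYTE
    n = 0
    while fields * (n + 1) ** 3 * bytes_per_value <= budget:
        n += 1
    return n


one_field = field_mbytes(GRID)           # 64.0 Mbytes per field
small_card = max_cubic_edge(128, 4)      # 203
large_card = max_cubic_edge(640, 4)      # 347
```

Under these assumptions a single field on the 256 x 256 x 256 grid takes 64 Mbytes, and four such fields already exceed a 128-Mbyte card. Even a 640-Mbyte card tops out at a grid of about 347 cells per side.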
Despite those disadvantages, we believe that the performance numbers, the rapid rate of innovations from the graphics hardware vendors and the recent announcement of support for multiple GPUs in a single desktop system (see www.nvidia.com/page/sli) show that the study of the GPU's impact on general-purpose computing is a viable area for continued research.
In addition, the GPU can provide scientists with a substantial resource for their desktop systems that can be leveraged to provide interactive data exploration and analysis. We are actively exploring the use of hundreds of GPUs in parallel, within a cluster-based environment, to address the memory limitations and explore the scalability of such systems.
Finally, working with GPUs can provide insight into the future of computer architectures. Both streaming architectures and the growing trend by the leading CPU vendors of using multicore and multithreaded processors suggest that more parallelism may be available on future commodity systems. In particular, it seems reasonable to predict that GPU-like cores will be found in the CPUs of the future, or that future GPUs will acquire more general-purpose functionality.
In your spare time . . .
The reader might be interested in the following references for further information. Ian Buck and Tim Purcell published "A toolkit for general-purpose computing on the GPU" in Randima Fernando's book GPU Gems, pages 621-636 (Addison-Wesley, April 2004). Patrick S. McCormick, Jeff Inman, James P. Ahrens, Charles Hansen and Greg Roth wrote "A hardware-accelerated system for quantitatively driven visualization and analysis" in IEEE Visualization 2004, pages 171-178 (October 2004).
Patrick McCormick (firstname.lastname@example.org) is a researcher in the Advanced Computing Lab, Computer and Computational Sciences Division, at Los Alamos National Laboratory (Los Alamos, N.M.).
Rendered results were modeled from two ranges of computed entropy values from a core-collapse supernova simulation.