MANHASSET, N.Y. The National Science Foundation (NSF) has awarded Virginia Tech researchers $350,000 to develop thermal reduction techniques, based on program phase analysis, for scientific applications and systems.
Computer systems can experience a "thermal emergency," a sharp increase in temperature to a level significantly above a safe threshold, which compromises reliability. Dimitrios Nikolopoulos, director of Virginia Tech's Parallel Emerging Architectures Research Laboratory (Pearl), and fellow researcher Kirk Cameron hope to improve the reliability of computer systems' processors by modifying programs, system software or both. The goal is either to prevent thermal emergencies in the first place or to control an existing overheat condition without compromising system performance.
Toward that end, NSF funded the researchers' project titled "Thermal Conductors: Runtime software support for proactive heat management in advanced execution systems."
"What we want is to reduce the heat produced by large systems with lots of components in close proximity, such as those in a data center. By first studying the way applications produce heat, our hope is to identify places where we can reduce heat while maintaining the high performance required by users," Cameron said.
Cameron, director of the Scalable Performance Laboratory (Scape) at Virginia Tech, has worked to develop alternatives to the elaborate cooling solutions currently used to remove the heat generated by modern advanced processors, which can consume up to 100 watts. His research has already produced Tempest (the Temperature Estimator), a portable freeware tool that lets the user directly measure temperature and graphically correlate the results to source code. The tool grew out of observations of how various power reduction strategies affect processor and system thermal behavior.
"Thermal emergencies typically require a reboot, which is a frustrating process and can lead to loss of data," said Nikolopoulos.
The researchers are combining profiling and control infrastructures to create novel software that enables automated, transparent optimization of system thermal behavior, while striving to maintain the high performance expected in advanced execution systems and applications.
All of their software tools and techniques will be open-source and made available to the public via the Internet.