ANAHEIM, Calif. - At the Design Automation Conference here in mid-June, an educational panel addressed thermal issues in integrated circuit (IC) design. Two key questions were raised: When, if at all, will thermal issues become a crucial concern? And what solutions could avert this potential crisis?
Opening the panel, entitled "Keeping Hot Chips Cool: Are IC Thermal Problems Hot Air?", moderator Devadas Varma, founder and chairman of Calypto (Santa Clara, Calif.), highlighted the growing importance of thermal issues. He wondered, however, whether the broader design community needs to worry about them at 32-nm and beyond, or whether they affect only a small segment of designs.
In his talk, Darvin Edwards of Texas Instruments, Inc. (Dallas, Texas) acknowledged that thermal issues are becoming a major design concern. He explained that high-performance microprocessors are pushing the limits of air cooling as die sizes, parasitic leakage power and overall power dissipation grow, even though many design, process and software tricks are being used to slow the trend.
Edwards called for more thermal engineers to perform better system-level analysis, for better thermal modeling tools and for greater accuracy in IC package abstraction to guarantee robust thermal design. The ultimate goal should be to reduce power consumption per function through smart process design, die design and software optimization, he added.
According to Paul Franzon, a professor of electrical and computer engineering at North Carolina State University in Raleigh, the niche in which detailed thermal design matters grows with technology scaling. A key issue is ensuring that thermal design is conducted with sufficient fidelity before a product suffers a thermally induced failure. One approach is to develop a simple metric that predicts thermal design needs from basic design and technology parameters.
"The leakage power goes up by 80 percent with every 10°C increase, and it gets worse with scaling," explained Franzon. "The impact of simplified thermal analysis leads to an under-prediction of the clock skew. You also under-predict the power consumption because you under-predict the peaks. You are thus going to over-predict the delay, and this will lead to unnecessary power increase."
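To put Franzon's leakage figure in perspective, the short sketch below compounds an 80 percent increase for every 10°C of temperature rise. The exponential compounding, the function name `leakage_scale` and its parameters are illustrative assumptions, not a model presented at the panel.

```python
def leakage_scale(delta_t_c: float, growth_per_10c: float = 0.80) -> float:
    """Return the factor by which leakage power is multiplied after a
    temperature rise of delta_t_c degrees Celsius.

    Assumption (not from the panel): the quoted 80 percent increase per
    10 degrees C compounds exponentially over larger temperature rises.
    """
    return (1.0 + growth_per_10c) ** (delta_t_c / 10.0)


if __name__ == "__main__":
    # Hypothetical temperature rises chosen only for illustration.
    for rise in (10, 20, 30):
        print(f"+{rise} C -> leakage x{leakage_scale(rise):.2f}")
```

Under that assumption, a 20°C hot spot that a simplified analysis misses would correspond to roughly a threefold leakage increase in that region, which is why under-predicted temperature peaks translate into under-predicted power.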
At 32-nm, noted AMD fellow Stephen Kosonocky, major performance gains will probably be possible by adding cores, threads and special-purpose accelerators. Improving single-thread performance will also "put pressure" on chip designers to push die temperatures even higher, he continued.
Kosonocky noted that, at present, thermal modeling is mainly used for floorplanning and package/system planning. As die temperatures are pushed higher, reliability and performance modeling may have to account for local thermal conditions.
Kosonocky declared: "IC thermal problems are real, and hot spots are getting hotter. Thermal gradients across the die are getting larger, and reliability can be affected by increasing hot spots. Monitoring systems and thermal budgets will keep this under control."
In closing, he said that more trade-offs are made during the high-level design phase. "Detailed thermal-aware floorplanning is a must, and thermal-aware modeling is necessary for trade-off analysis," he stated.
The next speaker, Alain J. Weger, an advisory engineer in the Optical Communications and High Speed Test Department at the IBM Thomas J. Watson Research Center, stated that silicon chips, in their operating environment, can reside in one or more of three regimes, namely power-limited, temperature-limited and hotspot-limited. He thus called for an understanding of these regimes and their interaction with design, layout and management.
"What happens if the temperature goes too high?" asked Weger. "Circuit performance degrades, and leakage current can lead to runaway, reliability issues."
He added: "Chip timing must close depending on the criteria of the design. That means your power efficiency was once subordinated to timing. Power and power density are collective phenomena while timing is mainly a local phenomenon. Can timing and power constraints be reconciled? Those are the main challenges for the EDA industry."
The last speaker, Andrew Yang, CEO and chairman of Apache Design Solutions, Inc. (Mountain View, Calif.), said thermal integrity has become more critical with the emergence of system-in-package (SiP) designs, especially stacked chips that use through-silicon via technology. Addressing this challenge requires more accurate estimation of chip power and its distribution, taking into account process-dependent leakage current and activity-based dynamic power, Yang continued.
From an EDA perspective, Yang stressed that a unified chip-and-package solution, integrated with the system, is needed. He added: "For SoC designers, if you are working on CPUs, GPUs or MPUs, you will need to consider putting thermal integrity. If working on SiP, thermal integrity is a must-have."