# Accurate Thermal Analysis of Chip/Package Systems

**Modeling System Heat Generation and Dissipation**

Thermal analysis consists of two major elements: predicting the heat generation spatially and temporally in the system, and modeling the heat transport and dissipation in the system. The system analysis must include a good model of all physical elements of the chip and the chip's thermal environment, including the package, board, heat sink, and cooling system.

Heat conduction, and the analysis of it, follows from the law of conservation of energy. The heat energy generated at any given point in a system equals the heat dissipated at that point plus the power that raises the temperature at that point. When more heat is generated than is dissipated at a point of the system, that part of the system heats up. The system reaches thermal equilibrium and a stable temperature distribution when the heat generated equals the heat dissipated at each point of the system.
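As a rough illustration of this energy balance, the sketch below treats the whole system as a single lumped thermal node. The thermal resistance, thermal capacitance, and power values are assumptions chosen for illustration, not data from any real package. The node heats up while generated power exceeds dissipated power and settles where the two are equal.

```python
def simulate_node(power_w, r_th_k_per_w, c_th_j_per_k, t_ambient_c,
                  dt_s=0.001, steps=20000):
    """Explicit-Euler integration of C * dT/dt = P - (T - T_ambient) / R.

    All parameters are illustrative assumptions for a single lumped node.
    """
    temp_c = t_ambient_c
    for _ in range(steps):
        # Heat leaving the node through the thermal resistance to ambient.
        dissipated_w = (temp_c - t_ambient_c) / r_th_k_per_w
        # The surplus (generated minus dissipated) raises the temperature.
        temp_c += dt_s * (power_w - dissipated_w) / c_th_j_per_k
    return temp_c


def steady_state(power_w, r_th_k_per_w, t_ambient_c):
    """Equilibrium temperature, where generated heat equals dissipated heat."""
    return t_ambient_c + power_w * r_th_k_per_w


final_c = simulate_node(power_w=10.0, r_th_k_per_w=2.0,
                        c_th_j_per_k=0.5, t_ambient_c=25.0)
print(final_c, steady_state(10.0, 2.0, 25.0))
```

The simulated time here is 20 thermal time constants (R times C), so the transient result should sit very close to the closed-form equilibrium value.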

The major heat source in a VLSI chip is the power dissipated by transistors in the silicon substrate. The total power of a VLSI circuit consists of dynamic power, short-circuit power, and static power. At process nodes of 90nm and below, leakage currents are a major source of static power consumption. Current in the metal interconnects on the chip also generates heat through resistive loss, resulting in self-heating of the interconnects. In addition, heat from neighboring SiP components affects the final temperature distribution of a chip, which is especially important for the thermal analysis of 3D stacked dies.

The temperature variation across the chip depends on the power distribution, as well as on the thermal conductivity and geometry of the different materials. The amount of heat removed and the amount of heat trapped in the system depend on the design of the package, board, heat sink, and cooling system, as well as on the temperature difference between the VLSI system and the ambient, as shown in *Figure 5*.

*5. Paths of heat dissipation through substrate/metal layers, package, and heat sink to the ambient are shown for the chip and surrounding cooling environment.*

**Thermal Analysis of IC-Package-System**

One of the challenges for an accurate chip-level thermal analysis is the modeling of boundary conditions, including package, heat sink, board, and cooling system. The final power and temperature distribution strongly depends on the boundary conditions. For example, the boundary conditions of SiP systems are critical for modeling the heat transfer between different chips. On the other hand, chip power distribution is the heat source for system-level thermal analysis, and is temperature dependent because of leakage power. Therefore, an IC-Package-System thermal co-analysis, as shown in *Figure 6*, is required for determining the final power and temperature distribution.

One potential problem with treating chip power as a constant is thermal runaway. For example, a package may be selected because its heat dissipation ability matches an estimated constant power. However, the actual power may be higher than expected because leakage rises with the actual temperature. In turn, the increased power raises the temperature further. This process continues until the system burns out.
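The feedback described here can be sketched with a single-node model. Everything in the example is an assumption made for illustration: the leakage model (doubling every 20 °C), the dynamic power, the thermal resistances, and the 150 °C limit. With a low package thermal resistance the temperature settles; with a high one, each power update pushes the temperature higher until the limit is exceeded.

```python
def leakage_power(temp_c, p_leak_ref_w=1.0, t_ref_c=25.0, doubling_c=20.0):
    """Illustrative leakage model: doubles every `doubling_c` degrees."""
    return p_leak_ref_w * 2.0 ** ((temp_c - t_ref_c) / doubling_c)


def settle_temperature(r_th_k_per_w, p_dynamic_w=5.0, t_ambient_c=25.0,
                       t_limit_c=150.0, max_iter=500, tol_c=1e-6):
    """Iterate T = T_ambient + R_th * P(T).

    Returns (temperature, converged). `converged` is False when the
    temperature exceeds `t_limit_c` or never settles, which is how this
    sketch flags thermal runaway.
    """
    temp_c = t_ambient_c
    for _ in range(max_iter):
        power_w = p_dynamic_w + leakage_power(temp_c)
        new_temp_c = t_ambient_c + r_th_k_per_w * power_w
        if new_temp_c > t_limit_c:
            return new_temp_c, False
        if abs(new_temp_c - temp_c) < tol_c:
            return new_temp_c, True
        temp_c = new_temp_c
    return temp_c, False


print(settle_temperature(r_th_k_per_w=2.0))  # package that can cope
print(settle_temperature(r_th_k_per_w=8.0))  # package that cannot
```

A package sized for the constant 6 W seen at ambient temperature would look adequate in both cases, which is the risk of ignoring the temperature dependence.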

*6. A co-analysis flow for IC/Package/System looks like this.*

Three types of boundary condition models are typically used when performing an on-chip thermal analysis. They are defined below, and different types of boundary conditions can be mixed within one system.

- Adiabatic: assumes no heat transfer across the boundary
- Convective: assumes a known model for heat transfer across the boundary, which requires a known equivalent thermal resistance network
- Isothermal: assumes no change of boundary temperature distribution over time
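To make the three boundary types concrete, here is a minimal one-dimensional sketch: a chain of thermal nodes joined by equal conductances, with a selectable boundary condition at each end. The node count, conductance, and heat input are hypothetical values, and a real analysis would use a full 3D mesh rather than a chain.

```python
def solve_tridiagonal(lower, diag, upper, rhs):
    """Thomas algorithm for a tridiagonal linear system."""
    n = len(diag)
    c_prime = [0.0] * n
    d_prime = [0.0] * n
    c_prime[0] = upper[0] / diag[0]
    d_prime[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c_prime[i - 1]
        c_prime[i] = upper[i] / denom
        d_prime[i] = (rhs[i] - lower[i] * d_prime[i - 1]) / denom
    temps = [0.0] * n
    temps[-1] = d_prime[-1]
    for i in range(n - 2, -1, -1):
        temps[i] = d_prime[i] - c_prime[i] * temps[i + 1]
    return temps


def solve_chain(heat_w, g_w_per_k, left, right):
    """Steady-state temperatures of a chain of two or more nodes.

    `left` / `right` describe the end boundaries (hypothetical encoding):
      ("adiabatic",)                  no heat crosses the boundary
      ("convective", r_th, t_amb_c)   thermal resistance to ambient
      ("isothermal", t_fixed_c)       boundary node held at a fixed value
    """
    if left[0] == "adiabatic" and right[0] == "adiabatic":
        raise ValueError("at least one boundary must let heat escape")
    n = len(heat_w)
    lower = [-g_w_per_k] * n
    upper = [-g_w_per_k] * n
    diag = [2.0 * g_w_per_k] * n
    rhs = [float(q) for q in heat_w]
    lower[0] = upper[-1] = 0.0
    diag[0] = diag[-1] = g_w_per_k  # end nodes have a single neighbor
    for idx, bc in ((0, left), (n - 1, right)):
        if bc[0] == "convective":
            _, r_th, t_amb_c = bc
            diag[idx] += 1.0 / r_th
            rhs[idx] += t_amb_c / r_th
        elif bc[0] == "isothermal":
            lower[idx] = upper[idx] = 0.0
            diag[idx] = 1.0
            rhs[idx] = bc[1]
        elif bc[0] != "adiabatic":
            raise ValueError("unknown boundary type: %r" % (bc[0],))
    return solve_tridiagonal(lower, diag, upper, rhs)


heat = [0.0, 0.0, 0.0, 0.0, 10.0]  # 10 W injected at the last node
print(solve_chain(heat, 5.0, ("convective", 2.0, 25.0), ("adiabatic",)))
print(solve_chain(heat, 5.0, ("isothermal", 25.0), ("adiabatic",)))
```

Mixing is shown directly: the first call combines a convective boundary on one end with an adiabatic boundary on the other, and the second swaps in an isothermal boundary.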

*Figure 7* shows a convective boundary condition modeled using equivalent thermal resistance across the faces of a die, and *Figure 8* represents an isothermal boundary condition model.

*7. A representation of a convective boundary condition model.*

*8. Isothermal boundary condition model plot is quite colorful.*

**Integrated Power-Thermal-Electrical Analysis**

The prediction of power dissipation is critical for a stable thermal analysis flow. For this, the power dissipation at each point of the system, in transistors as well as in interconnects, must be analyzed. The power generated at any point will either be transported away or heat up this point of the system, raising its temperature. However, many of the thermal, electrical, and mechanical properties of the system are also temperature-dependent and will change as the temperature in the system is changing.

For example, leakage power in the chip increases sharply with temperature, and metal resistivity increases with temperature. This means that the power dissipation at each point of the system changes the temperature in the system, but the temperature also changes the electrical parameters and power dissipation in the system. Because of this interdependence, power, thermal, and electrical characteristics cannot be analyzed independently; an accurate analysis of the system demands an integrated process.

Once a stable power-thermal-electrical solution is achieved, the temperature distribution across the design can be used to analyze its effect on power, timing, electromigration (EM), and voltage drop. Due to the mutual coupling between power, thermal, and electrical parameters, it is most efficient to separate the analysis into two different dependency loops, Power-Thermal (PT) and Power-Thermal-Electrical (PTE), as represented graphically in *Figure 9*.

*9. A representative Power-Thermal-Electrical analysis process looks like this.*

**Power-Thermal Loop**

The distribution of power dissipation in the system determines the temperature profile in the system. The resulting system temperature profile, on the other hand, has a direct impact on the system power dissipation. A stable solution of power and thermal distribution therefore requires an iterative analysis between the power and thermal analysis, that is, an iterative PT loop.

At the starting point of the PT loop, the power dissipation in the system is estimated assuming a homogeneous temperature distribution. Based on the resulting power distribution, the temperature distribution is analyzed, generating a three-dimensional thermal map. As described previously, the temperature distribution in the system has an impact on the power dissipation at each point of the system. Therefore, using the simulated thermal map, the power dissipation of the system has to be updated, and a new iteration started. Eventually, the iterative loop converges on a stable solution for the power and temperature distribution in the system, as shown in *Figure 10*. The initial value used for temperature will not affect the final solution. However, the initial temperature value will determine how quickly the PT loop converges. For systems with thermal runaway, the PT loop will not converge.
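The PT loop can be sketched on a toy model: a row of three tiles, each coupled to its neighbors and to the ambient. The leakage model, conductances, and power numbers are assumptions for illustration, and the simple Gauss-Seidel relaxation stands in for a real 3D thermal solver. Running the loop from two different starting temperatures lets you check that the converged result is the same while the iteration count may differ.

```python
def leakage_w(temp_c, leak_ref_w, t_ref_c=25.0, doubling_c=25.0):
    """Illustrative leakage model: doubles every `doubling_c` degrees."""
    return leak_ref_w * 2.0 ** ((temp_c - t_ref_c) / doubling_c)


def thermal_solve(power_w, t_amb_c, g_vertical=0.5, g_lateral=1.0, sweeps=500):
    """Gauss-Seidel solve for a row of tiles (conductances in W/K)."""
    n = len(power_w)
    temps = [t_amb_c] * n
    for _ in range(sweeps):
        for i in range(n):
            g_sum = g_vertical
            flow = power_w[i] + g_vertical * t_amb_c
            if i > 0:
                g_sum += g_lateral
                flow += g_lateral * temps[i - 1]
            if i < n - 1:
                g_sum += g_lateral
                flow += g_lateral * temps[i + 1]
            temps[i] = flow / g_sum
    return temps


def pt_loop(dynamic_w, leak_ref_w, t_init_c, t_amb_c=25.0,
            tol_c=1e-4, max_iter=100, t_limit_c=200.0):
    """Alternate power and thermal updates until temperatures stop changing."""
    temps = [t_init_c] * len(dynamic_w)  # homogeneous starting temperature
    for iteration in range(1, max_iter + 1):
        # Power step: update temperature-dependent power per tile.
        power = [d + leakage_w(t, l)
                 for d, l, t in zip(dynamic_w, leak_ref_w, temps)]
        # Thermal step: new temperature map for that power distribution.
        new_temps = thermal_solve(power, t_amb_c)
        if max(new_temps) > t_limit_c:
            raise RuntimeError("limit exceeded: possible thermal runaway")
        change = max(abs(a - b) for a, b in zip(new_temps, temps))
        temps = new_temps
        if change < tol_c:
            return temps, iteration
    raise RuntimeError("PT loop did not converge")


dynamic = [1.0, 4.0, 1.0]   # hypothetical dynamic power per tile, W
leak_ref = [0.2, 0.5, 0.2]  # hypothetical leakage at 25 C per tile, W
print(pt_loop(dynamic, leak_ref, t_init_c=25.0))
print(pt_loop(dynamic, leak_ref, t_init_c=85.0))
```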

*10. Beginning at a constant temperature, iterations between power and thermal analysis provide a convergent solution for the temperature distribution across the chip.*

**Power-Thermal-Electrical Loop**

The temperature distribution in the system also affects the resistance of metal wires, which affects the power dissipation in the wire *(Figure 11)*, as well as the voltage drop across the system, and therefore the local Vdd value. The power dissipation of the circuits also depends on this local Vdd value and temperature. This creates the need for a second analysis loop between power calculation, thermal analysis, and circuit analysis, the Power-Thermal-Electrical (PTE) loop.

After finishing the PT loop, the temperature-dependent power and 3D temperature profile can be used to analyze the thermal impact across the chip on timing, reliability, and voltage drop. Based on the temperature profile, the temperature-dependent resistance can be calculated and updated. This allows for an update of the self-heating of interconnects in the design. It also allows an update of the IR-drop / Dynamic Voltage Drop (DvD) analysis in the system, determining the new supply voltage at each cell. Using this new supply voltage profile, the cell currents in the system are updated as well, and the PT loop is started again using the updated power distribution as the starting point.
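A heavily simplified sketch of the PTE loop follows. It assumes one block of cells fed through a single supply wire and a single lumped thermal node. The linear resistance model uses a temperature coefficient of about 0.0039 per kelvin, a typical textbook value for copper, and the cell current model is a hypothetical stand-in for pre-characterized data. A single power and temperature update per pass stands in for the full PT loop.

```python
def wire_resistance(temp_c, r_ref_ohm, tcr_per_k=0.0039, t_ref_c=20.0):
    """Linear temperature dependence of metal resistance (assumed model)."""
    return r_ref_ohm * (1.0 + tcr_per_k * (temp_c - t_ref_c))


def cell_current(v_local, temp_c, i_ref_a=2.0, v_ref=1.0,
                 t_ref_c=25.0, temp_coeff_per_k=0.004):
    """Hypothetical cell current versus local supply and temperature."""
    return i_ref_a * (v_local / v_ref) * (1.0 + temp_coeff_per_k * (temp_c - t_ref_c))


def pte_loop(vdd=1.0, r_ref_ohm=0.02, r_th_k_per_w=5.0, t_amb_c=25.0,
             tol=1e-9, max_iter=200):
    """Iterate power, temperature, and local Vdd to a consistent solution."""
    temp_c, v_local = t_amb_c, vdd
    for iteration in range(1, max_iter + 1):
        # Power step: current drawn at the present voltage and temperature.
        current_a = cell_current(v_local, temp_c)
        r_wire = wire_resistance(temp_c, r_ref_ohm)
        p_wire_w = current_a ** 2 * r_wire   # interconnect self-heating
        p_cells_w = v_local * current_a      # power dissipated in the cells
        # Thermal step: single-node stand-in for the PT loop.
        new_temp_c = t_amb_c + r_th_k_per_w * (p_cells_w + p_wire_w)
        # Electrical step: IR drop with the temperature-updated resistance.
        new_v_local = vdd - current_a * wire_resistance(new_temp_c, r_ref_ohm)
        change = max(abs(new_temp_c - temp_c), abs(new_v_local - v_local))
        temp_c, v_local = new_temp_c, new_v_local
        if change < tol:
            return {"temp_c": temp_c, "v_local": v_local,
                    "current_a": current_a, "iterations": iteration}
    raise RuntimeError("PTE loop did not converge")


print(pte_loop())
```

In a full-chip flow each of these scalars becomes a distribution over cells and wire segments, but the order of the updates is the same as described above.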

*11. You can build an interconnect model for self-heating of wires.*

Full-chip IR-drop/DvD analysis can be very time consuming because of the complexity of the power supply network. It can also require several iterations of the PTE loop to achieve a stable power-temperature-voltage distribution in the system. Pre-characterized cell currents allow a fast update of the power distribution for different supply voltages and temperatures. It is also essential to have a fast and efficient voltage drop analysis engine tightly coupled to the power calculation and the thermal simulation.
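One common way to use pre-characterized data is a lookup table indexed by supply voltage and temperature, interpolated between characterization points. The sketch below assumes bilinear interpolation and a made-up 3x3 current table; it is not taken from any specific characterization flow or library format.

```python
from bisect import bisect_right


def _bracket(axis, value):
    """Return (index, fraction) of the interval holding `value`, clamped."""
    if value <= axis[0]:
        return 0, 0.0
    if value >= axis[-1]:
        return len(axis) - 2, 1.0
    i = bisect_right(axis, value) - 1
    return i, (value - axis[i]) / (axis[i + 1] - axis[i])


def lookup_current(table, volts_axis, temps_axis, v_local, temp_c):
    """Bilinear interpolation in a table indexed as table[voltage][temperature]."""
    i, frac_v = _bracket(volts_axis, v_local)
    j, frac_t = _bracket(temps_axis, temp_c)
    low = table[i][j] * (1.0 - frac_t) + table[i][j + 1] * frac_t
    high = table[i + 1][j] * (1.0 - frac_t) + table[i + 1][j + 1] * frac_t
    return low * (1.0 - frac_v) + high * frac_v


# Hypothetical characterization points: cell current in mA.
VOLTS = [0.9, 1.0, 1.1]
TEMPS_C = [25.0, 75.0, 125.0]
CURRENT_MA = [
    [1.0, 1.2, 1.6],
    [1.3, 1.5, 2.0],
    [1.6, 1.9, 2.5],
]

print(lookup_current(CURRENT_MA, VOLTS, TEMPS_C, 0.95, 50.0))
```

Because each lookup is a handful of multiplications, the power distribution can be refreshed for every cell on each PTE iteration without rerunning circuit simulation.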

*Figure 12* shows a sample three-dimensional thermal map for the temperature variation of a layer of the chip, achieved after the convergence of the PTE loop. The figure clearly shows the inhomogeneous distribution of temperature, revealing hot-spots that require special attention.

The benefit of a thermal-aware sign-off flow is clearly apparent in areas such as reliability analysis. A constant temperature assumption that is too high can require over-design to fix many false EM violations, while one that is too low may cause reliability issues in the design, especially where local hot-spots of higher temperature require additional EM fixes to guarantee reliability.

*12. A mathematical package can show a 3-D map of chip layer temperature profile.*

*Figure 14* illustrates the benefit of a thermal-aware EM sign-off analysis, which highlights the 'real' current density hot-spots on the chip, rather than the constant temperature scenario shown in *Figure 13*. For example, a conventional analysis method identified 11,000 EM violations, whereas a thermal-aware EM analysis flow reduced this number to 2,000 actual EM violations.
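The difference between the two sign-off styles can be illustrated with a small sketch. It assumes the current density limit scales with temperature according to Black's equation, a widely used EM lifetime model that the article itself does not name. The activation energy, current exponent, reference temperature, and segment data are all hypothetical and in arbitrary units; real limits come from foundry rules.

```python
import math

BOLTZMANN_EV_PER_K = 8.617e-5


def em_limit(temp_c, j_ref, t_ref_c=105.0, ea_ev=0.8, n=2.0):
    """Current density limit at `temp_c` for the same lifetime as at `t_ref_c`.

    Derived from Black's equation, MTTF = A * J**(-n) * exp(Ea / (k * T)),
    with assumed (technology-dependent) values for Ea and n.
    """
    t_k = temp_c + 273.15
    t_ref_k = t_ref_c + 273.15
    exponent = ea_ev / (n * BOLTZMANN_EV_PER_K) * (1.0 / t_k - 1.0 / t_ref_k)
    return j_ref * math.exp(exponent)


def em_violations(segments, j_ref, constant_temp_c=None):
    """Indices of segments whose current density exceeds the limit.

    `segments` is a list of (current_density, local_temp_c) pairs. When
    `constant_temp_c` is given, it replaces every local temperature.
    """
    flagged = []
    for index, (j, local_temp_c) in enumerate(segments):
        temp_c = local_temp_c if constant_temp_c is None else constant_temp_c
        if j > em_limit(temp_c, j_ref):
            flagged.append(index)
    return flagged


# Hypothetical wire segments: (normalized current density, local temp in C).
segments = [(1.2, 70.0), (0.9, 110.0), (1.5, 95.0), (0.8, 80.0)]

print(em_violations(segments, j_ref=1.0, constant_temp_c=105.0))
print(em_violations(segments, j_ref=1.0))
```

With these made-up numbers, the constant-temperature check flags a cool segment that the thermal-aware check clears, and misses a hot-spot segment that the thermal-aware check catches.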

*13. A design for conventional reliability at constant temperature shows this EM profile during sign-off analysis.*

*14. By contrast, a design for thermal-aware reliability shows this EM profile during sign-off analysis.*

**Summary**

In conclusion, the following components are required for an effective thermal integrity analysis flow.

- **Platform architecture**: For efficiency, full-system thermal integrity analysis must be based on a single platform that performs power analysis, thermal analysis, circuit analysis, power/signal net extraction, reduction, and temperature-dependent cell characterization.
- **Power Calculation**: Requires a temperature-dependent power calculation engine for full-chip analysis, which considers the state-dependent leakage, short circuit, and switching power.
- **Thermal Simulation**: Must simulate the temperature profile across the chip, layer by layer, taking into account the thermal properties of different materials, packages, board, heat sinks, and cooling systems.
- **Electrical analysis of the power/signal network**: Analysis must provide temperature-dependent R extraction, reduction, and circuit simulation of the on-chip power/signal network. The thermal-aware power grid extraction can then be used to determine the temperature impact on full-chip EM and IR-drop/DvD analysis.
- **Thermal-aware library characterization**: To achieve cell-level complexity and transistor-level accuracy, the complete cell library should be characterized using a temperature-dependent and voltage-dependent library characterization flow based on an accurate Spice simulation engine.
- **Impact on Timing**: For the timing impact analysis, temperature-dependent and voltage-dependent delay information of the devices is needed to generate a modified SDF. This SDF can be used in a static timer to determine the impact of the on-chip temperature profile on full-chip timing. The updated temperature at each instance can also be fed into a tool to determine the effect of temperature on critical path and clock timing.

**About the Authors**

**Ting-Yuan Wang** is a principal engineer at Apache Design Solutions in Mountain View, California. He holds a PhD degree in Electrical and Computer Engineering from the University of Wisconsin-Madison.

**Margaret Schmitt** is an Area Technical Manager at Apache Design Solutions in Mountain View, California. She holds a master's degree in Electrical Engineering from the Technical University of Berlin, Germany.