Design Article

Power Distribution Planning

Resve Saleh, Michael Benoit, and Pete

4/17/2002 12:00 AM EDT


IC power distribution systems are designed to provide needed voltages and currents to the transistors that perform the logic functions of a chip. The supply voltages are assumed to be constant across a chip, and are expected to operate reliably over the chip's lifetime. However, with the advent of ultra-deep submicron (UDSM) technology, the VDD and VSS grids actually fluctuate in value during chip operation due to increased resistance of metal lines, high current levels, and package pin inductances. Furthermore, the use of narrow line widths reduces long-term reliability of a chip. As a result, power systems have become so complex that they can no longer be designed using intuition and "back-of-the-envelope" calculations.

The conditions contributing to the complexity of power distribution systems have a significant impact on chip performance. Voltage (IR) drops on VDD lines due to resistance affect noise margins, which impact overall timing and functionality. Ground bounce, a similar effect, occurs on VSS lines. These effects are made worse by the presence of Ldi/dt voltage variations at package pins due to the increased rate of change of current used in deep submicron. High currents also induce electromigration effects in which metal lines begin to wear out during a chip's lifetime.

If IC designers do not design power systems with these conditions in mind, they have a difficult time producing a reliable design on the first pass. Worse yet, a chip may fail in the field after it is embedded in a system and in a customer's hands.

A complete picture of power grid integrity can only be obtained when effects such as IR drop, ground bounce, Ldi/dt, and electromigration are considered together. Because of the enormous complexity of power distribution systems, voltage fluctuations and reliability issues are difficult to predict without some form of detailed design analysis. These are full-chip issues that must be addressed by interconnect verification tools that have the capacity and performance required to analyze detailed representations of the chip in a reasonable amount of time.

If a power system design does not work properly the first time, it can lead to multiple respins of silicon, which costs time, money and possibly a lost business opportunity. By analyzing and verifying the power system at the full-chip level, designers can tape out with a high degree of confidence and greatly shorten the overall time required to get a chip to market.

In this article, the key issues of power system design—IR drop, ground bounce, Ldi/dt and electromigration—are described, along with the related analysis issues. Methodologies used to identify IR drop violations and potential electromigration violations in the power distribution system are also described, along with approaches that reduce the severity of these problems. Designing with these issues in mind and performing full-chip interconnect verification enables designers to address what would otherwise be an intractable problem.

Power Grid IR Drop

 
In practice, each line should be fractured into polygons and discrete RC values extracted for each polygon. This produces a tremendous number of resistors and capacitors for an ultra-deep submicron design. For example, in a 0.35 micron design with five layers of metal, the VDD grid of a five million transistor circuit may contain 30 million resistors and 20 million capacitors. A similar number of resistors and capacitors may exist in the VSS grid.
 

As device geometries decrease in UDSM technology, interconnect line width decreases, increasing the resistance along the line. Designers typically compute resistance along a conductor by counting the number of squares along the line and multiplying by the sheet resistivity, which is usually provided in ohms/square. As line width shrinks, the number of squares increases, causing an increase in the total resistance along the line. Similarly, designers compute the approximate capacitance along the line by multiplying the area of the line by the capacitance per unit area. Of course, both the capacitance and resistance are distributed along the metal line, so the lumped RC values derived this way are only approximate.
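The square-counting estimate can be sketched in a few lines. The function names and the sheet resistance and capacitance values below are illustrative assumptions, not figures from any particular process.

```python
def wire_resistance(length_um, width_um, sheet_res_ohm_per_sq):
    """Lumped resistance: number of squares times sheet resistivity."""
    squares = length_um / width_um
    return squares * sheet_res_ohm_per_sq


def wire_capacitance(length_um, width_um, cap_ff_per_um2):
    """Lumped area capacitance in fF: line area times capacitance per unit area."""
    return length_um * width_um * cap_ff_per_um2


# Hypothetical 1000 um strap with an assumed 0.05 ohm/square sheet resistivity.
r_wide = wire_resistance(1000.0, 2.0, 0.05)    # 500 squares
r_narrow = wire_resistance(1000.0, 1.0, 0.05)  # 1000 squares
print(f"2 um wide: {r_wide:.1f} ohm, 1 um wide: {r_narrow:.1f} ohm")
```

Halving the width doubles the number of squares and therefore the resistance, while the area capacitance is halved.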

The effect of increased resistance on a power distribution system is that supply voltage is no longer an ideal reference. Instead, the supply voltage varies during normal circuit operation. The current flowing through the resistance in the power grid causes IR drops that depend on the placement of blocks, their interaction, current levels, and resistance levels. In the past, low resistance in a power system and relatively low current levels made IR drop a second-order effect that could safely be ignored. But in ultra-deep submicron, with lower supply voltages yielding smaller noise margins, IR drop is a first-order effect and can no longer be ignored during the design process.

Effect of Pin Inductance
Another source of voltage drop in the power supply is due to package pin inductance—typically around 10nH to 20nH. Ldi/dt creates a voltage drop across an inductor, and while L has not changed significantly over the years, the value of di/dt has continued to increase. In the meantime, the supply voltage has been decreasing from 5V to 3.3V to 2.5V and recently as low as 2.0V. These effects have combined to a point where the Ldi/dt drop can contribute significantly to an overall voltage drop in the power grid, especially in peak demand situations. The overall voltage drop is due to both Ldi/dt and IR, which are both dynamic phenomena and therefore cannot be analyzed quantitatively in a static context.
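A rough feel for the size of the inductive drop comes from evaluating V = L di/dt directly. The sketch below assumes a single pin and a linear current ramp; the current step and ramp time are hypothetical values chosen only for illustration.

```python
def inductive_drop(inductance_h, delta_i_amp, delta_t_sec):
    """Voltage across a pin inductance for a linear current ramp: V = L * di/dt."""
    return inductance_h * delta_i_amp / delta_t_sec


# Assumed example: 10 nH pin, current rising by 50 mA in 1 ns.
v_drop = inductive_drop(10e-9, 0.05, 1e-9)
for vdd in (5.0, 3.3, 2.5):
    print(f"VDD = {vdd} V: drop is {100.0 * v_drop / vdd:.0f}% of supply")
```

The same drop consumes a larger share of the supply as VDD scales down, which is why the effect has grown in importance.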

Ground Bounce
Validating that the ground voltage does not rise above a 10% noise budget is as important as ensuring that VDD does not drop below a 10% budget. Measuring ground bounce requires that the substrate be modeled as a distributed RC network in parallel with the metal routing for the ground grid. This significantly increases the complexity of the network, especially when pin inductance or a more complicated pin model is included. A limited form of ground bounce analysis could be obtained by modeling substrate contacts as individual ideal capacitances, but these values are difficult to obtain. An even more conservative approach to ground bounce analysis ignores the substrate entirely; with this approach, however, the bounce observed during analysis is worse than the actual ground bounce on the chip.

Electromigration

 
If a chip fails in the field long before its expected lifetime, the consequences may be severe. For example, Intel's Pentium® was recalled in 1994 due to a PLA programming error. The total cost was estimated to be half a billion dollars, not including the marketing and PR required to rebuild the image of the company. Clearly, the cost of a recall may affect more than the bottom line.
 

Electromigration (EM) is another important issue in the design of deep submicron power distribution systems. High current densities and narrow line widths cause EM. In the past, low current densities and wide metal lines, combined with special processing, helped to avoid the effects of EM. But now, speeds of 100MHz and higher and geometries of 0.35 micron and smaller have increased the potential for EM problems.

Failures due to EM can be catastrophic because they occur in the field when the chip is in a system and in a customer's hands. Depending on the location and number of failures, the chip may begin to operate incorrectly or shut down completely, with severe consequences for a chip design company.

Issues in the Design of Power Distribution Systems
Designing a power distribution system requires the consideration of both EM and IR drop in a full-chip context. For example, consider the two blocks in Figure 1. If power distribution for Block A is examined in isolation, the additional loading due to the presence of Block B is not taken into account. If power is routed through Block A to Block B, a larger IR drop will occur in Block B since power is also being consumed by Block A before it reaches Block B. As more and more blocks are added, the complex interactions between the blocks determine the actual voltage drops.
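The loading effect can be illustrated with a simple series model, in which each rail segment carries the current of its own block plus all blocks downstream. This is a minimal sketch; the segment resistances and block currents are assumed values, and a real grid is a mesh rather than a chain.

```python
def series_rail_drops(segment_res_ohm, block_currents_amp):
    """Cumulative IR drop at each block when blocks are fed in series.

    Segment i connects the previous tap (or the pin) to block i and
    carries the current of block i and every block after it.
    """
    drops = []
    total = 0.0
    for i, res in enumerate(segment_res_ohm):
        downstream_current = sum(block_currents_amp[i:])
        total += res * downstream_current
        drops.append(total)
    return drops


# Hypothetical values: two 0.5 ohm segments, each block drawing 100 mA.
drop_b_alone = series_rail_drops([0.5, 0.5], [0.0, 0.1])[1]
drop_b_with_a = series_rail_drops([0.5, 0.5], [0.1, 0.1])[1]
print(f"Block B alone: {drop_b_alone:.2f} V, with Block A active: {drop_b_with_a:.2f} V")
```

Block B sees a larger drop once Block A draws current through the shared segment, which an analysis of either block in isolation would miss.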

The placement of these blocks is typically based on the timing requirements of a system rather than on IR drop, or else placement is based on the size and shape of blocks at the floorplanning stage. Therefore, sizing the buses properly to minimize IR drop while satisfying the required timing and area constraints is a design challenge that can only be met using full-chip analysis.

Figure 1:  Routing through a block

Since the total IR drop is based on the resistance seen from the pin to the block, one could route around the block and feed power to each block separately, as shown in Figure 2. Ideally, the main trunks should be large enough to handle all the current flowing through separate branches. In this case, the T-junctions have a high current density and may be prone to EM problems. It is important in this type of grid to examine the current density at all junctions, especially the corner providing large amounts of current to each block, to ensure that EM problems do not exist. The same argument holds for every block routed in this manner. Again, power grid design is truly full-chip when voltage drop and EM issues are considered.

Figure 2:  Routing around the blocks

Although routing power this way is easier to control and maintain, it also requires more area to implement. The large metal trunks of power have to be sized to handle all the current for each block. This requirement forces designers to set aside area for power busing that takes away from the available routing area.

Another approach to minimizing IR drop, depicted in Figure 3, is to have a solid grid of Metal 4 and Metal 5 and use a via array to connect the two layers, effectively tying the whole grid to VDD. While this solves the problem at higher levels, it simply shifts the problem down to the lower levels of metal. What about Metal 3 and Metal 2? Are they wide enough to handle the current levels they will sustain in terms of IR drop and EM?

Depending on the methodology, lower levels are often left floating until final assembly. Low resistance, high current paths can often be created by random placement of lower blocks. In fact, when you design the logic circuitry in the block, it is not clear where Metal 3 will tap to Metal 4, so you cannot predict the current flow. And if you cannot predict it, you must analyze it. The example in Figure 3 illustrates that you cannot avoid such a problem by solving it locally; you just shift it elsewhere in the design. Visibility into the global consequences of local changes is required to truly analyze overall power integrity.

Figure 3:  Vias in a mesh array methodology

 
Voltage drop on a power grid primarily affects timing. IR drop compromises the drive capability of the gates and increases the overall delay. Typically, a 5% drop in supply voltage can affect delay by 15% or more. Delay in a clock buffer has been known to increase by more than 100% due to IR drop. Such an increase in delay is critical when you are managing clock skews in the range of 100 picoseconds. Imagine the effect of this type of unexpected delay along centrally located critical paths. Then path delay is no longer predictable and, in fact, the critical path may be somewhere else in the design due to IR drop. This means that the performance or functionality of the design is unpredictable. Ideally, timing calculations should take worst-case IR drop into account to improve accuracy.
 

Part of the grid may have to be removed to route some signals, as shown in Figure 3. Which straps can be removed without introducing problems? If you arbitrarily pick one that is conducting a large amount of current, the excess current must flow in adjacent straps which may push the current density in them beyond acceptable levels. Clearly, such decisions cannot be made without determining the current levels in the straps and then picking ones that have lower current levels. The complexity of the problem requires a set of power grid analysis tools. These examples illustrate that design decisions must be made with a global perspective in mind.

IR drop is a dynamic phenomenon due primarily to simultaneous switching events in a chip such as clocks, bus drivers, and memory decoder drivers. As large drivers begin to switch, the simultaneous demand for current from the power grid stresses the grid. In a static context, voltage drops are highest near the center of a design and lowest near VDD connections to the power supply. However, during dynamic operation, these simultaneous switching events can cause severe voltage drops anywhere on the chip, and these are the ones that must be identified. These events, usually well known, can be triggered with typically fewer than 100 vectors.
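The static observation, that drops are highest near the center and lowest near the supply connections, can be reproduced with a one-dimensional toy model: a single rail tied to VDD at both ends, with equal current taps along its length. The model, the tridiagonal (Thomas) solver used here, and all numeric values are illustrative assumptions rather than the method of any particular tool.

```python
def rail_drops(n_taps, seg_res_ohm, tap_current_amp):
    """Static IR drop at each tap of a rail fed from VDD at both ends.

    Kirchhoff's current law at tap k gives
        -d[k-1] + 2*d[k] - d[k+1] = I * R
    where d is the drop below VDD and d = 0 at both ends.
    The tridiagonal system is solved with the Thomas algorithm.
    """
    a, b, c = -1.0, 2.0, -1.0
    rhs = [tap_current_amp * seg_res_ohm] * n_taps
    c_prime = [0.0] * n_taps
    d_prime = [0.0] * n_taps
    c_prime[0] = c / b
    d_prime[0] = rhs[0] / b
    for k in range(1, n_taps):
        denom = b - a * c_prime[k - 1]
        c_prime[k] = c / denom
        d_prime[k] = (rhs[k] - a * d_prime[k - 1]) / denom
    drops = [0.0] * n_taps
    drops[-1] = d_prime[-1]
    for k in range(n_taps - 2, -1, -1):
        drops[k] = d_prime[k] - c_prime[k] * drops[k + 1]
    return drops


# Hypothetical rail: 5 taps, 0.1 ohm per segment, 10 mA per tap.
drops = rail_drops(5, 0.1, 0.01)
print([f"{1000.0 * d:.1f} mV" for d in drops])
```

The largest drop appears at the middle tap. Dynamic switching events do not follow this simple pattern, which is why they must be analyzed separately.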

The effect of IR drop on chip performance is significant. IR drop compromises the voltage noise margins of logic gates, due not only to voltage drops in the power grid during the rising edge of a signal, but also to the increase in voltage in the ground grid because of the same phenomenon during the falling edge. Once the noise margins drop below the budgeted amount, typically 10%, the design is not guaranteed to operate properly.

Over the years, supply voltage has been shrinking as device dimensions are scaled to avoid transistor punch-through conditions, hot-electron effects, and device breakdown. This has resulted in smaller and smaller noise margins. With IR drop, the margins are reduced even further which makes it even more difficult to manage a multi-million-transistor design.

In Figure 4, a portion of a design is shown with two metal lines connected by a narrow strap of metal. The metal lines must be wide enough to carry the average current needed to feed the circuitry connected to them. If the lines are too narrow, EM failures or excessive IR drop may occur.

Figure 4:  Electromigration in the power grid

Since large currents flow in the periphery of a design, EM problems are usually observed in the outer regions of a chip. However, vias scattered all over the design may also be prone to EM problems. Furthermore, the lower levels of metal connected to devices are usually narrower and may cause EM problems depending on the current levels. Therefore, it is important to look for EM across the entire chip rather than just specific regions.

Finding all areas susceptible to EM prohibits any use of data reduction. You must include all the detailed extracted resistance data—otherwise, you may lose useful information. For example, a via cluster that has been reduced to one via resistor may mask a potential EM failure, and an EM analysis tool would miss the problem.

In Figure 5, current flows from Metal 5 to Metal 4 through a via array. Crowding occurs as the current "hugs the curve" going from one level to the other. Some of the vias in the center of the layout have been tagged as ones that may suffer from EM. If the 16 vias in the array were collapsed into one via, this region would not be flagged as having a problem. In reality, the nine indicated vias in the cluster may fail due to the high current density in the narrow dimension of the cluster. Any extraction and analysis for EM must have unreduced data to provide useful feedback.
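The masking effect of via reduction can be shown with a small sketch. The per-via currents and the per-via limit below are invented for illustration; they only mimic the situation described, in which nine of the 16 vias carry most of the current.

```python
def flag_em_vias(via_currents_ma, limit_ma_per_via):
    """Indices of vias whose individual current exceeds the per-via limit."""
    return [i for i, cur in enumerate(via_currents_ma) if cur > limit_ma_per_via]


def collapsed_via_passes(via_currents_ma, limit_ma_per_via):
    """Check a reduced model: one equivalent via carrying the total current,
    with a limit scaled by the number of vias in the array."""
    total = sum(via_currents_ma)
    return total <= limit_ma_per_via * len(via_currents_ma)


# Hypothetical 16-via array with current crowding in nine of the vias.
currents = [1.2] * 9 + [0.2] * 7
flagged = flag_em_vias(currents, limit_ma_per_via=1.0)
reduced_ok = collapsed_via_passes(currents, limit_ma_per_via=1.0)
print(f"unreduced: {len(flagged)} vias flagged, reduced model passes: {reduced_ok}")
```

The reduced model passes because the total current is within the scaled limit, while the unreduced data flags nine vias.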

Figure 5:  Electromigration in via arrays

Electromigration in the power grid is a DC phenomenon due to the average current flow in metal lines and vias. Design guidelines for EM are based on average current levels which, in turn, depend on signal line capacitance. Therefore, obtaining an accurate EM prediction requires the use of accurate capacitance information. Furthermore, since metal lines vary in height and material properties at different levels in the design, each metal layer has different failure criteria. To identify all potential areas of EM problems across a chip, the only solution is to perform full-chip analysis.

Black's law is used to predict the mean-time-to-failure (MTTF) of a metal line using the average current density, J, seen by the line. The more accurate the average information, the better the estimate of the MTTF. To obtain this information, you need to use a large number of vectors to exercise the design. The average current in every metal line must be measured and then divided by the width and thickness of the line. This is clearly impossible to do on a fabricated chip, and prohibitive to do using circuit simulation.
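Black's law is usually written as MTTF = A * J^(-n) * exp(Ea / (k*T)), where A is a process-dependent constant, n is the current density exponent, Ea is the activation energy, k is Boltzmann's constant, and T is the temperature in kelvin. The sketch below evaluates it; the values chosen for A, n, Ea, and temperature are placeholders, since real values come from the foundry.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5


def current_density(avg_current_amp, width_cm, thickness_cm):
    """Average current density J: current divided by the line cross-section."""
    return avg_current_amp / (width_cm * thickness_cm)


def black_mttf(j, temp_k, a_const, n_exp, ea_ev):
    """Black's equation: MTTF = A * J**(-n) * exp(Ea / (k * T))."""
    return a_const * j ** (-n_exp) * math.exp(ea_ev / (BOLTZMANN_EV_PER_K * temp_k))


# Placeholder parameters (assumptions): A = 1, n = 2, Ea = 0.7 eV, T = 378 K.
base = black_mttf(1.0, 378.0, 1.0, 2.0, 0.7)
doubled = black_mttf(2.0, 378.0, 1.0, 2.0, 0.7)
print(f"MTTF ratio when J doubles (n = 2): {doubled / base:.2f}")
```

With the assumed exponent of 2, doubling the average current density cuts the predicted lifetime to a quarter, which shows why errors in the average current estimate matter.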

An alternative to expensive transistor level simulation is to obtain average currents from activity information, in the form of toggle data, using a gate-level or higher-level tool. Toggle data is simply the number of times a gate switches high or low during a simulation of thousands of clock cycles. If the toggle data is divided by the number of clock cycles, the activity information is obtained. For example, the core of a memory circuit may have an activity of 0.02% while a data path may be closer to 5%. These factors can be converted into average current information for the transistors connected to the power grid.
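The conversion from toggle data to average current can be sketched as follows. The activity calculation follows the description above; the current formula is a common first-order approximation and an assumption here, taking half of the toggles to be rising transitions that each draw a charge of C times VDD from the supply. The load capacitance, supply voltage, and clock frequency are hypothetical.

```python
def activity_factor(toggle_count, clock_cycles):
    """Toggles per clock cycle, from gate-level simulation data."""
    return toggle_count / clock_cycles


def average_supply_current(activity, load_cap_farad, vdd_volt, clock_hz):
    """First-order average current drawn from VDD by one gate output.

    Assumption: half of the toggles are rising transitions, and each
    rising transition draws a charge of C * VDD from the supply.
    """
    rising_per_cycle = activity / 2.0
    return rising_per_cycle * load_cap_farad * vdd_volt * clock_hz


# Hypothetical data-path gate: 500 toggles in 10,000 cycles (5% activity),
# 50 fF load, 2.5 V supply, 100 MHz clock.
act = activity_factor(500, 10_000)
i_avg = average_supply_current(act, 50e-15, 2.5, 100e6)
print(f"activity = {100.0 * act:.1f}%, average current = {i_avg * 1e6:.3f} uA")
```

Summing such per-gate currents at each tap point gives the current sources that load the power grid in an EM analysis.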

You must also determine the average flow of current in the entire power grid to assess reliability risks of a given design. It is not sufficient to determine the average behavior of a block taken in isolation, because the block may only be exercised periodically in a full-chip context. Furthermore, changes to the power grid in one section tend to have a global impact. Data reduction cannot be used either since some of the real EM problems may be masked by the reduction itself. Therefore, an accurate picture of EM risk cannot be obtained unless the entire chip is verified as a single entity. Any tool used for this purpose must have the capacity to analyze multi-million resistor grids.

Improving Full-chip Power Integrity and Reliability
The problems described above must be identified and fixed before going to silicon since they are very expensive to debug after fabrication. Verification tools exist for this purpose. Clearly iterations through a verification loop are preferable to more expensive iterations through a fab-find-and-fix loop.

When you look at methods of performing full-chip verification, it is clear that an engineering solution must be developed. A single run simulation of multi-million transistor circuits, with power grids and ground grids each containing more than 30 million resistors and a similar number of capacitors, is prohibitively expensive. Any simulation approach that attempts to solve the transistors and grids together will suffer from severe capacity limits. As mentioned above, block-based methods by themselves do not suffice since power distribution planning is a full-chip issue. Given the scope, trying to find the IR drops and EM risks is certainly a daunting problem. But if verification is the goal (rather than simulation), excellent approaches exist to address the problem.

Reducing Voltage Drop
As described earlier, voltage drops in the power grid come from two sources: IR and Ldi/dt. Reducing the impact of IR drop in a power distribution system can be accomplished in several ways. The simplest approach is to widen the lines that experience the largest voltage drops, since increasing the width decreases the resistance (and the IR drop). However, this may not always be possible due to constraints in the routing area. Since IR drop is due primarily to simultaneous switching events, another approach is to stagger the gates that switch together, by introducing delays on the signals driving them, so that they switch at slightly different times, at least enough to keep the peak current demand within the noise budget. Alternatively, you could reduce the buffer size, but this may not be possible if the design fails to meet performance requirements with smaller devices.

One effective approach is to use decoupling capacitors between power and ground, which can deliver the additional current needed by the power distribution system. These decoupling caps are usually scattered throughout the design in any available space, using transistors with their gates tied to VDD and their source-drains tied to VSS. All empty regions of the chip are filled with decoupling caps using the philosophy that you can never have enough. Ldi/dt effects can be mitigated by placing large capacitances near the pins.
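A first-order estimate of how much decoupling capacitance a switching event needs follows from the charge balance C = I * dt / dV. The sketch below assumes the capacitor alone supplies the extra current for the duration of the event; the current, duration, and allowed droop are hypothetical values.

```python
def decap_required(extra_current_amp, duration_sec, allowed_droop_volt):
    """Capacitance needed to supply a current pulse within a voltage budget.

    Assumes the decoupling capacitor alone delivers the extra current,
    so the charge removed is I * dt and the droop is that charge over C.
    """
    return extra_current_amp * duration_sec / allowed_droop_volt


# Assumed event: 100 mA of extra demand for 1 ns, 10% budget on a 2.5 V supply.
cap = decap_required(0.1, 1e-9, 0.10 * 2.5)
print(f"required decoupling capacitance: {cap * 1e12:.0f} pF")
```

A tighter noise budget or a longer current pulse raises the requirement proportionally.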

A more aggressive solution is to use a ball-grid array, sometimes called solder bumps or C4™ bumps, where the power supply connections can be at various points within the chip. This expensive solution requires placing many C4 bumps across the chip to minimize the worst-case IR drop in any location. This solution tends to push EM problems to lower levels of metal that are usually narrower. Also, this solution cannot be used in sensitive areas such as memories and dynamic logic because C4 bumps generate alpha particles that may cause logic value upsets in the sensitive nodes. Nevertheless, when used appropriately, C4 bumps can reduce IR drop. The key to design is proper placement of the C4 connections, which can only be done effectively with full-chip analysis.

Reducing Electromigration Problems
Electromigration failures can be reduced in several ways. The basic idea in all approaches is to reduce the average current density seen by any metal segment. The simplest approach is to widen the metal lines. However, increasing the width beyond a certain point leads to over-design, which costs area and can reduce yields. Another approach is to change the current flow in the power grid itself by adding jumpers and straps between different points in the grid. This would reroute current around the affected areas, but such changes would require another verification pass to confirm that the problem has not simply been moved to another area of the design.

In Figure 6, note that the standard cell block on the right would not show any EM risk if analyzed by itself. However, in a full-chip context, current flowing to adjacent blocks overloads the power connections in the block, and the analysis tool identifies an EM risk. Recognizing these problems at the planning stage is helpful, but difficult to do. EM requires a detailed grid with unreduced data. Therefore, a complete picture of EM risk can only be obtained at the verification stage.

Figure 6:  The most difficult aspect of power grid design with respect to EM is that no one block can be isolated from another. This plot demonstrates EM risk at the full-chip level.

A key point made earlier is that IR drop and EM problems cannot be solved separately; they must both be considered during design. To illustrate this, consider how to solve an IR drop problem in the chip in Figure 7a. The figure shows a power flow diagram of the VDD grid in a multimedia chip. Different shading indicates various levels of voltage drops. The darkest areas are the lowest points (valleys) of the IR drop contours. A significant voltage drop occurs in the center region of the chip because only the top portion of the power grid feeds the large drivers in the top section. The upper and lower regions of the power system are not connected.

Figure 7a:  Power grid before changes

If we strap the upper and lower regions together in two places, the voltage drop problem is reduced significantly, as indicated in Figure 7b. The depth of the IR drop valleys has been reduced to acceptable levels, and the voltage drops have been spread over a wider area of the grid. The lower region is now supplying more current to the upper region and therefore a better power distribution has been obtained by adding the two straps.

Figure 7b:  Power grid after changes

However, when examined in the context of electromigration, the results show that fixing the IR drop problem has caused an EM problem in the lower portion of the design. A review of Figure 6 (before the straps were added) shows EM problems at the periphery of the chip due to the high current levels in those regions. The lower half of the chip shows no EM problems.

But in Figure 8 (after straps were added), new EM problems are evident in the lower half as indicated by the small horizontal white lines. It was clear that the lower portion would supply additional current to the upper half of the design once a bridge was built between the two; however, it was not clear exactly how current would flow and exactly where EM problems might occur.

Figure 8:  New EM problems in unexpected regions after changes

 
Does increasing a line width always improve electromigration risk? No. Thin wires can have better EM characteristics than wider wires due to the physics of electromigration. Be aware that more is not necessarily better. Proper EM analysis accounts for this width dependence.
 

Repairing all the areas with potential EM problems would be labor-intensive, time-consuming and, frankly, unnecessary. Since every chip has a lifetime associated with it, the MTTF factor can be used to compute a probability of failure due to EM in a given lifetime. The goal of any changes to the power grid would be to decrease the probability of failure to an acceptable level. This limits the actual number of repairs needed and makes the job manageable.
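One way to turn a time-to-failure figure into a probability of failure within a given lifetime is sketched below. The article does not say which statistical model is used; this sketch assumes lognormally distributed failure times, treats the Black's law figure as the median time to failure, and treats segments as independent. The sigma value and the lifetimes are placeholders.

```python
import math
from statistics import NormalDist


def failure_probability(lifetime_years, median_ttf_years, sigma):
    """Probability that one segment fails within the lifetime.

    Assumption: failure times are lognormal with the given median and
    shape parameter sigma.
    """
    z = math.log(lifetime_years / median_ttf_years) / sigma
    return NormalDist().cdf(z)


def chip_failure_probability(segment_probs):
    """Probability that at least one segment fails, assuming independence."""
    survive = 1.0
    for p in segment_probs:
        survive *= 1.0 - p
    return 1.0 - survive


# Placeholder case: 10-year lifetime, sigma = 0.5, three segments at risk.
probs = [failure_probability(10.0, ttf, 0.5) for ttf in (30.0, 50.0, 100.0)]
print(f"chip failure probability: {chip_failure_probability(probs):.4f}")
```

Ranking segments by their contribution to the total shows which repairs matter most, so that only those needed to reach an acceptable probability are made.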

Summary
The design of power distribution systems for deep submicron ICs is complicated by full-chip issues such as IR drop, ground bounce, Ldi/dt, and electromigration. In the past, certain DRC and visual checks were performed on the grid to ensure compliance with the constraints imposed by these issues. Usually, over-designing was an acceptable solution. But as technology moves deeper into UDSM, this is not a viable approach. Too much performance is sacrificed or the area penalty of over-designing leads to decreased yields. However, pushing the edge of the envelope may lead to under-designing. Chips that have been under-designed often fail on the test bench or later in the field. Therefore, situations of over-design and under-design must both be identified when evaluating the integrity of a power distribution system.

In the end, the design tradeoffs that satisfy all the necessary constraints are too complex to handle without tools that provide visibility into specific problems and their locations on a chip. Without these tools, today's designer has a formidable task in designing a power grid that can handle the power demands over the chip's lifetime. Designers are often required to tape out a design, and are left hoping that nothing will go wrong when the chip comes back from the fab. Murphy's Law is apropos for this situation: if something can go wrong, it probably will.




