At the nanometer scale, engineers must not only design functionally correct circuits that can be tested and manufactured but also take extra steps to guarantee short- and long-term reliability in the field.
Over the last few years, significant and growing emphasis has been placed on reliability. We see articles and papers on it everywhere. The July/August 2013 IEEE Micro magazine has "Reliability-Aware Microarchitecture" on the front cover. The Global Semiconductor Alliance has a working group on the electrostatic discharge (ESD) structures required for 3D-ICs. And, of course, the IEEE ESD Association and International Reliability Physics Symposium (IRPS) groups are constantly weighing the benefits and design tradeoffs that must be made for such endeavors.
Decreasing design cycles, tighter integration, and faster turnaround times have led to a general increase in the use and re-use of IP, from both internal and external sources. That IP has helped us put systems together at an accelerated rate, with ever-increasing transistor counts.
For me, one big question stands out above all: How do you know that you've assembled your system-on-chip correctly? What is "correctly," anyway? Each piece of IP has its own quirks, requirements, and history. Version incompatibilities may exist between some IP combinations, and the issues range from something as simple as the power domains and voltages that need to be hooked up to the complexities of a chip's power state table.
"Correct" may come in several different flavors. Our logic simulators help us a great deal with many of these issues, but not when it comes to the physical implementation. We build transistors in silicon, and ultimately, that's what we need to validate.
Traditional design rule check (DRC), layout versus schematic (LVS), and electrical rule check (ERC) tools have taken us a long way toward avoiding the more obvious manufacturing limitations we have for each process node. But, how do we adequately verify that thin-oxide gates have the correct bias? That high-voltage devices are not driving low-voltage devices past their rated thresholds? Or that the complex symmetry and orientation rules we have are dutifully obeyed?
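To make the overdrive question concrete, here is a minimal, hypothetical Python sketch of the kind of topological check involved. It is not any particular tool's rule deck; the netlist representation, net names, and voltage ratings are invented for illustration. The check flags nets where a driver's supply voltage exceeds the rated maximum input voltage of a receiving device.

```python
# Hypothetical netlist fragment: for each net, record the driver's supply
# voltage and the rated maximum gate voltage of every receiver on that net.
# All names and ratings here are illustrative, not from any real design.
nets = {
    "net_io":   {"driver_vdd": 3.3, "receiver_vmax": [3.3, 1.8]},
    "net_core": {"driver_vdd": 1.0, "receiver_vmax": [1.0, 1.0]},
}

def overdrive_violations(nets):
    """Return (net, driver voltage, receiver rating) tuples where a
    driver would bias a receiving gate past its rated threshold."""
    violations = []
    for name, net in nets.items():
        for vmax in net["receiver_vmax"]:
            if net["driver_vdd"] > vmax:
                violations.append((name, net["driver_vdd"], vmax))
    return violations

# net_io drives a 1.8 V-rated gate from a 3.3 V supply, so it is flagged.
print(overdrive_violations(nets))
```

A production reliability checker does this against the extracted layout, with real device models and voltage propagation, but the core idea is the same: a static, topology-level test that needs no simulation vectors.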
This next level of verification, the level of reliability, focuses more on the subtle and longer-term effects that these circuits may experience. Detailed SPICE simulations can help immensely, but it is not possible to execute and resolve SPICE simulations across an entire complex digital IC. So SPICE can only help if we already know there is a problem, where it is, and which input vectors will properly stimulate that portion of the circuit.
Today's designs often contain elements that can be readily checked from a topology perspective, before we even get to SPICE simulations. One example that comes to mind is that of low-power design. In attempts to minimize both dynamic and static (leakage) power dissipation, lower voltages, thinner oxides, and, where appropriate, transistor stacking are all employed. Transistor stacking, where a single high-leakage transistor is replaced by two stacked transistors, each half the width of the original, trades a slight increase in signal delay for a significant improvement in static leakage.
Isolating and confirming the presence of these design elements forms an integral part of an overall comprehensive reliability verification solution. So, too, does the validation of the bulk and contact locations of these transistors.
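As a rough illustration of what "isolating and confirming" such an element might look like, here is a hypothetical Python sketch that finds stacked transistor pairs in a toy flattened netlist. The tuple representation, device names, and matching criteria (shared gate signal plus a shared internal source/drain node) are assumptions made for the example, not a real extraction flow.

```python
# Hypothetical flattened netlist: (name, drain, gate, source) tuples.
# M1/M2 form a stacked pair sharing gate "in" and internal node "mid";
# M3 is a lone device. All names are invented for illustration.
transistors = [
    ("M1", "out",  "in",  "mid"),
    ("M2", "mid",  "in",  "gnd"),
    ("M3", "out2", "in2", "gnd"),
]

def find_stacked_pairs(transistors):
    """Find series-stacked device pairs: the source of one transistor
    feeds the drain of another, and both gates share the same signal."""
    pairs = []
    for name_a, d_a, g_a, s_a in transistors:
        for name_b, d_b, g_b, s_b in transistors:
            if name_a != name_b and s_a == d_b and g_a == g_b:
                pairs.append((name_a, name_b))
    return pairs

print(find_stacked_pairs(transistors))  # the M1/M2 stack is reported
```

A real checker would also confirm the widths (each half the original), verify that the internal node connects to nothing else, and validate bulk and contact locations, as noted above.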
Much of the additional verification we do on our designs is to ensure robust operation over an extended operating period. What levels do you go to for the extra validation of design robustness? Are point-to-point resistance, current density, and electromigration simulations on your "must do" list? Or does someone else in another group manage that for you? As the designer, or verification engineer, what is driving the next thing that you're looking to add to the reliability verification suite? Is this motivation being driven by increased design complexity? Aspirations of greater design reliability? Or maybe even greater awareness of product failures and device returns?
Given almost unlimited CPUs and verification cycles, where do you see things going? What's your next killer check going to be? How do you contribute to improving your designs' overall reliability?
— Matthew Hogan is a product marketing manager at Mentor Graphics. He is an active member of the ESD Association, involved with the EDA working group, the EOS/ESD Symposium technical program committee, and the International Electrostatic Discharge Workshop management committee.