Ultimate Screw-ups
One great big diode, conducting lots of current
Kenneth Boyce
9/17/2010 11:20 AM EDT
Several years ago I managed a group in a fab-less company which did audio product IC design. At the time we were designing an AC-97 controller and an AC-97 codec. The controller design and chip floor plan layout was completed and went into DRC (design rule check). Everything passed all the checks required for the design tool being used, and for the CMOS design process being used at the selected fab.
A few weeks later, the part came back from the fab late one afternoon. We stayed late, wanting to know if we had a live and kicking part. We applied power, and watched in horror as the part rapidly overheated and went up in smoke. Thinking something was wrong with the setup, we checked it, assured ourselves nothing was, and applied power to another part. Poof! The same result. Basically we had one great big diode conducting lots of current... and getting hot very fast.
So everyone trudged home quite mystified as to the problem.
Soon thereafter we began examining the die from a de-capped package under a microscope. What we found was a massive number of signal lines tied together at different areas of the die. The part was certainly not designed that way, and any tests we had done prior to sending out to the fab would have caught such a major mess.
Without going into the days of effort that followed, we finally found out that all the connections that were tied together had identical first 16 characters of the 32 character node names used in the design.
Turns out we had upgraded our design tools to match the fab, but the mask maker contractor had not caught up at the time. They were still using 16 character node names. So, when 16 characters matched, the mask maker connected the nodes in the mask layers together.
Once the mask was corrected using 32 character node names, we had a good part.
Lesson: Double check anyone and anything in all the steps from design to fab. Don't assume the other guy has done his job.
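The failure mode in this story, distinct node names collapsing into one when a downstream tool keeps only the first 16 characters, is easy to screen for before hand-off. The following is a minimal sketch, not the tooling used on this project: the function name `find_truncation_collisions` and the node names are hypothetical, and it assumes the netlist's node names are available as a plain list of strings.

```python
from collections import defaultdict

def find_truncation_collisions(node_names, max_len=16):
    """Group distinct node names that become identical when cut to max_len characters."""
    groups = defaultdict(set)
    for name in node_names:
        groups[name[:max_len]].add(name)
    # Keep only prefixes shared by two or more distinct names.
    return {prefix: sorted(names) for prefix, names in groups.items() if len(names) > 1}

# Hypothetical node names, invented for illustration only.
nodes = [
    "ac97_ctrl_codec_rx_data0",
    "ac97_ctrl_codec_rx_data1",
    "ac97_ctrl_codec_tx_valid",
    "pll_lock",
]

# A tool limited to 16 characters would see the first three names as one node.
collisions = find_truncation_collisions(nodes, max_len=16)
for prefix, names in collisions.items():
    print(f"{prefix!r} is shared by: {', '.join(names)}")
```

Running the same check with the name length each vendor in the chain actually supports would flag the mismatch before any mask is cut.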
KD Boyce has nearly 40 years' experience in electronic systems design and semiconductor technical marketing.


iniewski
9/17/2010 12:48 PM EDT
Ken, I had a similar experience several years ago. After returning from the fab our part was drawing lots of current, but it wasn't going up in smoke like yours, so perhaps it was less spectacular. It turned out there was an error in the I/O library, and as a result there were several shorts on the PCI pads. I actually managed to blow the shorts manually to recover one good part that we could sample. It took several hours though. A metal mask change was obviously required in the end. One of my favorite debug stories. Perhaps someone should gather stories like this and publish them. Kris
kdboyce
9/17/2010 4:01 PM EDT
Kris,
I am sure there are lots of similar stories. If there is enough input from the readers, maybe we could collaborate on publishing them. Anyone want to contribute?
Ken
Test_engineer
9/17/2010 6:14 PM EDT
As a test engineer I live by 3 "Golden Rules":
1. Assume nothing.
2. Check everything.
3. Trust nobody.
W1PK
9/20/2010 9:02 AM EDT
I'd put it just a little bit differently. Some years ago I had a production lot of new boards scrapped because some features in the CAD data base weren't turned on when the Gerbers were plotted. Now, I'd been using a pretty thorough checking process for a long time, based on the top-down engineering philosophy borrowed from the software crowd, and had a record of 125 boards in a row without a re-spin before first shipment -- mostly without cuts or jumpers. What went wrong in this case was that I'd done my sign-off on paper plots from the CAD data base, so I knew the data base was correct. But because the films were plotted at the fab vendor, not in our plant, I couldn't see and check the final images. So to keep it from happening again, we changed our process to have the fab vendor send us duplicate films, which we compared to the signed-off paper plots, before releasing them to fab.
The lesson here is to close the checking loop around the entire process, and not get hung up on intermediate steps. I don't even look at netlists, I just want airtight proof that the final artwork matches the schematic and the dimensions of the component footprints. I work on a light table, tracing every line with colored pencils on both the schematic and the artwork plots. There ain't any other way to prove it's right the first time.
Nowadays, fab vendors work directly from a .zip archive of Gerber files generated directly from the CAD software, so what I check for sign-off is the Gerber files extracted from the release archive. That way, I know for sure that I'm seeing what the fab vendor sees, and nothing got corrupted at the last minute.
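One way to support that last step is to record a digest of every file in the archive that was reviewed, then confirm the archive actually sent to the vendor matches it. This is only a sketch under stated assumptions: the helper names (`archive_digests`, `compare_to_signoff`, `make_zip`) and file names are hypothetical, the archives are built in memory purely for the demo, and a matching digest shows the files are unchanged, not that the artwork is correct. It does not replace the visual check described above.

```python
import hashlib
import io
import zipfile

def archive_digests(archive):
    """Return {member name: SHA-256 hex digest} for every file in a zip archive."""
    digests = {}
    with zipfile.ZipFile(archive) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            digests[info.filename] = hashlib.sha256(zf.read(info)).hexdigest()
    return digests

def compare_to_signoff(archive, signed_off):
    """Compare an archive to signed-off digests; return (missing, extra, changed)."""
    actual = archive_digests(archive)
    missing = sorted(set(signed_off) - set(actual))
    extra = sorted(set(actual) - set(signed_off))
    changed = sorted(n for n in set(actual) & set(signed_off)
                     if actual[n] != signed_off[n])
    return missing, extra, changed

def make_zip(files):
    """Build an in-memory zip from {name: text}; stands in for a real release archive."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, text in files.items():
            zf.writestr(name, text)
    return buf

# Hypothetical archive contents, invented for illustration only.
reviewed = make_zip({"top.gbr": "G04 top*", "bottom.gbr": "G04 bottom*"})
signed_off = archive_digests(reviewed)

released = make_zip({"top.gbr": "G04 top*",
                     "bottom.gbr": "G04 bottom, edited*",
                     "drill.drl": "M48"})
missing, extra, changed = compare_to_signoff(released, signed_off)
print("missing:", missing, "extra:", extra, "changed:", changed)
```

In practice `archive_digests` would be given the path of the real release .zip rather than an in-memory buffer.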
Jimelectr
9/19/2010 2:38 AM EDT
Test_engineer, I agree, within reason. Nothing, everything, and nobody are some serious absolutes! If you truly lived by such absolutes, you would never get anything done! So I've modified your Golden Rules to:
1. Minimize assumptions.
2. Check all stuff most likely to be in error.
3. Trust very few, the ones who have proven themselves trustworthy.
Salio
9/19/2010 6:13 PM EDT
I totally agree with not assuming that the other person will do or has done his job.
I was in a meeting this week with the director of the company I work in and we were talking about bid evaluations. He said not to assume that the bidders will provide the information that they are supposed to. You as an engineer have to make sure that the bidders have provided the information that they are supposed to.
Plus I would check and recheck to make sure that we are truly done before sending it for fabrication.
ylshih
10/1/2010 1:13 AM EDT
I had a comm ASIC project which was designed using asynchronous logic techniques. This was early enough that design tools were still immature, and due to the asynchronous nature some home-grown design and simulation tools had to be used. Eventually design, simulation, and rudimentary verification all passed and the design was released to fab. When it came back, it was a total dud: nothing worked. After over a week of re-verification and review, it was finally determined that a number of internal ground nodes were floating. The simulation tools hadn't caught it because the nodes started at 0V numerically and stayed there. I don't recall the reason the routing/layout tool didn't catch the signal list disconnect, but it was probably related to the home-grown nature of the tools. Once the ground nodes were grounded, the design worked fine.
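A floating node of this kind can be found with a plain connectivity check that ignores simulated voltages and asks only whether each intended ground node has a path to the real ground connection. The sketch below is a generic illustration, not a reconstruction of the tools on that project: the function `floating_nodes`, the node names, and the representation of the design as a list of connected node pairs are all assumptions.

```python
from collections import defaultdict, deque

def floating_nodes(connections, reference, expected):
    """Return the nodes in `expected` that have no connection path to `reference`."""
    graph = defaultdict(set)
    for a, b in connections:
        graph[a].add(b)
        graph[b].add(a)
    # Breadth-first search outward from the reference node.
    seen = {reference}
    queue = deque([reference])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return sorted(n for n in expected if n not in seen)

# Hypothetical connectivity, invented for illustration only.
connections = [
    ("GND_PAD", "gnd_a"),
    ("gnd_a", "gnd_b"),
    ("gnd_c", "gnd_d"),  # tied to each other, but never to the pad
]
floating = floating_nodes(connections, "GND_PAD",
                          ["gnd_a", "gnd_b", "gnd_c", "gnd_d"])
print("floating ground nodes:", floating)
```

A simulator that initializes every node to 0V would report `gnd_c` and `gnd_d` as sitting at ground, while this structural check flags them.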
ReneCardenas
10/27/2010 1:01 PM EDT
I agree with Jimelectr, absolutes prevent real accomplishment and are not applicable in large projects with numerous participants. Every group must follow a given set of rules/guidelines, and every boundary is checked and double-checked to make sure the golden rules have been observed by all parties, in order to reduce assumptions and focus on the most likely faults.