
Memory Designline Blog


A memory problem not to make light of

Rick Hille

11/4/2010 10:48 AM EDT

While troubleshooting a finicky EPROM memory module, engineers learn to thoroughly assess ALL factors of a failure-inducing operating environment

Of the several weird technical problems I’ve had the pleasure of solving over my three-decade career (so far), one in particular sticks in my mind. I started out in the Product Development group of an aggressive, young telecom equipment manufacturer, whose flagship product was rapidly gaining prominence in the PABX market. The system was a dishwasher-sized box with a dozen or so large circuit boards slid in and interconnected via a backplane. The main CPU executed out of EPROM memory (2716 generation), which was arrayed on large multi-chip modules that sat on the CPU board as well as on an expansion board in the next slot over. Before I was given the opportunity to actually design products, I spent a few years cutting my teeth on investigating both production and field problems of this system, and making design improvements where warranted.

Around the time that single-rail (+5V) EPROM technology was supplanting 3-rail (+12V, +5V, -5V), I was given charge of qualifying a particular manufacturer’s single-rail devices. These were supposedly of equivalent or better speed, lower power, cheaper and, to boot, drop-in pin compatible. They had the potential to save the company many thousands of dollars per month, and although there was no looming obsolescence cloud at the time, there was much eagerness to get these into the product stream quickly. We programmed up a bunch of devices that the vendor sampled to us and populated a few memory modules. Everything looked good in the development chassis in our lab: lower supply current, clean signal transitions, solid CPU read access. The CPU booted up on the bench, no problem.

So we stuck our “perfectly working” sample memory module into a real production unit, set it up for environmental chamber testing and then… WHAT? It’s not starting up?!! The LEDs are flashing the memory checksum error code? What’s going on? We haven’t even turned up the heat yet!! What followed was an intensive electrical investigation: supply noise, signal quality, voltage tolerance, heat/cold sensitivity, slot-to-slot EM induction, chip date codes, etc. Nothing we could do within reason would reproduce the problem in our development chassis; however, as soon as we put the modules into a production unit, BINGO, memory checksum error. A week had quickly slipped by with no headway, and I was in constant contact with the EPROM manufacturer, whose applications engineer was equally perplexed.

Finally, out of desperation, we hauled a production system off the line into the lab. Swap the new boards in; still failing! OK, let’s check some signal timing. Put the memory carrier card out on an extender board and hook up an analyzer. What!! It’s working now! Is it a timing problem? Earlier measurements told us no, in fact, timing got better as the new chips access faster than the three-rail units they replaced. Are they too fast now and the extender and analyzer load is fixing the timing? Removed the analyzer; still working OK. Take the card off the extender and back into the card slot, failures are back. Usually, the exact opposite behavior is seen since the extender degrades signals a bit, even at the very low bus speed of the system. Is there something about the combination of backplane and the particular boards in the system?

This is the worst kind of problem a system can inflict upon its creators. To rule out backplane differences, we moved all the cards from the production chassis to the open chassis test frame/backplane in the lab, and powered it from the production system power supply. Aha! Failures stopped. Must be the backplane; it’s the only thing different. Ran all kinds of tests and probed backplane signals on the test frame; everything looked good and worked without fault. What is it about that production backplane, I wondered as I put my notepad down on top of the test frame. HEY WAIT! Now the test frame is failing? What happened?

Retrace my steps; check my notepad and try… WHAT!! It’s working fine again! All I did was… put my notepad down on the frame like this and… Holy $#*^@! It fails when my notepad is on top of the frame! I can reproduce the failure 100% of the time with nothing more than a cardboard and paper notepad!

Needless to say, the investigation proceeded to a solution very swiftly from that moment.

The problem was clearly not caused by an EM field, since the paper notepad had no metal other than the staples in the binding. I tried metal objects too, but the failure would occur only when the object was a large sheet of something opaque. And that was the big clue. It was an OPTICAL effect! What is it that could react that way? There were no optical sensors in the system, as this was a telephone switch.

Wait a minute… we didn’t put stickers on the windows of the EPROMs like production units have. Install stickers, and presto! The module fails everywhere, 100% of the time. In fact, the CPU can’t even start booting. OK, now that we have reproduced the problem, what is the analysis? It turns out that when the EPROM die was in the dark, the chip select inputs exhibited a high leakage current, high enough to overwhelm the Vol of the unbuffered 4000 series CMOS logic gate outputs that drove them. A 4000 series output could barely drive an LSTTL input down to its Vil level at 5V Vdd, and the EPROM CE inputs were exhibiting a leakage that approached that of a standard TTL input.
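As a rough illustration of that drive-versus-leakage argument, the sketch below treats the CMOS output as a simple resistor to ground and computes the voltage the EPROM input would actually see. The on-resistance, the "healthy low" target and the three load currents are assumed round figures chosen for illustration; they are not measurements from this system or values taken from any datasheet.

```python
# All numbers below are illustrative assumptions, not figures from the article.
R_ON_OHMS = 800.0      # assumed on-resistance of a 4000 series output at 5 V Vdd
V_LOW_TARGET = 0.4     # assumed "healthy" logic-low target at the input, in volts


def driven_low_voltage(load_ma: float, r_on_ohms: float = R_ON_OHMS) -> float:
    """Voltage at the input when the driver must sink load_ma through r_on_ohms."""
    return (load_ma / 1000.0) * r_on_ohms


def is_healthy_low(load_ma: float, target_v: float = V_LOW_TARGET) -> bool:
    """True when the driver can still pull the input down to the target level."""
    return driven_low_voltage(load_ma) <= target_v


# Hypothetical loads: a lit die, an LSTTL-like input, a standard-TTL-like input.
loads_ma = [
    ("low leakage (die in light)", 0.1),
    ("LSTTL-like load", 0.4),
    ("standard-TTL-like load (die in dark)", 1.6),
]

for label, load_ma in loads_ma:
    volts = driven_low_voltage(load_ma)
    verdict = "OK" if is_healthy_low(load_ma) else "too high"
    print(f"{label}: {volts:.2f} V ({verdict})")
```

With these assumed numbers, the LSTTL-like load sits just under the target while the standard-TTL-like load lands far above it, which matches the "barely adequate, then overwhelmed" behavior described here.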

When exposed to light, the leakage diminished and allowed the CE inputs to be driven to a healthy Vil level (<0.4V). The lab frame was all open and exposed to the strong lighting that prevailed, allowing the memory system to operate correctly. In the system frame, the metal enclosure blocked most of the ambient lighting and hence provoked the problem. That the system could still sometimes boot far enough to detect and report memory errors was pure chance: the startup code resided in a memory device that was close to the card edge. That device saw the most light leaking through the space between cards and worked well enough to allow the CPU to run diagnostics. We basically had to restrict the use of this particular manufacturer’s memory chips from that design.
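For readers who have not met a power-on memory check, here is a minimal sketch of the idea: sum every byte of a ROM image and compare the result against a stored value. The 16-bit additive checksum, the function names and the assumption that a device which is never properly selected reads back as 0xFF are all illustrative; nothing here describes how this particular system computed its checksum.

```python
EPROM_SIZE = 2048  # a 2716 holds 2K x 8 bits


def checksum16(image: bytes) -> int:
    """16-bit additive checksum (an assumed scheme, for illustration only)."""
    return sum(image) & 0xFFFF


def bank_ok(image: bytes, expected: int) -> bool:
    """True when the bytes actually read back match the stored checksum."""
    return checksum16(image) == expected


# Hypothetical programmed contents and the checksum recorded at build time.
programmed = bytes(i & 0xFF for i in range(EPROM_SIZE))
stored = checksum16(programmed)

# Assumed failure model: chip enable never reaches a valid low, so the CPU
# reads an idle bus value instead of the programmed data.
unselected = bytes([0xFF]) * EPROM_SIZE

print("device selected correctly:", bank_ok(programmed, stored))
print("device not selected:", bank_ok(unselected, stored))
```

A check like this can only run if the device holding the startup code itself reads correctly, which is why the position of that one device near the card edge mattered.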

It wasn’t long after that the company launched an effort to develop the next generation CPU system. I had the honour of designing the CPU and memory system for it, using the 2764 generation of EPROM and a combination of LSTTL and HC CMOS. I never ran into this exact problem again but it certainly opened my mind up to unexpected possibilities when facing a perplexing problem.

Starting in his early teens, building guitar amplifiers and effects as well as learning to service Hi-Fi equipment, Rick Hille has honed 30+ years of technology industry experience in various roles in Telecom equipment design, Video desktop and surveillance systems, and network server appliances. He is a graduate of Ryerson Polytechnical Institute, and continues to serve the technology industry as a Hardware Designer.





Tom VanCourt

11/5/2010 1:19 PM EDT

I remember the UVPROM generation well. Like, when some of our PROMs stopped erasing. They had been working fine. We peeled the labels off the windows to reuse them and set them in the tanning booth for chips about five times longer than necessary. Still, enough old data remained to make the chips unusable.

It turned out the stickers on the windows left glue residue behind, and that stuff must have been SPF 100. It kept the PROM perfectly safe from dangerous UV - until we washed it off with acetone. (BTW, only one brand of stickers caused the problem.)




01830

11/5/2010 4:21 PM EDT

I had a situation with the opposite problem. I was called over to a photoshoot for a new product because the equipment was locking up or going berserk. I got over there and they had the board exposed with an EPROM. Someone had stuck a paper label on it, and when they took a flash photo, it would 'glitch' the system. I took one of the foil 'write protect' stickers from a floppy disk and put it over the window, and the system worked fine.




CharlesGlorioso

11/5/2010 6:35 PM EDT

The problem described hasn't gone away with UV EPROMs. A few years back we were working with a very small product with lots of itty-bitty (technical term) surface mount parts including chip scale LDO linear regulators. In debugging the first boards, the engineer needed to use a microscope to locate and probe contact points on the PCBA. But each time he did that, the LDO would fail. Damn that ESD!!! After breaking way too many boards with his probing, he noticed that if he held his hand a certain way the LDO repaired itself. It turned out that the chip scale package was not opaque, and the bright light source under the microscope was causing the silicon in the LDO to misbehave. So, the gremlin was not ESD. The LDO was light sensitive. I am willing to bet that with ever smaller packaging this problem will get worse. After all, silicon is inherently photovoltaic.




Rick_Hille

11/5/2010 10:24 PM EDT

Charles, I haven't run into this so far but I often wondered if it was just a matter of time before supposedly non-optical parts become optical. I've seen more and more designs with (painfully) tiny packages and I've occasionally quipped to my colleagues that some modern chip scale packages seem to be little more than shellac'ed die! Maybe it'll get to the point when the wave of a "magical" hand makes a problem disappear!




Tom N

11/5/2010 7:53 PM EDT

When testing wafers we often need to eliminate light sources; some die designs are sensitive, some are not. It's especially noticeable at our probing stations with the microscope light source, as one fellow above found out the hard way.




Robotics Developer

11/5/2010 9:16 PM EDT

I remember using PALs and GALs in an array processor design in the mid 80s and had similar weird experiences with "pin compatible devices". It seems that the particular devices from one vendor worked all the time and the 2nd source type devices would not. The parts were "identical" feature-wise, in voltages (as spec'd) and operationally. The problem was with the programmer that we were using; it just did not do a good job with the 2nd version of the parts (and because we had stopped verifying the programming, we did not see the problem, which was intermittent and showed up only for certain functions/inputs). The moral of the story is don't scrimp on programming/parts cost without doing a full investigation. When we added the programming verification step we weeded out the few badly programmed parts.




Danilo

11/7/2010 9:51 PM EST

I no longer rule out optical interference after three different experiences. One was an FM receiver that changed its tuned station every time I flipped the lights on, due to a glass-packaged varicap diode. Another was a PABX operator console using an old EPROM-based Z8 microcontroller that refused to work in a bright room. The third was a drag race Christmas tree (for the ones who know what it is) that started to go crazy at midday because of poor-quality optocouplers. The optocouplers picked up interference from the sunlight even though they were enclosed in a metal box with some small vent holes.



