Design Con 2015
Breaking News
Comments
Newest First | Oldest First | Threaded View
<<   <   Page 2 / 2
Sanjib.A
User Rank
CEO
re: The case of the confusing clues
Sanjib.A   2/26/2012 6:25:36 AM
NO RATINGS
Debugging gets tougher when the issue gets reproduced frequently at the user's site, whereas the same issue doesn't get reproduced on the same failed module in lab. Once the issue gets reproduced in lab and we could probe signals by oscilloscopes or logic analyzers, things get easier a bit from that point. I had faced a couple of such situations and for both the cases the issues did not get reproduced unless the equipments were taken to the stressed environmental conditions: one case it was low temperature and other case it was artificially induced noise pulses on the inputs.

TFlynn
User Rank
Rookie
re: The case of the confusing clues
TFlynn   2/25/2012 7:50:35 PM
NO RATINGS
My biggest debug problem involved a system with about 60 PCBs, no system level schematic, multiple DSP processors and intermittent communication problems on a strangely terminated buses (kind of like RS485). Since the bus looked terrible, improving termination seemed like the proper attack. Nope, with "nice" looking bus signals the system didn't work. I "patched" the system with a 20pF cap on a digital signal line. Can't say I fixed it. Why a 20pF cap on a digital signal improves the operation requires more room than I can write here.

przemek0
User Rank
Rookie
re: The case of the confusing clues
przemek0   2/24/2012 8:34:15 PM
NO RATINGS
A similar thing affects MSP430 line of microcontrollers; there's a debate beteen those who argue that the default watchdog state can be ON, and those that point out that complicated init sequences can under random circumstances take longer than the default watchdog timeout, and so the default has to be off. FWIW, MSPGCC chose the latter option.

Haldor
User Rank
Rookie
re: The case of the confusing clues
Haldor   2/24/2012 6:49:42 PM
NO RATINGS
I've had two debugging problems that stumped me for a while, one caused by hardware and one by software. ----------------------------------------------- The hardware problem was with a USB peripheral that kept failing the startup inrush current limit. We were using every bit of the 100 mA we were allowed to draw at startup, and the inrush current test kept failing. This was an 8052 based design with XDATA memory latched I/O. I finally figured out that there was an extra 40 mA of current being drawn whenever the processor was in reset. The Address/Data lines were floating while the processor was in reset. The TTL latches quite naturally didn't like floating inputs and consumed their maximum current (20 mA each). This problem only happened at power up and cleared as soon as the processor came out of reset so it looked like an inrush current problem. I added pullup resistors to the Address/Data lines and the "Inrush" current problem was solved. We had designed litterally hundreds of PCBs with external memory and I/O over the years and had never noticed this behavior before. ------------------------------------------------ I had a software problem that intermitantly locked up the microprocessor at power up. One clue was that the program had a very large number of initialized global variables. When I commented out some of these initialized variables the lockup problem went away, put them back and the problem reappeared (intermitantly. This particular processor came out of reset with the watch dog timer reset enabled. The first line of code in my program turned off the watch dog timer, but the variable initialization code the compiler added took longer to run than the default watch dog timout period. We were right on the edge of timeout each time the processor was reset. I modified the compiler startup code to disable the watch dog timer reset before the variable initialization code was called. Problem solved.

didymus7
User Rank
Rookie
re: The case of the confusing clues
didymus7   2/24/2012 6:46:18 PM
NO RATINGS
The strangest clue I ever saw, not related to the problem was with a sine wave oscillator. It was part of a circuit that was being built into a hybrid. The oscillator circuit comprised two feedback loops, one for AGC and the crystal in another to provide the oscillator. I was called in because, while it would oscillate, the way form looked like anything but a sine wave. It twisted and turned and spiked and repeated it each period. Then the tech showed me something cute. He took his examination lamp and brought it near the open hybrid and suddenly the waveform popped into a perfect sine wave and stayed there as he removed the lamp. It had nothing to do with the problem (which was caused by placing amplitude limiting on the positive peak while AGCing on the negative peak), but it was mind-blowing.

zeeglen
User Rank
Blogger
re: The case of the confusing clues
zeeglen   2/24/2012 4:52:34 PM
NO RATINGS
You mean the hardware guys never tested their goodies in a hot environmental chamber or even used a hot-air-gun/hair dryer? Tsk Tsk! A poor-man's environmental chamber is so easy to home-brew with a table saw. Build a plywood box with a hinged door large enough for the UUT, line it with styrofoam insulation covered with grounded tinfoil (for ESD), place an electric space heater and a fan inside and drill a hole to insert a glass thermometer. If you want to get fancy use a temperature probe and a feedback loop. For a small UUT a picnic cooler works good. For cold remove the heater and blow the fan through a minnow-trap wire basket filled with dry ice nuggets. When thawing out you are guaranteed 100% humidity too. I remember a debug session where the watchdog timer occasionally timed out. It was a long 1 second ripple counter chain. Using an analog scope we managed to see the fault occur and it looked like the watchdog timed out prematurely. It had been designed by a junior, so I counted up the number of flipflops. The junior had designed it to provide a period of 1 second - he forgot that the timeout occurs on the rising edge HALFWAY through the period. It took 5 minutes with the knife and soldering iron to cut in another flipflop in the counter IC - problem fixed.

old account Frank Eory
User Rank
Rookie
re: The case of the confusing clues
old account Frank Eory   2/24/2012 3:23:08 PM
NO RATINGS
A very recent one for me initially looked like wayward software, then digital hardware, then analog hardware, and ultimately turned out to be a noise issue. I think noise problems are by far the worst to debug, especially when they create symptoms that strongly suggest a different root cause.

t.alex
User Rank
Rookie
re: The case of the confusing clues
t.alex   2/24/2012 12:48:36 PM
NO RATINGS
This relationship between hw and sw guys is still common nowadays:)

prabhakar_deosthali
User Rank
CEO
re: The case of the confusing clues
prabhakar_deosthali   2/24/2012 11:55:15 AM
NO RATINGS
This peculiar case of debugging happened way back,30 years ago when we were debugging the networking software for a 8080 based microcomputer system. At that time recompiling the code every now and then was not possible because of the time taken to assemble the whole code and create a new binary.So the normal practice was to have a patch list which we will be keeping updated as a ew bug was found and fixed. The patches had to be entered manually . For a few days we were observing that our software would run fine in the morning and start behaving erratically as the day progressed. As the test hardware had already been debugged and was not changed for a couple of months , the only suspect was software This was frustrating, as at the end of everyday we used to be at the same point where we started in the beginning. The hardware guys were in relaxed mood as they had no work till we crossed our debugging stage. On such an idle afternoon, one of the hardware guys was just curiously looking at the main board and trying to clear the dust over the ICs when he noticed some peculiar thing. In the memory bank there ICs with different access time ( some with 15ns and some with 22 nsec if I remember correctly). This speed difference worked fine in the morning when the ambient temperature was around 20 deg C but as the day progressed and the ambient temperature reached around 40 deg, the mismatch in access speed would become apparent and create havoc with the software creating a runaway condition. You can imagine the how furious we became on the hardware group when this bug was found.

<<   <   Page 2 / 2


Flash Poll
Top Comments of the Week
Like Us on Facebook
EE Times on Twitter
EE Times Twitter Feed

Datasheets.com Parts Search

185 million searchable parts
(please enter a part number or hit search to begin)
EE Life
Frankenstein's Fix, Teardowns, Sideshows, Design Contests, Reader Content & More
Max Maxfield

Want to Present a Paper at ESC Boston 2015?
Max Maxfield
8 comments
I tell you, I need more hours in each day. If I was having any more fun, there would have to be two of me to handle it all. For example, I just heard that I'm going to be both a speaker ...

Martin Rowe

No 2014 Punkin Chunkin, What Will You Do?
Martin Rowe
Post a comment
American Thanksgiving is next week, and while some people watch (American) football all day, the real competition on TV has become Punkin Chunkin. But there will be no Punkin Chunkin on TV ...

Rich Quinnell

Making the Grade in Industrial Design
Rich Quinnell
12 comments
As every developer knows, there are the paper specifications for a product design, and then there are the real requirements. The paper specs are dry, bland, and rigidly numeric, making ...

Martin Rowe

Book Review: Controlling Radiated Emissions by Design
Martin Rowe
1 Comment
Controlling Radiated Emissions by Design, Third Edition, by Michel Mardiguian. Contributions by Donald L. Sweeney and Roger Swanberg. List price: $89.99 (e-book), $119 (hardcover).