Engineering Investigations

Chasing transient errors in a flight-control computer

DutchUncle (User Rank: Rookie)   3/14/2011 12:59:14 PM
I'm a software guy. Early in my career I worked on a transient-seeming bug with a hardware engineer and another software engineer, all three of us recent grads. After a very long evening that ended with finding the problems (yes, plural) and a celebratory dinner together, we came in the next morning to serious questions from our respective supervisors about the "fights" that had been going on in the lab. It seems our casual, trash-talking, shoot-around style ("We got to this instruction! It's YOUR fault!" "Aha! And then we got to here, it's YOUR fault!" "Wait, what's that signal? It's YOUR fault!") had worried the older, buttoned-down, pocket-protector generation walking past the door. Their comments over coffee had been heard before our joint, mutually congratulatory email had been read, and enthusiastic teamwork counted for less than decorum. It was a depressing lesson: finger-pointing and blame-shifting were so ingrained at that place that collegial joking was taken as serious accusation. Yes, our products may be serious, even life-critical, but we can still do that work in good spirits!

parkgate (User Rank: Rookie)   10/25/2010 8:38:33 AM
I'm a hardware engineer who is aware of this finger-pointing problem, so when a software engineer comes to me with what he thinks is a hardware problem, I assume that is what it is and set about looking for it. If I then have difficulty finding the problem, I find the software engineer will come round to accepting that it might be software. I don't consider my time wasted: it helps build relationships with other team members, and we often end up with extra test code that can be used to check the hardware in the future. I was also once called into a project where the hardware and software engineers were having stand-up arguments about whose fault it was that a system didn't work. To resolve this, I wrote some very simple test code to check out the hardware and concluded there was nothing wrong with it. My simple test code was in C and did the same job as the software engineer's carefully crafted (for speed) assembler. The arguments stopped after that.
Terry
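The comment doesn't say what Terry's C test actually exercised. As a hypothetical illustration of the kind of "very simple test code" that can check out hardware, here is a minimal walking-ones data-bus test: it writes each single-bit pattern to one location and reads it back, so a stuck or shorted data line shows up as a mismatch. The function name and the idea of pointing it at a memory-mapped location are assumptions for the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical walking-ones test. 'addr' would point at the RAM word or
 * register under test on real hardware. Returns 0 if every bit passed,
 * otherwise the first pattern that failed to read back correctly. */
uint32_t walking_ones_test(volatile uint32_t *addr)
{
    assert(addr != NULL);
    /* 1, 2, 4, ... 0x80000000; the unsigned shift wraps to 0 and ends the loop */
    for (uint32_t pattern = 1u; pattern != 0u; pattern <<= 1) {
        *addr = pattern;          /* drive exactly one data line high */
        if (*addr != pattern) {
            return pattern;       /* a stuck or shorted line shows up here */
        }
    }
    return 0u;
}
```

Run against an ordinary variable it always passes; the point is that a test this plain leaves little room for the test itself to be the bug.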

kolio (User Rank: Rookie)   10/7/2010 7:43:36 AM
Putting a battery monitor and/or brownout reset circuit in at the system design phase would avoid a lot of power problems later. H/W design...
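A brownout reset circuit typically holds the processor in reset while the supply is below a threshold, with some hysteresis so a rail hovering near the threshold doesn't toggle reset repeatedly. The sketch below shows that hysteresis logic in C, assuming a hypothetical supply reading in millivolts and made-up thresholds for a 3.3 V rail; in practice a dedicated supervisor IC or the MCU's built-in brownout detector does this in hardware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical thresholds; real values come from the regulator and
 * processor datasheets. */
#define BROWNOUT_ASSERT_MV   2900u   /* enter brownout below this */
#define BROWNOUT_RELEASE_MV  3050u   /* leave brownout only above this */

typedef struct {
    bool in_brownout;
} supply_monitor;

/* Feed one supply sample in millivolts. Returns true while the system
 * should be held in reset. The gap between the two thresholds is the
 * hysteresis band. */
bool supply_monitor_update(supply_monitor *m, uint32_t vdd_mv)
{
    assert(m != NULL);
    if (m->in_brownout) {
        if (vdd_mv > BROWNOUT_RELEASE_MV) {
            m->in_brownout = false;   /* supply has clearly recovered */
        }
    } else if (vdd_mv < BROWNOUT_ASSERT_MV) {
        m->in_brownout = true;        /* supply has sagged too far */
    }
    return m->in_brownout;
}
```

A reading between the two thresholds leaves the state unchanged, which is what stops a noisy, marginal supply from chattering the reset line.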

WKetel (User Rank: Rookie)   10/2/2010 12:25:19 AM
For one set of projects, very early in my career, I worked with a software company that was quite an exception. Our arrangement was that when a problem could be either hardware or code, we would each investigate, and whichever of us found the bug would immediately call the other so they could stop searching. This worked very well.

ylshih (User Rank: Rookie)   10/1/2010 5:01:45 AM
That you found this the way you did was certainly serendipity. Logging faults against time and interval might eventually have led you to the same outcome, but it would have taken much longer and would still have required additional insight to make the connection. I have been in many hardware-versus-software finger-pointing incidents, and each side has had to eat crow enough times that it's best to take a team approach and cooperate meaningfully, since you often can't be 100% sure which it is.
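"Logging faults against time and interval" can be as simple as recording a timestamp per fault and checking whether the gaps between faults are roughly constant; a fixed interval (say, one fault every 86,400 s) points toward a scheduled external event rather than a random hardware defect. The sketch below assumes a hypothetical free-running seconds counter and a fixed-size log; the names and the tolerance test are illustrative, not a method from the article.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define FAULT_LOG_CAPACITY 64u

/* Hypothetical fault log: timestamps in seconds, in the order recorded. */
typedef struct {
    uint32_t timestamp_s[FAULT_LOG_CAPACITY];
    size_t count;
} fault_log;

/* Record one fault; entries beyond the capacity are dropped. */
void fault_log_record(fault_log *faults, uint32_t now_s)
{
    assert(faults != NULL);
    if (faults->count < FAULT_LOG_CAPACITY) {
        faults->timestamp_s[faults->count++] = now_s;
    }
}

/* Returns true if every interval between consecutive faults lies within
 * tolerance_s of the first interval, and writes that interval to *period_s.
 * Needs at least three faults (two intervals). Unsigned subtraction also
 * copes with a single wrap of the free-running counter. */
bool fault_log_is_periodic(const fault_log *faults, uint32_t tolerance_s,
                           uint32_t *period_s)
{
    assert(faults != NULL && period_s != NULL);
    if (faults->count < 3u) {
        return false;
    }
    uint32_t first = faults->timestamp_s[1] - faults->timestamp_s[0];
    for (size_t i = 2; i < faults->count; ++i) {
        uint32_t interval = faults->timestamp_s[i] - faults->timestamp_s[i - 1];
        uint32_t diff = interval > first ? interval - first : first - interval;
        if (diff > tolerance_s) {
            return false;   /* irregular spacing: no single period */
        }
    }
    *period_s = first;
    return true;
}
```

Comparing against the first interval keeps the sketch short; a more robust version might look for clustering across all intervals or for periods that are multiples of a base interval.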

kfield (User Rank: Blogger)   9/30/2010 12:08:57 PM
Great story. Interesting observation about not putting too much stock in the hardware engineer's claim that the hardware was functioning correctly. What questions would you have asked, or what would you have done differently, to avoid this situation?

antiquus (User Rank: Rookie)   9/29/2010 7:51:12 PM
My similar experience involved my desktop PC. One afternoon at about 3pm my PC rebooted for no obvious cause, and then again on each of the next two days, also at 3pm. We had a fairly small office, so I was aware of all recent changes that had taken place, and the only new piece of equipment was a rather large temperature chamber that had just been installed. On that first day of trouble I had personally programmed the chamber for a 7-day 24-hr test cycle. When I checked, sure enough 3pm was the time that the "cold" sequence began, and the dual compressors kicked on. Odd as it may seem, that oven was on a dedicated circuit, and mine was the only PC that was affected. The problem was solved by changing the PC motherboard's "power off" option from instantaneous to 3-sec delay.
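The comment doesn't say how the motherboard implements its delayed power-off, so the following is an assumption about the general principle rather than a diagnosis: with a delay, a power-off request must stay asserted continuously for the whole hold time, so a brief glitch (for example, one coupled in when the compressors start) is ignored. A minimal C sketch of such a hold-time filter, with hypothetical names and a 3 s hold time:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical hold time, matching the 3 s delay setting in the comment. */
#define POWER_OFF_HOLD_MS 3000u

typedef struct {
    uint32_t held_ms;   /* how long the request has been continuously asserted */
} power_off_filter;

/* Call every tick_ms milliseconds with the raw power-off input level.
 * Returns true only once the input has stayed asserted for the full hold
 * time; any gap (such as the end of a short transient) restarts the count. */
bool power_off_filter_update(power_off_filter *f, bool asserted, uint32_t tick_ms)
{
    assert(f != NULL);
    if (!asserted) {
        f->held_ms = 0u;
        return false;
    }
    if (f->held_ms < POWER_OFF_HOLD_MS) {
        f->held_ms += tick_ms;   /* stops counting once the threshold is reached */
    }
    return f->held_ms >= POWER_OFF_HOLD_MS;
}
```

With an "instantaneous" setting the equivalent hold time is effectively zero, so any transient long enough to be seen as a press would act immediately.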
