
It always worked before, so you broke it

Tim Fyock
5/18/2011 05:22 PM EDT

Dave33
User Rank
Rookie
re: It always worked before, so you broke it
Dave33   5/18/2011 8:30:24 PM
I've written and debugged firmware for many years and I can really relate to this story. Solving problems like those described in the story takes time and a very good engineer. Unfortunately, there's no way to measure the difficulty of a problem, so there's no way to measure the value of the solution. Managers usually just look at how long you took and draw their own conclusions.

jnissen
User Rank
Manager
re: It always worked before, so you broke it
jnissen   5/18/2011 10:20:04 PM
Been there done that! Don't care to repeat it!

old account Frank Eory
User Rank
Rookie
re: It always worked before, so you broke it
old account Frank Eory   5/18/2011 10:35:10 PM
Interesting that in a new system -- new hardware and new software -- what ultimately turned out to be a signal integrity problem was blamed on software. "Mechanical and hardware assemblers said it was obviously software." And management simply took their word for it, without any data to back up that claim?

Tombo0
User Rank
Rookie
re: It always worked before, so you broke it
Tombo0   5/19/2011 1:27:39 PM
I deal with this every day.

zeeglen
User Rank
Blogger
re: It always worked before, so you broke it
zeeglen   5/20/2011 6:41:20 PM
This reminds me of a time when a watchdog timer would sporadically time out when a processor took too long to execute a command/response sequence on a control bus. This was a new product in both HW and SW, and the late night sessions in the labs were not burdened by "it's a HW fault / no it's a SW fault". We just admitted that nobody knew yet where the fault was and we had to work together to find it. Finally, using an analog scope (digital scopes had not been invented yet), the SW guy and I saw an event whiz by showing that the timeout had occurred within the 1 second the hardware was supposed to allow. I took another look at the hardware, a long-chain ripple counter, and realized that the guy who designed this had done the stage count based on a complete cycle at the final stage. He forgot that the timeout actually occurred on the rising edge HALFWAY through the cycle. Solution: knife and green wire. Lesson learned: Never assume HW or SW. Test, test, test....
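
The stage-count mistake can be checked with a little arithmetic. The sketch below is only an illustration under assumed numbers: the 1024 Hz clock and the 10-stage chain are hypothetical, chosen so that a full cycle of the final stage takes exactly 1 second, and the counter is modeled as an ideal binary up-counter starting from zero (propagation delays between stages are ignored). It shows that the final stage's output first rises after half of the full cycle.

```python
def ticks_until_final_stage_rises(stages: int) -> int:
    """Count input clocks until the last stage of an ideal binary
    up-counter (starting at zero) first goes high."""
    count = 0
    ticks = 0
    final_stage_mask = 1 << (stages - 1)  # bit driven by the last flip-flop
    while not (count & final_stage_mask):
        count = (count + 1) % (1 << stages)
        ticks += 1
    return ticks


def full_cycle_ticks(stages: int) -> int:
    """Input clocks for the final stage to complete one full cycle."""
    return 1 << stages


# Hypothetical values, not taken from the original design.
CLOCK_HZ = 1024
STAGES = 10

intended_timeout_s = full_cycle_ticks(STAGES) / CLOCK_HZ
actual_timeout_s = ticks_until_final_stage_rises(STAGES) / CLOCK_HZ

print(f"intended timeout: {intended_timeout_s} s")  # 1.0 s
print(f"actual timeout:   {actual_timeout_s} s")    # 0.5 s
```

With these assumed values, a watchdog that fires on the rising edge of the final stage trips at 0.5 s instead of 1 s, so one more stage (or a different tap) is needed to get the intended period.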

jimfordbroadcom
User Rank
CEO
re: It always worked before, so you broke it
jimfordbroadcom   5/20/2011 7:02:34 PM
Well, I've worked with enough boneheaded software types who didn't know anything about hardware to say that you are the exception, Tim. Embedded engineers who know both are worth their weight in gold, even if boneheaded managers don't realize it. I totally agree with zeeglen; forget the finger-pointing and just figure out what's not working. Sometimes it's both H/W and S/W!

DutchUncle
User Rank
Rookie
re: It always worked before, so you broke it
DutchUncle   5/20/2011 8:10:24 PM
Sometimes this is all in the software domain. Reviewing DOS driver code while writing new interface code on an ISA plug-in board, I found some code that seemed to have a race condition and asked about it. Since this had been written by one of the original product group, now a technical leader, and had been working for years, my question was disdained and dismissed out of hand. A mere few months later, intractable customer problems were blamed on my interface code. I finally traced it back to the same DOS driver code. Turned out this was the first customer PC system we had seen at whatever speed it was (500 MHz?) and it was the first time that the PC was fast enough to catch the race condition. Now it was still my fault for not having pushed harder when I found the problem before.

Alan60
User Rank
Rookie
re: It always worked before, so you broke it
Alan60   5/20/2011 9:29:31 PM
Been there, done it, got the company golf shirt. Years ago we had a hydraulic problem, excess pressure when braking. The redesign fix was messy, even for hydraulics. The $1M machine was down and people were not happy. I suggested a SW workaround to control deceleration, which we did and were soon back up and running. However, word got around that I had to fix the SW to get the machine running. Unfortunately, now I have to think twice before helping others.

WKetel
User Rank
Rookie
re: It always worked before, so you broke it
WKetel   5/21/2011 12:40:48 AM
Nobody has an exclusive right to make the mistakes, that is for certain. The problems arise when either the product definition is fuzzy, or the designer makes a mistake, or the builders don't follow the drawings. The funniest one that I experienced was a fairly simple circuit that I had designed and then built. I carefully selected the resistors to all be in the middle of where they needed to be. But when the production model was built, it was miles out of specifications. The problem eventually was traced to the arrogant detailers who did the circuit drawing. They corrected the "Mistake" made by "That dumb engineer", and added the "K" that I had neglected to put on several low value resistors. So instead of 22 ohms and 470 ohms there were 22,000 ohms and 470,000 ohms. The shortcoming of my designs is that they don't work as intended unless they are built as designed. This was reported to my manager at my next performance review, when the problem with that product was brought up. The response was "OH".

cdhmanning
User Rank
Rookie
re: It always worked before, so you broke it
cdhmanning   5/23/2011 3:14:40 AM
There are two reasons for that: 1) It is relatively easy to see how hardware works and much harder to see how software works. That means people will always tend to believe the problem is where they can't see it (i.e. in the software). 2) It is also a matter of hope. Software errors can be fixed on short deadlines, but not hardware errors. Thus management/sales people always hope it is a software problem. When a product is on stop ship they will readily believe anything that gives them hope that a solution is at hand. Even if it is not really a software problem, there is often a software solution. Whenever this happens it becomes a software problem. I once had to fix a lubrication problem in software. Without that, many items would have had to be recalled and scrapped, costing the company a few hundred thousand dollars. On another occasion a mechanical stop was repositioned. This could have cost approx $100k and 4 months to fix mechanically, meaning the company would have lost a few million in sales. I fiddled for a day and found a software solution. At the end of the day, all that matters is that a solution is found. The customer doesn't care what is broken or whose fault it is. Just work together to find a solution. Everyone is trying to earn money for the same company.
