Engineering Investigations
The SED (Someone Else's Design) Problem
Dwight Bues
10/24/2011 10:31 AM EDT
A famous comedian coined the phrase, "I hate it when that happens!!!" I can sympathize. I believe that I have used that phrase every time I have had to decipher, debug, or improve on Someone Else's Design.
One day, my boss gave me a task to figure out what was wrong with a VMEBus-based processor interface box that we had. Since this was back in the "Dark Ages", this box had a Motorola 68010 microprocessor, coded in Assembly language (no C, Java, or HTML here). What we had done was to take two 6RU high, custom-made, wire-wrapped, 7400 logic-based, interface boxes and roll them into one 5RU high, VMEBus box. We maintained the interfaces to the two HP1000 Fast Fortran Processors that "crunched the numbers."
This box was slick: it had a touch panel on the front to perform "Health and Status" of the processor, and display "tickers" to log data from the interfaces. The problem that this box had was the "Blivet" problem (putting 10 pounds of stuff in a 5 pound bag). The packaging, cabling, rear panel connectors, power, and cooling were all good. The problem was that with all the joy of saving rack space and so forth, the designer got a little ahead of his capabilities with the Assembly code.
The original interfaces only implemented the "L" mode. The new VMEBus design implemented both "L" and "S" modes, effectively a 4-fold increase in complexity. In the "L" mode, it extracted the "DF" and "NV" bits from the 144 bit data frame every 125 microseconds. The "L" mode was successfully implemented.
The "S" mode was a new animal, however. This mode provided one "DF" and one "NV" bit every fourth 193-bit, 125-microsecond frame. Testing this mode proved that it did not work. The tickers would update in one burst and then the box would lock up. I rather suspected a problem with the logic implemented in the Assembly code. A few inquiries revealed that the designer had left the company and was unavailable to answer any questions about what he had designed.
I began by poring over the Assembly code and was pleased that the designer had done a fabulous job of documenting what he had done. The most arcane issues to resolve with Assembly code usually involve "subroutines". If you see code with "JSR" and "RTS" sprinkled throughout, you will definitely have a hard time tracking the logic. As you will soon see, subroutine accesses (JSR, RTS) also take quite a few CPU cycles to execute, and this is a key parameter to control when writing in Assembler. Interrupt Service Routines (ISRs) are even slightly MORE arcane, as they will be executed anytime an external interrupt occurs.
What I eventually discovered was that most of the logic for searching every fourth frame for one "DF" and one "NV" bit was executed INSIDE the ISR; two ISRs every 512 microseconds. NOW I figured that I was close to the problem. I took the Motorola Assembler book and started adding up the CPU cycles required to execute the ISR, and I found that one ISR would not be finished before the next interrupt occurred. This would tend to keep pushing registers onto the CPU stack until it ran out of memory and locked up.
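The cycle-counting check described above can be sketched in C. This is only an illustration of the arithmetic: the helper names are hypothetical, and the clock rate and cycle counts in the usage note are assumptions, not figures from the original box.

```c
#include <stdbool.h>
#include <stdint.h>

/* All numbers passed to these helpers are illustrative assumptions,
   not measurements from the box described in the article. */

/* Cycles available between two interrupts for a given CPU clock (Hz)
   and interrupt period (microseconds). */
static uint32_t cycles_per_period(uint32_t cpu_hz, uint32_t period_us)
{
    return (uint32_t)(((uint64_t)cpu_hz * period_us) / 1000000u);
}

/* An ISR overruns when its total cost (entry, body, and exit) is not
   strictly less than the cycles available before the next interrupt. */
static bool isr_overruns(uint32_t isr_cycles, uint32_t cpu_hz,
                         uint32_t period_us)
{
    return isr_cycles >= cycles_per_period(cpu_hz, period_us);
}
```

For example, with an assumed 8 MHz clock and a 125 microsecond period, `cycles_per_period(8000000, 125)` gives 1000 cycles, so a hypothetical 1200-cycle ISR would overrun while a 400-cycle one would leave time for the main code.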
The fix was not easy. It took me more than a month to re-implement the ISR so that only the critical instructions were executed inside it, and to set up a way to store intermediate calculations so that the values would be available outside the ISR (commonly, when a CPU goes off to service an interrupt, it pushes its registers onto the stack, making those values unavailable to the logic inside the ISR).
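The general shape of such a fix, a short ISR that only captures data and a main loop that does the heavy work, might look like the following C sketch. It is not the original 68010 Assembly: the names `frame_isr`, `next_frame`, and `QUEUE_SLOTS` are hypothetical, and it assumes a single interrupt source feeding a single main loop. A real target may also need memory barriers or interrupt masking around the shared indices.

```c
#include <stdbool.h>
#include <stdint.h>

#define QUEUE_SLOTS 16u  /* hypothetical depth; must be a power of two */

static volatile uint8_t  queue[QUEUE_SLOTS];
static volatile uint32_t head;     /* advanced only by the ISR */
static volatile uint32_t tail;     /* advanced only by the main loop */
static volatile uint32_t dropped;  /* samples discarded because the queue was full */

/* Interrupt side: store the raw sample and return immediately.
   No searching or decoding happens here. */
void frame_isr(uint8_t raw_bits)
{
    if (head - tail == QUEUE_SLOTS) {  /* queue full */
        dropped++;
        return;
    }
    queue[head % QUEUE_SLOTS] = raw_bits;
    head++;
}

/* Main-loop side: fetch the oldest pending sample, if any.
   The expensive processing is done by the caller, outside the ISR. */
bool next_frame(uint8_t *out)
{
    if (tail == head) {
        return false;  /* nothing pending */
    }
    *out = queue[tail % QUEUE_SLOTS];
    tail++;
    return true;
}
```

The main loop would call `next_frame` repeatedly and do the bit extraction on each returned sample, so the time spent inside the interrupt stays small and roughly constant.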
The changes were completed, tested, and the box remained in service for many years afterward. I was able to be proud of this accomplishment.
Describe a memorable experience in which you solved a baffling technical problem, involving irate bosses or customers (or both). Share your best investigative work and we’ll pay you $100 if we publish it. Questions? Email Brian Fuller or Naomi Price.
Patk0317
10/24/2011 10:57 AM EDT
This is a great story that explains what a real engineer needs to do - not just design, but troubleshooting to find a solution when something doesn't work as it should - in this case when a spec. was changed.
Frank Eory
10/24/2011 6:34 PM EDT
Most of us have probably had at least a few experiences in dealing with SED (Someone Else's Design), and indeed it can be frustrating. As in Dwight's case, my own experiences usually involved a request to add features and/or increase performance of an existing design -- "re-use with changes" -- after the original designers were long gone from the company.
In most cases, re-use with changes didn't save any time compared to just doing a new design from scratch to meet the new requirements.
I would much prefer to have the problem of what to do with OPM (Other People's Money) :)
joshxdr
10/25/2011 4:46 PM EDT
The original designer may have been in over his head, but he added comments. Engineers need to be rewarded for code quality, as in proper comments, coding guidelines, naming conventions, etc. Too many engineers write code that is impossible to understand or modify. Any code that is worth writing is worth modifying, and that means thinking of your successor while you are writing it.
David Ashton
10/25/2011 5:05 PM EDT
Much simpler stuff...but when I was about 20 a friend's girlfriend's father owned a nightclub. He refurbished it, and got a fancy light unit that had a few hundred watts of coloured lights above a huge milky perspex ceiling. The electronics was supposed to brighten and dim each colour slowly, creating mood lighting for the club. Trouble was the spikes generated by the SCRs controlling the bulbs interfered with the circuit and the result was a crazy flickering effect. And the club was reopening in 3 days' time..... I got called in and first of all had to draw out the circuit diagram from the boards (there was no documentation whatsoever...no idea where he got it....). It had 3 triangle wave generators and these were (wrongly) syncing themselves to the interference, producing the flickering. By a combination of inductors on the SCR lamp circuits, and regulating the op-amp supplies, I got the thing working pretty well. The guy was really chuffed and gave me a free ticket to the opening. However I was a real nerd in those days and didn't go...I also got a nice wad of cash for it though and that was much appreciated.
Borges
10/26/2011 4:20 AM EDT
Thou shalt push as many times as thou popeth!
I learned that the hard way too when I put a task switcher into my assembly code. But luckily for me it was my own code all the time.
Sanjib.Acharya
10/26/2011 11:36 AM EDT
I think we cannot avoid troubleshooting somebody else's design, and I agree that it is tougher than designing something new of our own. To be positive, we have to start liking troubleshooting a problem, taking it as a challenge...and slowly we will find it enjoyable...then we won't be bothered that we are troubleshooting somebody else's mess.
ndancer
10/26/2011 12:41 PM EDT
Reminds me of the Story of Mel, a real programmer.
http://www.cs.utah.edu/~elb/folklore/mel.html
Old stuff should not be forgotten because there are useful lessons in old stuff, where we didn't have two hundred terabytes of RAM, a googleplex of hard drive, and clocks running at X-ray speed.
sharps_eng
10/26/2011 1:48 PM EDT
We tend to solve our own problems at someone else's expense.
This story illustrated what a lot of 'real-time' programmers do; they find they can't do the job here and now (inside the ISR), so when interrupted with fresh data (frame), they store the data in some sort of queue and go back to munching away at the head of the queue, trying to keep up. So far so good, but note that it has actually used more (net) time and memory; and it relies on the interrupt rate slowing sometime to allow the queue to reduce.
To prevent an ugly overflow in the middle of a packet, downstream code will have to monitor the queue and take a higher-level decision to dump some of the data in an orderly fashion.
(This is why your TV set drops frames nowadays, when analog never did).
So you can see that completing your data decoding on-the-fly is A Good Thing, while taking a bit more thought and maybe more horsepower, makes things much easier for downstream code because overflow can never happen.
* disclaimer: many specific cases will not support this approach; they are not the ones I am talking about *
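The "orderly dump" described in this comment could be sketched as a watermark check made only at packet boundaries, so that whole packets are discarded rather than fragments. This is one possible policy, not something specified in the article or the comment; `HIGH_WATER`, `LOW_WATER`, and `keep_next_packet` are hypothetical names and values.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical thresholds, measured in queued frames. */
#define HIGH_WATER 12u
#define LOW_WATER   4u

typedef struct {
    bool shedding;  /* true while whole packets are being discarded */
} shed_state;

/* Call once at each packet boundary. depth is the number of frames
   currently waiting in the queue. Returns true if the next packet
   should be processed, false if it should be discarded whole. */
static bool keep_next_packet(shed_state *s, size_t depth)
{
    if (s->shedding) {
        if (depth <= LOW_WATER) {
            s->shedding = false;  /* backlog has drained far enough */
        }
    } else if (depth >= HIGH_WATER) {
        s->shedding = true;       /* backlog too deep: start discarding */
    }
    return !s->shedding;
}
```

The gap between the two thresholds keeps the decision from flipping on every packet when the queue depth hovers near a single limit.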
Sheetal.Pandey
10/28/2011 5:40 AM EDT
Reminds me of my projects in the early years of my career. It's always fun debugging. It's like winning a match.
Jerry.Brittingham
10/28/2011 5:52 PM EDT
Any good (emphasis on good) assembly language programmer knows to document. Descriptions of subroutines, comments on most lines of code, and descriptive labels are a good start. And anybody doing real-time interrupt-driven code always counts the clock cycles (micro/nanoseconds) to ensure that it will work fast enough.
WKetel
10/28/2011 10:44 PM EDT
I have had to fix and repair other people's work on many occasions, and sometimes it has been "interesting" indeed. The result is that I provide very good documentation of my work so that when it needs to be repaired either I, or somebody else, will be able to see how it should work, and understand what has failed. Besides that, really good documentation saves me from having to do many service calls, which is fine with me. The side benefit is that it can make the plant electricians look good, which assures that I get the help I need in the plants, when I need it.
zeeglen
10/28/2011 11:29 PM EDT
There was an old engineering semi-joke "If it was difficult to design, it should be difficult to understand".
Looking over some of the comments here, proper and complete documentation is wonderful to your successor. However, it does make the original designer more expendable...