A famous comedian coined the phrase, "I hate it when that happens!!!" I can sympathize. I believe that I have used that phrase every time I have had to decipher, debug, or improve on Someone Else's Design.
One day, my boss gave me a task to figure out what was wrong with a VMEBus-based processor interface box that we had. Since this was back in the "Dark Ages", this box had a Motorola 68010 microprocessor, coded in Assembly language (no C, Java, or HTML here). What we had done was to take two 6RU high, custom-made, wire-wrapped, 7400 logic-based, interface boxes and roll them into one 5RU high, VMEBus box. We maintained the interfaces to the two HP1000 Fast Fortran Processors that "crunched the numbers."
This box was slick: it had a touch panel on the front to perform "Health and Status" of the processor, and display "tickers" to log data from the interfaces. The problem that this box had was the "Blivet" problem (putting 10 pounds of stuff in a 5 pound bag). The packaging, cabling, rear panel connectors, power, and cooling were all good. The problem was that with all the joy of saving rack space and so forth, the designer got a little ahead of his capabilities with the Assembly code.
The original interfaces only implemented the "L" mode. The new VMEBus design implemented both "L" and "S" modes, effectively a 4-fold increase in complexity. In the "L" mode, it extracted the "DF" and "NV" bits from the 144 bit data frame every 125 microseconds. The "L" mode was successfully implemented.
The "S" mode was a new animal, however. This mode provided one "DF" and one "NV" bit every fourth 193 bit, 125 microsecond frame. Testing this mode proved that it did not work. The tickers would update in one burst and then the box would lock up. I rather suspected a problem with the logic implemented in the Assembly code. A few inquiries proved that the designer had left the company and was unavailable to answer any questions about what he had designed.
I began by poring over the Assembly code and was pleased that the designer had done a fabulous job of documenting what he had done. The most arcane issues to resolve with Assembly code usually involve "subroutines". If you see code with "JSR" and "RTS" sprinkled throughout, you will definitely have a hard time tracking the logic. As you will soon see, subroutine accesses (JSR, RTS) also take quite a few CPU cycles to execute, and this is a key parameter to control when writing in Assembler. Interrupt Service Routines (ISRs) are even slightly MORE arcane, as they will be executed anytime an external interrupt occurs.
What I eventually discovered was that most of the logic for searching every fourth frame for one "DF" and one "NV" bit was executed INSIDE the ISR; two ISRs every 512 microseconds. NOW I figured that I was close to the problem. I took the Motorola Assembler book and started adding up the CPU cycles required to execute the ISR, and I found that one ISR would not be finished before the next interrupt occurred. Each new interrupt would push more registers onto the CPU stack until the stack ran out of memory and the box locked up.
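The cycle-counting check described above can be sketched roughly as follows. This is only an illustration: the clock rate, the per-instruction cycle counts, the instruction counts, and the assumption that the two interrupts are evenly spaced are all made-up placeholders, not figures from the actual box or from the Motorola manual.

```python
def isr_time_us(cycle_counts, clock_hz):
    """Execution time in microseconds for a list of per-instruction cycle counts."""
    return sum(cycle_counts) * 1_000_000 / clock_hz


def isr_fits(cycle_counts, clock_hz, period_us):
    """True if the ISR finishes before the next interrupt arrives."""
    return isr_time_us(cycle_counts, clock_hz) < period_us


# All numbers below are assumptions for illustration, not measurements.
CLOCK_HZ = 10_000_000   # assumed 10 MHz CPU clock
PERIOD_US = 512 / 2     # two interrupts every 512 us, assumed evenly spaced

# Assumed cost: 60 cycles of entry/exit overhead plus N instructions
# averaging 12 cycles each.
original_isr = [60] + [12] * 250   # frame search done inside the ISR
trimmed_isr = [60] + [12] * 40     # only the critical instructions remain

for name, isr in (("original", original_isr), ("trimmed", trimmed_isr)):
    print(name, isr_time_us(isr, CLOCK_HZ), isr_fits(isr, CLOCK_HZ, PERIOD_US))
```

With real hardware, the cycle counts would come from the processor's instruction timing tables, and any interrupt latency would need to be included in the budget as well.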
The fix was not easy. It took me more than a month to re-implement the ISR so that only the critical instructions were executed inside it, and to set up a way to store intermediate calculations so that the values would be available outside the ISR (commonly, when a CPU goes off to service an interrupt, it pushes its registers onto the stack, making those values unavailable to the logic inside the ISR).
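The general shape of that kind of fix (do the bare minimum in the interrupt handler, park the data somewhere shared, and finish the work in the main loop) can be sketched as below. This is a simulation in Python rather than 68010 Assembly, and the names `frame_isr`, `main_loop_step`, `DF_BIT`, and `NV_BIT`, along with the bit positions, are hypothetical; the real code and frame layout are not shown in this article.

```python
from collections import deque

# Hypothetical bit positions within a captured frame word.
DF_BIT = 0
NV_BIT = 1

# Intermediate storage shared between the ISR and the main loop.
pending = deque()


def frame_isr(raw_frame):
    """Stand-in for the interrupt handler: capture the data and return."""
    pending.append(raw_frame)


def main_loop_step():
    """Runs outside the ISR: decode one captured frame, if any is waiting."""
    if not pending:
        return None
    frame = pending.popleft()
    df = (frame >> DF_BIT) & 1
    nv = (frame >> NV_BIT) & 1
    return df, nv


# Usage: two interrupts arrive, then the main loop catches up.
frame_isr(0b01)
frame_isr(0b10)
print(main_loop_step(), main_loop_step(), main_loop_step())
```

On real hardware the shared storage would also need protecting against the ISR firing in the middle of a main-loop access, for example by briefly masking interrupts.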
The changes were completed, tested, and the box remained in service for many years afterward. I was able to be proud of this accomplishment.
There was an old engineering semi-joke: "If it was difficult to design, it should be difficult to understand".
Looking over some of the comments here, proper and complete documentation is a wonderful gift to your successor. However, it does make the original designer more expendable...
I have had to fix and repair other people's work on many occasions, and sometimes it has been "interesting" indeed. The result is that I provide very good documentation of my work so that when it needs to be repaired, either I or somebody else will be able to see how it should work and understand what has failed. Besides that, really good documentation saves me from having to do many service calls, which is fine with me. The side benefit is that it can make the plant electricians look good, which assures that I get the help I need in the plants, when I need it.
Any good (emphasis on good) assembly language programmer knows to document. Descriptions of subroutines, comments on most lines of code, and descriptive labels are a good start. And anybody doing real-time, interrupt-driven code always counts the clock cycles (micro/nanoseconds) to ensure that it will work fast enough.
We tend to solve our own problems at someone else's expense.
This story illustrates what a lot of 'real-time' programmers do: they find they can't do the job here and now (inside the ISR), so when interrupted with fresh data (a frame), they store the data in some sort of queue and go back to munching away at the head of the queue, trying to keep up. So far so good, but note that this approach actually uses more (net) time and memory, and it relies on the interrupt rate slowing down at some point so that the queue can drain.
To prevent an ugly overflow in the middle of a packet, downstream code will have to monitor the queue and take a higher-level decision to dump some of the data in an orderly fashion.
(This is why your TV set drops frames nowadays, when analog never did).
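The queue-and-dump approach described in this comment might look something like the sketch below. The class name `PacketQueue`, the `high_water` threshold, and the oldest-first drop policy are assumptions chosen for illustration; a real system would pick its own limits and its own policy for what to discard.

```python
from collections import deque


class PacketQueue:
    """Queue of whole packets with an explicit, orderly load-shedding step."""

    def __init__(self, high_water):
        self.high_water = high_water   # assumed maximum backlog we tolerate
        self.packets = deque()
        self.dropped = 0

    def push(self, packet):
        """Called when fresh data arrives (for example, from the ISR side)."""
        self.packets.append(packet)

    def shed_load(self):
        """Drop whole packets, oldest first, until the backlog is acceptable."""
        while len(self.packets) > self.high_water:
            self.packets.popleft()
            self.dropped += 1

    def pop(self):
        """Return the next packet to process, or None if the queue is empty."""
        return self.packets.popleft() if self.packets else None


# Usage: five packets arrive before the consumer gets a chance to run.
q = PacketQueue(high_water=2)
for packet_id in range(5):
    q.push(packet_id)
q.shed_load()
print(q.dropped, list(q.packets))
```

Because only complete packets are discarded, the consumer never sees a packet that was cut off partway through.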
So you can see that completing your data decoding on the fly is A Good Thing: while it takes a bit more thought and maybe more horsepower, it makes things much easier for downstream code because overflow can never happen.
* disclaimer: many specific cases will not support this approach; they are not the ones I am talking about *
Reminds me of the Story of Mel, a real programmer.
Old stuff should not be forgotten because there are useful lessons in old stuff, from when we didn't have two hundred terabytes of RAM, a googolplex of hard drive, and clocks running at X-ray speed.
I think we cannot avoid troubleshooting somebody else's design, and I agree that it is tougher than designing something new of our own. To stay positive, we have to start liking troubleshooting and take it as a challenge... slowly we will find it enjoyable... and then we won't be bothered that we are troubleshooting somebody else's mess.
Much simpler stuff...but when I was about 20, a friend's girlfriend's father owned a nightclub. He refurbished it and got a fancy light unit that had a few hundred watts of coloured lights above a huge milky perspex ceiling. The electronics were supposed to brighten and dim each colour slowly, creating mood lighting for the club. Trouble was, the spikes generated by the SCRs controlling the bulbs interfered with the circuit, and the result was a crazy flickering effect. And the club was reopening in 3 days' time... I got called in and first of all had to draw out the circuit diagram from the boards (there was no documentation whatsoever...no idea where he got it...). It had 3 triangle wave generators, and these were (wrongly) syncing themselves to the interference, producing the flickering. By a combination of inductors on the SCR lamp circuits and regulating the op-amp supplies, I got the thing working pretty well. The guy was really chuffed and gave me a free ticket to the opening. However, I was a real nerd in those days and didn't go...I also got a nice wad of cash for it, though, and that was much appreciated.
It's really mind-boggling to think of what my hourly rate would be if I were paid for the actual time that I spent troubleshooting a problem to get to a fix. The challenge is to avoid going down a rathole.