A famous comedian coined the phrase, "I hate it when that happens!!!" I can sympathize. I believe that I have used that phrase every time I have had to decipher, debug, or improve on Someone Else's Design.
One day, my boss gave me a task to figure out what was wrong with a VMEBus-based processor interface box that we had. Since this was back in the "Dark Ages", this box had a Motorola 68010 microprocessor, coded in Assembly language (no C, Java, or HTML here). What we had done was to take two 6RU high, custom-made, wire-wrapped, 7400 logic-based, interface boxes and roll them into one 5RU high, VMEBus box. We maintained the interfaces to the two HP1000 Fast Fortran Processors that "crunched the numbers."
This box was slick: it had a touch panel on the front to perform "Health and Status" of the processor, and display "tickers" to log data from the interfaces. The problem that this box had was the "Blivet" problem (putting 10 pounds of stuff in a 5 pound bag). The packaging, cabling, rear panel connectors, power, and cooling were all good. The problem was that with all the joy of saving rack space and so forth, the designer got a little ahead of his capabilities with the Assembly code.
The original interfaces only implemented the "L" mode. The new VMEBus design implemented both "L" and "S" modes, effectively a 4-fold increase in complexity. In the "L" mode, it extracted the "DF" and "NV" bits from the 144 bit data frame every 125 microseconds. The "L" mode was successfully implemented.
The "S" mode was a new animal, however. This mode provided one "DF" and one "NV" bit every fourth 193 bit, 125 microsecond frame. Testing this mode proved that it did not work. The tickers would update in one burst and then the box would lock up. I rather suspected a problem with the logic implemented in the Assembly code. A few inquiries proved that the designer had left the company and was unavailable to answer any questions about what he had designed.
I began by poring over the Assembly code and was pleased that the designer had done a fabulous job of documenting what he had done. The most arcane issues to resolve with Assembly code usually involve "subroutines". If you see code with "JSR" and "RTS" sprinkled throughout, you will definitely have a hard time tracking the logic. As you will soon see, subroutine accesses (JSR, RTS) also take quite a few CPU cycles to execute, and this is a key parameter to control when writing in Assembler. Interrupt Service Routines (ISRs) are even slightly MORE arcane, as they will be executed anytime an external interrupt occurs.
What I eventually discovered was that most of the logic for searching every fourth frame for one "DF" and one "NV" bit was executed INSIDE the ISR; two ISRs every 512 microseconds. NOW I figured that I was close to the problem. I took the Motorola Assembler book and started adding up the CPU cycles required to execute the ISR, and I found that one ISR would not be finished before the next interrupt would occur. Each new interrupt would push more registers onto the CPU stack until it ran out of memory and the box locked up.
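To make that arithmetic concrete, here is a minimal Python sketch of the same cycle-budget check. The numbers are illustrative assumptions, not figures from the original box: the 10 MHz clock, the 3000-cycle ISR, and the 256 microsecond interrupt period are all hypothetical, and the model assumes each new interrupt preempts the ISR that is still running.

```python
def isr_time_us(isr_cycles: int, clock_hz: float) -> float:
    """Time to run the ISR once, in microseconds."""
    return isr_cycles * 1e6 / clock_hz


def peak_nesting(n_interrupts: int, isr_us: float, period_us: float) -> int:
    """Deepest pile-up of unfinished ISRs over n_interrupts.

    Assumes every interrupt preempts whatever is running and that the
    CPU always resumes the most recently interrupted ISR first.
    """
    stack = []  # remaining run time (us) of each unfinished ISR
    peak = 0
    for _ in range(n_interrupts):
        stack.append(isr_us)          # new interrupt pushes a new frame
        peak = max(peak, len(stack))
        budget = period_us            # CPU time until the next interrupt
        while stack and budget > 0:
            run = min(stack[-1], budget)
            stack[-1] -= run
            budget -= run
            if stack[-1] <= 0:
                stack.pop()           # this ISR finished and returned
    return peak


# Hypothetical figures: 3000 cycles at an assumed 10 MHz clock.
isr_us = isr_time_us(3000, 10e6)
overrun_depth = peak_nesting(10, isr_us, period_us=256)
healthy_depth = peak_nesting(10, isr_us, period_us=512)
```

With these assumed figures the ISR takes longer than the interval between interrupts, so the nesting depth grows by one with every interrupt. When the ISR fits inside the interval, the depth never exceeds one.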
The fix was not easy. It took me more than a month to re-implement the ISR so that only the critical instructions were executed inside it, and to set up a way to store intermediate calculations so that the values would be available outside the ISR (commonly, when a CPU goes off to service an interrupt, it pushes its registers onto the stack, making those values unavailable to the logic inside the ISR).
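The general shape of that fix, a short ISR that only captures data plus a background routine that does the slow work, can be sketched in Python. This is not the original 68010 code: the class name, the buffer size, and the `extract` callback are hypothetical, and where the "DF" and "NV" bits sit inside a frame is deliberately left to the caller.

```python
from collections import deque


class DeferredWork:
    """Short ISR plus background processing, joined by a small buffer."""

    def __init__(self, capacity: int = 8):
        # Bounded buffer: when full, appending discards the oldest entry.
        self.pending = deque(maxlen=capacity)
        self.frame_count = 0
        self.results = []

    def isr(self, raw_frame):
        """Keep this short: count the frame and stash every fourth one."""
        self.frame_count += 1
        if self.frame_count % 4 == 0:
            self.pending.append(raw_frame)

    def background(self, extract):
        """Runs outside interrupt context: the slow work belongs here."""
        while self.pending:
            self.results.append(extract(self.pending.popleft()))


worker = DeferredWork()
for frame in range(1, 9):                      # eight stand-in frames
    worker.isr(frame)
worker.background(lambda frame: frame * 10)    # placeholder for bit extraction
```

The point of the design is that the time spent inside the ISR no longer depends on how expensive the extraction logic is, so the interrupt budget stays fixed even if the background processing grows.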
The changes were completed and tested, and the box remained in service for many years afterward. I am still proud of this accomplishment.