A hardware engineer and a software engineer struggle to maintain composure and objectivity during a prolonged debug effort that ends with irony
I was the software engineer; Jeff was the hardware engineer. The two of us designed a single-board computer about 10 years ago using a Dallas Semiconductor 8051 variant as the 8-bit microcontroller brains of a programmable radar jammer for the U.S. Navy. We were in the integration phase; the previous steps had gone fairly smoothly, with the expected glitches and surprises on the breadboard ironed out, and it was time to test all aspects of the hardware with the final versions of the 8051 software, the FPGA firmware, and the production board. We hooked up the RS-232 cable and used HyperTerminal’s file-transfer function on the PC to download the latest executable file into the 8051 via a simple bootloader I had written, with no file-transfer protocol, just as we had been doing for weeks. The bootloader saved the executable image into the board’s PROM, and we were ready to go.
We were running the software’s various functions when we started to get anomalous results from some hardware that had worked just fine the week before. After muttering the usual “But that’s not possible,” we fired up the standard test equipment. The logic analyzer and oscilloscope were our primary tools for checking signals and strobes, logic and timing, with the microcontroller replaced by the Nohau emulator pod.
Jeff tried error traps in the FPGA; I tried breakpoints and watches on the 8051. All our efforts proved useless; everything checked out fine. We put the microcontroller back in the socket, and the bad behavior resurfaced. I checked and rechecked the software image, ensuring that the debug pod was using the same executable that was being downloaded over the serial port into the PROM. We even replaced the 8051.
Everything checked out, but the problem remained whenever the program ran out of the Dallas chip in the socket. What really confused us was that an earlier version of the executable worked fine in the chip, and the errors we now encountered could not plausibly come from the slight change in the software; in fact, the only change to the program had been to expand the hard-coded error strings the 8051 used to send error notifications to the operator at the terminal.
Three days went by, with both Jeff and me struggling to maintain composure and to refrain from pointing fingers (“It’s got to be the hardware.” “No, it must be the software.”). We had never come so close to blows in the 20 years we had been working together. I finally gave up trying to fix the problem and downloaded an earlier version of the image just to see if we could get back to where we had been. It worked. Jeff packed up and went home. His parting words were, “It’s software. Call me when you figure it out.”
We had violated a major debugging rule: “Before reacting, put down the tools, step away from the bench, and think!” Since the only change in software had been the addition of some error strings in the operator interface, our thinking had been that the software could not possibly be the source of this strange hardware behavior.
I spent the day studying source code before I decided to compare what was being sent over the RS-232 link and stored in the PROM against the image file on the PC. Imagine my consternation when I discovered that the two did not match! In a number of places, the hex sequence “0D0A” had been truncated to “0D.”
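A few lines of Python would do today what took a day of detective work then. The sketch below is purely illustrative, not the tool we used at the time; the file names are hypothetical, and it assumes the PROM contents can be dumped back out to a file on the PC for comparison.

    # Minimal sketch: compare the image file on the PC against a dump of
    # what actually landed in the PROM. File names are hypothetical.
    def compare_images(sent_path, stored_path):
        sent = open(sent_path, "rb").read()
        stored = open(stored_path, "rb").read()
        if len(sent) != len(stored):
            # Bytes silently dropped in transit show up as a length change.
            print(f"length differs: sent {len(sent)}, stored {len(stored)}")
        for offset, (a, b) in enumerate(zip(sent, stored)):
            if a != b:
                print(f"first mismatch at offset {offset:#06x}: "
                      f"sent {a:02X}, stored {b:02X}")
                return offset
        return None

    compare_images("image.bin", "prom_dump.bin")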
It turned out that HyperTerminal was stripping the line feed from every carriage-return/line-feed pair, sending only the carriage return whenever it encountered “0D0A.” The hex 0D0A combination occurred in an instruction address in the executable file, and we had never hit the problem before because the program had never been large enough to contain an instruction address of hex 0D0A.
Adding the error strings pushed the code size to where we had an address of 0D0A. When HyperTerminal converted “0D0A” to “0D,” the program jumped to the wrong instruction! There had been no problem with the software and no problem with the hardware. The problem had been with a tool that had always behaved as expected before, and with the reactionary behavior of the engineers involved. Again, a debugging rule had been violated: “Know your inputs.” We had assumed that what was in the PROM was what we had sent over the RS-232 connection.
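The failure mode is easy to demonstrate. Here is a minimal Python sketch, assuming (as the symptom implied) that the terminal normalizes every CR/LF pair to a bare CR; the instruction bytes are illustrative:

    # Sketch of the failure: a text-mode transfer that normalizes
    # CR/LF (0D 0A) to bare CR (0D) silently shortens a binary image.
    image = bytes([0x02, 0x0D, 0x0A,   # LJMP 0D0Ah: the operand contains 0D 0A
                   0x75, 0x81, 0x30])  # whatever bytes follow in the image

    received = image.replace(b"\x0d\x0a", b"\x0d")  # what the PROM actually got

    print(image.hex(" "))     # 02 0d 0a 75 81 30
    print(received.hex(" "))  # 02 0d 75 81 30  -- one byte gone, rest shifted

The truncated operand sends the jump to the wrong address, and every byte after it is shifted by one.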
There was no way I could figure out to get HyperTerminal to perform as desired without putting XModem (or a similar protocol) into the bootloader. I ended up writing a file-transfer program in Visual Basic that we used on the PC to transfer the executable image into the Dallas 8051, and everything worked fine. Years later, we still enjoy the story of the “D0A” debug session, marveling at the irony: we were “DOA” because of a lack of “D0A” (0D0A)!
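For the record, the essential property of the replacement tool was simply that it treated the image as opaque bytes, with no text translation. A rough modern equivalent of that Visual Basic utility, offered only as a sketch: it assumes the third-party pySerial library, and the port name, baud rate, and chunk size are made up, since the real bootloader accepted nothing but raw bytes.

    # Minimal sketch of a binary-safe download using pySerial
    # (a third-party library: pip install pyserial). Port, baud rate,
    # and chunk size are illustrative, not the original values.
    import serial

    def download(image_path, port="COM1", baud=9600):
        data = open(image_path, "rb").read()
        with serial.Serial(port, baud, timeout=1) as link:
            # Send in small chunks so a simple bootloader with no
            # flow control can keep up while it programs the PROM.
            for i in range(0, len(data), 64):
                link.write(data[i:i + 64])
                link.flush()
        print(f"sent {len(data)} bytes, 0D0A pairs intact")

    download("image.bin")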
Lawrence Koepke graduated from U.C. Davis and worked in the tech industry for several years before starting his software business, "Cybersoft," in 1995. In 2002, the business changed its name to "ATE Magic" to reflect the move from commercial software to embedded systems. ATE Magic supplies software and microcontroller-based designs for embedded systems in the radar and communications fields, and also designs and implements automated test systems for the resulting products.
Jeff Tindall graduated from California State University, Chico, and followed with a master's in electrical engineering from Santa Clara University. After a number of years working for high-tech companies in the electronic-warfare field, he started his own business, Tindall Technologies, in 1990, designing and building programmable radar-jamming subsystems for the U.S. Navy and Air Force, as well as other radar subsystems for other customers. Tindall Technologies was purchased by Teledyne in 2007.