I've seen that problem several times over the years, and it's always been a timing problem related to interactions of hardware design and software design.
One occurred when the one side didn't have enough time to process received data before the other side requested to send more, and things ended in a loop where the one side sent an ENQ (Enquiry or request to send data) and the other side only had time to send a WACK (Wait ACKnowledge) before the first side repeated the ENQ, and so on ad infinitum I don't remember how they fixed that.
A second occurred when a new minicomputer-based (probably all TTL) RJE (Remote Job Entry) terminal kept dropping data. The vendor had their top programmer on site trying to fix it. You could recognize him by the flannel shirt, blue jeans, and lost look on his face as he travelled between the RJE and a card punch an back, for what seemed like weeks on end. We were finally asked to come in with a datascope and saw that as soon as the RJE sent the acknowledment for the last block of data, the mainframe shoved another ENQ down its throat, but the RJE didn't see it because it was still processing the last block it received. It turns out the mainframe programmers had set the mainframe FEP (Front-End Processor) for full duplex, figuring it was faster and more efficient, and both modems were set for constant carrier. The RJE, however, needed to be set for half duplex because it needed breathing room. It was fixed when we convinced the system programmers to set the mainframe for half duplex and we reset the modems for switched carrier with a 250 ms turnon time. With a quarter-second delay the RJE was happy, and so was its programmer!
A third one was also a timing issue, only this time a microprocesser- based barcode reader was stuffing things down the mainframe's throat too fast. The group in charge of furniture inventory bought this neat little barcode reader to scan barcodes on furniture and sent it to the mainframe all in one fell swoop. It worked great at the demo at another agency, using a larger IBM mainframe that we had. On our system the mainframe kept dropping data. Part of the problem was that the barcode reader blurted out all 64K bytes of data without stopping, and they had set the end of record as a carriage return. The mainframe was set to sense a carriage return as end of data, and would terminate the read and go on to the next step in its program. Ours was a slower mainframe, so by the time it got its act together and hung another read up, several records had gone past and into the bit bucket. Part of the problem was that the application programmer read a record and then went on to process it, including disk access, before reading the next record, which took a lot of time (relative to the datacomm line speed). It was fixed by changing the barcode reader to terminate each record with a Record Separator (RS) character and not send a carriage return until all 64K had been sent (many teeth were pulled as we interrogated the vendor progammer over the phone). Then one of our datacomm systems programmers (to whom a macro-assembler was a high-level language) set the mainframe to continuously read data into a gargantuan buffer until it read the final carriage return that terminated the read. The next step was to split out the records based on the RS character and then pass that list to the remainder of the program.
"You have to understand how a starship operates." -- Capt. Kirk, Star Trek: The Wrath of Kahn.
NASA's Orion Flight Software Production Systems Manager Darrel G. Raines joins Planet Analog Editor Steve Taranovich and Embedded.com Editor Max Maxfield to talk about embedded flight software used in Orion Spacecraft, part of NASA's Mars mission. Live radio show and live chat. Get your questions ready.
Brought to you by