A manager's near-instant diagnosis of a problem with an engine electrical replacement part, made with no analysis or data, is wrong, of course
We had a customer who manufactured an aftermarket engine electrical replacement part. Because this part was safety-related, they wanted to test every part that was produced and save the test results. At that time, saving such a quantity of data on a standard industrial PLC was an expensive proposition. Fortunately we had a relatively new option, which was to use an IBM PC to control the system and generate the test result records.
Our customer believed that the required data could be saved on a floppy disk and archived after every shift, and they turned out to be correct. It seems now that data must have been smaller in that era, since the disks were of the 1.2 MB size, small by current standards. But the computer was fast enough to control the test and collect data, even when the tester had two independent test stations running.
Because of the required production rate and the length of time the test required, this was a two-station machine. The operator would load the part in one test fixture, press the clamp buttons, and then load the other fixture and press a second set of start buttons for that station. The system also had to work properly if the operator loaded parts in both stations and then quickly pressed the two sets of start buttons in sequence. This meant that the two tests, while identical, had to run completely independently of each other and not share any resources such as analog inputs, which were a bit expensive at that time.
The product tested was first checked for short circuits between motor windings, then for proper coil resistance. If those tests passed, a functional test was done, and accepted parts would unclamp, ready to be removed. A failed part would require the operator to press the unclamp button to release it. The data recorded for each part was the station number, the resistance of each winding, and the total motion distance for the functional test. These values were also displayed on the operator's screen, as an aid to process control and for machine diagnostics.
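For readers who think in code, the test sequence and the saved record can be sketched roughly as follows. This is only an illustration in Python, not the original PC program; the names `PartRecord` and `check_part`, the limit parameters, and the pass criterion for the functional test are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class PartRecord:
    """One saved result per part (field names are hypothetical)."""
    station: int
    winding_ohms: List[float]   # resistance of each winding
    travel: Optional[float]     # motion distance; None if the functional test never ran
    passed: bool


def check_part(
    station: int,
    has_short: Callable[[], bool],
    read_windings: Callable[[], List[float]],
    measure_travel: Callable[[], float],
    ohms_limits: Tuple[float, float],
    min_travel: float,
) -> PartRecord:
    """Run the three checks in order, stopping at the first failure."""
    # 1. Short-circuit check between windings.
    if has_short():
        return PartRecord(station, [], None, False)

    # 2. Coil resistance check on every winding.
    ohms = read_windings()
    low, high = ohms_limits
    if not all(low <= r <= high for r in ohms):
        return PartRecord(station, ohms, None, False)

    # 3. Functional test, reached only if both electrical checks passed.
    travel = measure_travel()
    return PartRecord(station, ohms, travel, travel >= min_travel)
```

Passing the measurement functions in as arguments keeps the sketch independent of any particular hardware, and it lets each station be given its own inputs.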
When the system was built, it functioned as expected, at least for the initial testing. Each station had resistance readings that were within 1% of our measured values, after calibration, and both stations would read within 0.5% of each other. We were quite pleased at this. Then came the problem, which appeared when we ran parts in both stations.
With both stations testing at the same time there would be large, non-identical errors in the resistance readings, several percent off from what the part resistance actually was. Each station running by itself still always produced correct readings. This was not acceptable, since we were outside our accuracy specification, and far outside our repeatability specification. My manager quickly decided that the problem was a transient spike from operating the solenoid air valves for the clamp mechanism. I was very impressed by his nearly instant analysis of the problem, done without any examination or data.
Since the problem was judged to be a transient spike, the next course of action was twofold. First we put transient suppressors on all the solenoid and relay coils. When that did not deliver a solution, we added an isolation transformer for all of the instrumentation power, which also did not solve the problem.
All the relatively quick fixes were used up and the problem had not changed at all. So now I spent several days attempting to capture the transient on a storage scope, often with the vice president in charge of sales standing right behind me to help me concentrate. Unfortunately, there was no evidence of any transient to be seen. So now it was time to talk with the programmer, a contract guy who had done programming for us on several occasions. The analog reading routine took a set of 64 readings and then averaged them, to reduce the effects of any AC pickup in the stand wiring.
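The averaging idea can be sketched as below. This is a hedged illustration in Python, not the contractor's routine: `read_averaged` and the simulated `hum_adc` input are hypothetical, and the assumption that the 64 samples span a whole cycle of the AC pickup is mine. Under that assumption the hum cancels in the average.

```python
import math
from statistics import mean

SAMPLES_PER_READING = 64  # the sample count given in the story


def read_averaged(read_adc, n=SAMPLES_PER_READING):
    """Take n raw samples; return (average, raw samples)."""
    samples = [read_adc() for _ in range(n)]
    return mean(samples), samples


def hum_adc(true_value, hum_amplitude, n=SAMPLES_PER_READING):
    """Simulated input: a steady value plus one full cycle of AC pickup
    spread across n samples (an assumption made for this example)."""
    state = {"k": 0}

    def read():
        k = state["k"]
        state["k"] = k + 1
        return true_value + hum_amplitude * math.sin(2 * math.pi * k / n)

    return read


average, raw = read_averaged(hum_adc(true_value=10.0, hum_amplitude=0.5))
print(round(average, 6), len(raw))
```

Returning the raw samples alongside the average costs nothing and leaves them available for inspection when a reading looks wrong.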
So I asked for the option of displaying all of those readings on the screen, instead of just the averaged value. The next day he came and loaded in the new code that would provide that function. It was arranged to display the values after the test was completed, which was easy, since the results array was not reset until the start of the next test. We tried it out by running one station, and discovered that all of the readings were the same.
Evidently there was not much noise in the wiring. I had not thought that there would be. Now all that I had to do was to run parts until I produced the error again, which only took a few minutes. I pressed the key to display the array of readings, and I immediately saw the problem, which was that when the second test started, all of the values were suddenly those from the second station, instead of the values from the first station. I ran testing a few more times and each time the error occurred the situation was similar. Now I could see the error, but I did not understand exactly why, just yet.
I called the programmer and described my findings to him, and as I finished the description, he told me that he knew exactly what the fault was, and that I would have the fix the next day. Sure enough, the next morning he arrived with a new program disk, and a half hour later the problem was gone. It seems that his code had used the same variable name for both stations, and when the multitasking software switched between the two tests, the incorrect data was carried along. It was “a path-dependent multitasking error,” not an electrical transient. My manager expressed his disappointment that I had not seen that immediately, and had wasted so much time.
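The class of bug is easy to reproduce in miniature. The sketch below is not the original program, whose language and multitasking software are not described here; it uses Python generators as stand-ins for two cooperatively scheduled station tasks, and `station_task` and `run_round_robin` are hypothetical names. When both tasks write into the same readings array, station 1 ends up averaging station 2's samples; giving each station its own array removes the error.

```python
N_SAMPLES = 4  # kept small for the example; the story's routine used 64


def station_task(true_ohms, readings):
    """One station's measurement: fill the array, then average it."""
    for i in range(len(readings)):
        readings[i] = true_ohms  # one simulated analog sample
        yield                    # the scheduler may switch tasks here
    return sum(readings) / len(readings)


def run_round_robin(tasks):
    """Step each unfinished task in turn; return results by task index."""
    results = {}
    pending = dict(enumerate(tasks))
    while pending:
        for idx, task in list(pending.items()):
            try:
                next(task)
            except StopIteration as done:
                results[idx] = done.value
                del pending[idx]
    return results


# Bug: both stations share one array (the "same variable name").
shared = [0.0] * N_SAMPLES
buggy = run_round_robin([station_task(10.0, shared),
                         station_task(12.0, shared)])

# Fix: each station owns its own array.
fixed = run_round_robin([station_task(10.0, [0.0] * N_SAMPLES),
                         station_task(12.0, [0.0] * N_SAMPLES)])

print(buggy)  # {0: 12.0, 1: 12.0}  station 1 reports station 2's values
print(fixed)  # {0: 10.0, 1: 12.0}
```

Run alone, either task gives the right answer even with the shared array, which is why single-station testing never exposed the fault.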
I ran the system for the rest of the day just to verify that there were no more problems, and that evening left a message for my customer that his machine was ready. He arrived the next day after lunch and did his evaluation testing. After a while, he pulled out some parts that had out-of-limits resistance values, and ran those through a few times. He later explained that our readings had been so repeatable that he had become suspicious, and needed to verify that the machine was actually testing the parts. I explained to him that I had made the system a bit more accurate than they had specified in order to avoid any arguments about accuracy and repeatability. He said that was an interesting attitude.
We shipped the machine and they did not have any problems with it, and it ran until the market for that part dried up, quite a few years later.
Contributing writer William Ketel is a hands-on electrical engineer who enjoys troubleshooting, diagnostics, and ham radio (extra class). His industrial machine projects range from an evaporator valve calibrator to a brake drum inspection machine to crash sled controls and a package to calibrate developmental crash sensors.