Troubleshooting is absolutely one of the most important skills for a good engineer and, in my opinion, for a good manager. The question is what knowledge we all need to be good troubleshooters. To me, a thorough understanding of the system is very important. Without that understanding, I, at least, would have a hard time getting started. However, documentation may not always be available. Even when it is, it may not cover the whole nine yards that you need to get the job done. So, what else? Experience. I remember my first undergraduate project years back: a class A amplifier. The first symptom was that it didn't amplify any signal at all. After some time, I found out that the transistor had been blown to begin with. Understanding how a transistor works definitely came in handy. The second symptom, once a signal was coming out of the output, was noise and signal clipping. Apparently, the power supply didn't give a clean voltage and the amplification was larger than calculated. With a tweak of a resistor and the addition of a feedback loop, the issue was resolved. It was a simple project, and yet I learned a great deal as a freshman in engineering school. Later on, with years of on-the-job training, when I became a manager, I was able to build a checklist to troubleshoot and guide my subordinates to resolve most issues in a timely manner.
There is no substitute for gaining hands-on experience at troubleshooting failing circuits in the lab. The more strange things you see and eventually solve, the better you get at it.
It helps to know the system inside and out, and have a thorough checklist that starts with the most basic things like assembly errors and verifying correct supply voltages at all the right places.
Once you work your way through the list of most probable causes and still haven't solved the mystery, start working through the list of improbable causes. Think outside the box, and start visualizing all those invisible resistors, capacitors and inductors that aren't on the schematic. Remind yourself that although what you're observing might seem impossible -- that the circuit should never behave that way -- the fact is, circuits don't lie, and there are no displeased "electron gods." Physics governs, and when you finally find the root cause of the problem, it will seem so obvious that you will wonder why you didn't see it sooner.
A student once asked me how I learned how to debug. I hadn't really thought about it before, but after considering the question a bit I told him it was probably the many detective novels I read as a teenager. I think the best genre for this is the "police procedural", such as the Martin Beck series by Maj Sjöwall and Per Wahlöö (e.g., The Laughing Policeman) and Georges Simenon's Maigret novels. A police procedural puts you in the right frame of mind for the slow, methodical process of tracking down a bug. An anomaly has occurred. First you try to reproduce the crime. Then you have to interview all the signals and/or variables that may know something about the crime. You have to assume that some or all of them are lying to you. You have to eliminate suspects until only one is left, or else get lucky and find a key clue or get an unexpected report from an informer. You have to look for Joseph P. McGillicuddy, Lt. Dan Muldoon's code name for the suspect they don't know about -- yet -- in The Naked City (Jules Dassin, 1948). Trying to hurry the process makes you miss things. You neglect to follow up an unpromising lead that ends up solving the puzzle.
Private detective novels are also good, especially the complex Ross Macdonald and Raymond Chandler novels that have so many characters that it's hard to keep them straight. Rex Stout's Nero Wolfe novels are excellent, because they also have the police procedural form, with Archie Goodwin and other operatives bringing the information to Nero Wolfe, who is the only one who can fit all the pieces together. Another favorite is John Dickson Carr, master of the "locked room" mystery.
When I have a really tough debugging problem and I can't seem to get there, I put on my hat -- an authentic English deerstalker such as is worn by Sherlock Holmes. It always puts me in the right frame of mind to reëxamine the evidence one more time and see what I missed before.
The worst thing you can do? Look at your design or code and say "it's got to work!" Obviously it doesn't, so assuming otherwise is not going to put you in the right frame of mind for debugging it.
"A student once asked me how I learned how to debug. I hadn't really thought about it before, but after considering the question a bit I told him it was probably the many detective novels I read as a teenager."
Or perhaps you were drawn to such material because you already possessed a problem-solving "gene" - a trait that I suspect may be common among engineers.
I couldn't agree more about the critical need for good troubleshooting skills in engineering, having launched several new content sections on EE Times and Design News like Engineering Investigations and Made by Monkeys, where engineers relate stories involving mysterious problems and how they were resolved. Though nothing takes the place of hands-on experience, I think that engineers can learn from the story-telling by other engineers on problems they solved. Detailing their thought process of what they tried and why and what worked and what didn't work is an important transfer of hard-earned experience.
A primer in general systems theory might be helpful. Do engineering schools provide that at all? An understanding of fundamentals such as unintended consequences ought to be second nature to engineering graduates!
There are also some tricks that probably seem obvious to most readers, such as dividing the problem temporally and physically.
So you try to trace forward to the point where the abnormal behavior first appeared... or trace backward through time, through all the points where the abnormal behavior existed, to try to find a point of injection.
Similarly, if you can cut away parts of the circuit as being OK, what is left should be the most likely source of the error.
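For anyone who likes to see it written down, the "divide the problem" trick can be sketched as a bisection over an ordered signal chain. This is only a rough illustration under stated assumptions: the stage names and the `looks_ok` probe are hypothetical, and it assumes a fault propagates downstream, so that once one stage looks wrong every later stage does too. Intermittent faults or feedback paths break that assumption.

```python
from typing import Callable, Optional, Sequence


def first_bad_stage(
    stages: Sequence[str],
    looks_ok: Callable[[str], bool],
) -> Optional[str]:
    """Return the first stage whose output looks wrong, or None if all look fine.

    Assumption: stages are listed in signal-flow (or time) order, and a fault
    propagates downstream, i.e. every stage after the first bad one is also bad.
    """
    lo, hi = 0, len(stages)  # stages before lo are known good; stages from hi on are bad
    while lo < hi:
        mid = (lo + hi) // 2
        if looks_ok(stages[mid]):
            lo = mid + 1   # fault must be further downstream
        else:
            hi = mid       # fault is here or further upstream
    return stages[lo] if lo < len(stages) else None


# Hypothetical example: pretend these are probe results from the bench.
chain = ["supply", "input buffer", "gain stage", "filter", "output driver"]
bad_measurements = {"gain stage", "filter", "output driver"}

suspect = first_bad_stage(chain, lambda stage: stage not in bad_measurements)
print(suspect)
```

Each probe cuts the remaining suspects roughly in half, so a long chain needs only a handful of measurements. The same idea applies in time: replace the stage list with an ordered list of timestamps, log entries, or code revisions.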
All good Sherlock Holmes stuff. "When you have eliminated the impossible, whatever remains, however improbable, must be the truth."
And always check your assumptions.
Very often the source of the problem that escapes detection for a long time is some fact or condition that was not deemed worth checking.