Speaker and especially headphone transducers deteriorate rapidly with age due to temperature and humidity effects as well as fatigue. This deterioration doesn't have a direct relationship to original specifications or cost.
This is why a transducer that ages gracefully can sound much better to the ears than one that doesn't.
Most published speaker measurements lie like a rug.
A speaker enclosure has three fundamental dimensions, each of which produces a resonance where that dimension equals half a wavelength, with further resonances at multiples of that frequency right up the audio spectrum. The Q of these resonances will typically range from 10 to 20. If one dimension is the same as, or a multiple of, another, the resonances coincide and are boosted. To spread the peaks evenly up the audio spectrum, the fundamental dimensions of the enclosure should be related by the cube root of two.
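To make the point about coinciding resonances concrete, here is a rough sketch in Python. It assumes the simplest textbook model (axial standing waves only, at n*c/(2L), in an empty rigid box) and a speed of sound of 343 m/s. The 0.40 m box size and the function names are made up for illustration, not taken from any real design.

```python
SPEED_OF_SOUND = 343.0  # m/s in air at roughly 20 degrees C (assumed)

def axial_modes(length_m, max_hz=2000.0):
    """Axial standing-wave frequencies n*c/(2L) for one dimension, up to max_hz."""
    fundamental = SPEED_OF_SOUND / (2.0 * length_m)
    modes = []
    n = 1
    while n * fundamental <= max_hz:
        modes.append(n * fundamental)
        n += 1
    return modes

def closest_spacing(dims_m, max_hz=2000.0):
    """Smallest gap in Hz between neighbouring axial modes of the whole box."""
    freqs = sorted(f for d in dims_m for f in axial_modes(d, max_hz))
    return min(b - a for a, b in zip(freqs, freqs[1:]))

ratio = 2.0 ** (1.0 / 3.0)  # cube root of two, about 1.26

# Hypothetical example boxes: a cube, and one with sides in ratio 1 : 1.26 : 1.59
cube = (0.40, 0.40, 0.40)
staggered = (0.40, 0.40 * ratio, 0.40 * ratio ** 2)

print("cube, closest mode spacing (Hz):", closest_spacing(cube))
print("staggered, closest mode spacing (Hz):", round(closest_spacing(staggered), 1))
```

In the cube every mode of one dimension lands exactly on a mode of the other two, so the closest spacing is zero. With the cube-root-of-two ratios no two modes land on the same frequency, although some still come fairly close together. A real enclosure with damping material, drivers and a port will behave less tidily than this.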
To avoid revealing these nasty resonant peaks, speakers are typically tested with pink noise so the high Q resonances have no chance to build. Unfortunately music consists of frequencies held for significant time and the resonances will obviously color the music to the ear.
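As a rough illustration of why held notes matter, the sketch below estimates how long a steady tone has to last before a resonance reaches most of its final level. It assumes an idealized second-order resonance driven exactly at its resonant frequency, whose amplitude envelope grows as 1 - exp(-pi*n/Q) after n cycles. The function names and the 300 Hz example frequency are only illustrative.

```python
import math

def cycles_to_reach(fraction, q):
    """Cycles of a steady tone needed for a resonance of quality factor q
    to reach `fraction` of its final amplitude (idealized second-order model)."""
    return -math.log(1.0 - fraction) * q / math.pi

def seconds_to_reach(fraction, q, f0_hz):
    """Same build-up expressed in seconds for a resonance at f0_hz."""
    return cycles_to_reach(fraction, q) / f0_hz

for q in (10, 20):
    n = cycles_to_reach(0.9, q)
    ms = 1000.0 * seconds_to_reach(0.9, q, 300.0)  # 300 Hz is an assumed example
    print(f"Q={q}: about {n:.1f} cycles, {ms:.0f} ms at 300 Hz, to reach 90%")
```

Under these assumptions a Q of 10 needs about 7 cycles and a Q of 20 about 15 cycles to reach 90% of full level, which is a few tens of milliseconds in the midrange and well within the length of an ordinary sustained note.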
Every manufacturer wants his specs to look good. Testing realistically would significantly worsen the numbers. So most speaker and amplifier specifications are meaningless to real world performance.
So if ear tests don't correspond to instrument tests, you're not testing honestly.
How can you be certain that you are measuring the correct things, and that you know what "accurate" really means with respect to sound quality? Here are a few historic examples where listeners uncovered gaps in the measurement methodologies available at the time:
Early solid-state amplifiers measured better than their tube counterparts, but many audio buffs found the sound to be clearly inferior. The culprit? Very high levels of crossover distortion with high peak/average ratios. Clearly audible. THD analyzers of the day notched out the fundamental and measured the average or RMS level of the residual. Thus a favorable objective measurement of a subjectively bad amplifier. We now know to look at distortion harmonic spectra at all amplifier power output levels and can easily uncover these flaws.
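A toy simulation shows how an RMS residual reading can hide spiky crossover distortion. This is only a sketch under loud assumptions: the "amplifier" is a hypothetical model that is perfectly linear except that it outputs nothing inside a small dead band around zero, the dead band of 0.02 is an arbitrary number, and the peak figure is peak/RMS (crest factor) of the residual rather than peak/average.

```python
import math

N = 4096  # samples spanning exactly one cycle of the test tone

def toy_output_stage(x, dead_band=0.02):
    """Hypothetical crossover model: linear, except zero output near zero crossing."""
    return x if abs(x) > dead_band else 0.0

def residual_stats(amplitude, dead_band=0.02):
    """Return (THD as RMS ratio, crest factor of the residual) for a sine input."""
    ref = [math.sin(2.0 * math.pi * k / N) for k in range(N)]
    out = [toy_output_stage(amplitude * s, dead_band) for s in ref]
    # "Notch out" the fundamental: project the output onto the sine and subtract it.
    # (The output is odd-symmetric here, so there is no cosine term to remove.)
    b1 = 2.0 / N * sum(y * s for y, s in zip(out, ref))
    resid = [y - b1 * s for y, s in zip(out, ref)]
    rms = math.sqrt(sum(r * r for r in resid) / N)
    peak = max(abs(r) for r in resid)
    thd = rms / (b1 / math.sqrt(2.0))
    return thd, peak / rms

for level in (1.0, 0.1):
    thd, crest = residual_stats(level)
    print(f"level {level}: THD (RMS) = {100 * thd:.2f}%, residual peak/RMS = {crest:.1f}")
```

In this toy model the full-level RMS reading comes out at a fraction of a percent, which looks respectable on a spec sheet, while the residual itself consists of narrow spikes whose peaks are more than ten times the RMS figure. Dropping the level makes the RMS number much worse, which is why measuring at a single high power level flatters this kind of fault.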
Jitter is another example of an audible defect uncovered by careful listeners and widely panned by many engineers as unmeasurable and so irrelevant. It wasn't until John Meyer developed and published a measurement methodology and started to correlate jitter with defects in reproduced sound that people started to take notice.
Historically these issues have been solved by savvy engineers and trained listeners who work together to achieve a beneficial result. Sometimes the engineer/listener is the same person, sometimes not.
The use of "audiofool" terminology is derogatory and prejudicial with respect to a group of individuals who historically have been able to uncover issues with audio reproduction that the then-current engineering measurement methodologies were unable to uncover.
I'm not defending green pens, mpingo discs, quantum sinks, and the like. Far from it.
I'm suggesting there is a middle ground, and that broad-brush dismissal of the audibility of unmeasurable differences has historically been a false position to hold.
Being involved in instrumentation design, I was always a bit skeptical that a certain amplifier or speaker could sound better or worse than its measured results.
One day on a whim, I hooked up my distortion meters to the actual speaker terminals of my living room stereo, and I was in for a rude awakening. The distortion readings were a couple of orders of magnitude higher than for a pure resistor load! Not only that, but the frequency response drooped quite significantly above about 6 kHz using lamp cord.
When I tested various other power amplifiers, I found that their performance under a speaker load bore no resemblance to their performance under a resistor load.
Performance also varied wildly with the particular speaker systems used. A full range electrostatic by Quad made many amplifiers go unstable and most perform very badly. Not surprisingly, the Quad amplifier seemed unaffected by actual speaker loads.
Speaker cables did have a significant effect on frequency response and distortion.
So if ear tests don't correspond with instrument tests, you're not testing properly.
I listen mainly to music that was originally generated using non-electro-acoustic sources.
So I too would like perfectly accurate sound (unless it be from the echo-affected seats in the Royal Albert Hall, in which case I would happily accept a measure of echo cancellation).
The situation could be different for "audiophiles" who do not listen to live, acoustically-generated music. I would prefer to call these people audiorasts (aka pederasts).
Unfortunately, no part of the reproduction chain is perfect, and this can easily lead to specification difficulty. For example, some of the best systems I have heard use speakers whose total output acoustic power correlates very well over the frequency range with on-axis density, but whose intrinsic frequency response is not that flat; accordingly the amplifier frequency response has to be modified (with quite a fine resolution) to correct the overall response*. We are indeed degrading the amplifier to correct for the defects in the speaker, and measurements of either would not look good on paper.
But the hardware is not the only problem - some "sound engineers" modify the microphone balance and even frequency response during the course of a piece of music. Even where they only do this where the featured instrumentalist is silent (i.e. in order to reduce background noise) you get a distracting change in the reproduced acoustic; but it also seems that some sound engineers think they know better than the conductors. BBC proms producers please note.
*In case anyone thinks otherwise, I am not referring here to any widely advertised system.