A lot has been written recently about the practice of boosting benchmark scores on Android. Much of what has been going on is without doubt cheating that's not worth the trouble. Now that it has been exposed, let's hope it stops -- but I wouldn't hold my breath.
Some observers of this unsavory behavior have concluded that we have reached the end of the line for performance optimizations on Android devices. Not so. There are still plenty of generic and platform-specific optimizations in Android to improve the real user experience if you know what you are doing. Such improvements are possible primarily because Android applications are typically running on top of a very complex set of libraries, virtual machines, and just-in-time compilers.
There are actually plenty of Android platform and product companies who do not engage in pointless benchmark-rigging practices. We know because they contractually bind us not to do benchmark “specials” for them.
Any optimization that is invoked only when a specific benchmark is run, and that is not accessible through the normal operation of the device, is a cheat. Optimizations that improve benchmark performance are perfectly acceptable if they are available and useful in the general operation of the device.
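To make that distinction concrete, here is a minimal sketch in Java of the two cases. Every name in it -- the package string, the governor values, the methods -- is invented for illustration and is not taken from any real device's code.

```java
// Hypothetical sketch of the distinction drawn above.
// All identifiers here are invented for illustration.
public class PerfPolicy {

    // A benchmark "special": behavior keyed to the benchmark's identity.
    // No normal app can ever take this path, so the resulting score no
    // longer describes the device the user actually gets.
    void onAppLaunch(String packageName) {
        if ("com.example.benchmark".equals(packageName)) {
            setCpuGovernor("performance"); // pin clocks to maximum
        }
    }

    // A legitimate optimization: behavior keyed to the observed workload.
    // Any sustained CPU-bound app, benchmark or not, gets the same boost.
    void onLoadSample(double cpuUtilization) {
        if (cpuUtilization > 0.9) {
            setCpuGovernor("performance");
        } else {
            setCpuGovernor("interactive");
        }
    }

    private void setCpuGovernor(String governor) {
        // platform-specific; left abstract in this sketch
    }
}
```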
Benchmarks exist not just to allow comparisons of one device with another, but also as metrics of the overall user experience. There is a true virtuous circle at work here if everyone plays the game.
Benchmarks must represent the general characteristics of real user activity, so that genuine optimizations that improve their scores also inevitably deliver real user experience benefits. Benchmark developers must therefore simulate different categories of real user activity and, to keep everyone honest, ensure that the internal profile of the benchmark is reasonably flat and realistic.
The recently updated AnTuTu benchmark's floating point test illustrates the point: from a hot-spot perspective, version 4 has a much tighter loop kernel, with about 80 percent of the runtime spent in loops comprising only 18 instructions in total (see graph below). That makes it more open to abuse by overly specific optimizations that might not benefit the real user experience.
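To show why kernel shape matters, here is a hypothetical sketch in Java. The arithmetic in it is invented and has nothing to do with AnTuTu's actual code; what matters is where the runtime goes, not what is computed.

```java
// Hypothetical illustration of kernel shape, not AnTuTu's real workload.
public class KernelShape {

    // Tight kernel: essentially all runtime lands in one tiny loop body.
    // A driver or JIT only needs to recognize this single pattern to
    // game the score without helping any real application.
    static double tightKernel(double[] a) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * a[i] + 1.0;
        }
        return sum;
    }

    // Flatter profile: runtime spread across varied operations, closer
    // to real application behavior and far harder to special-case.
    static double flatKernel(double[] a) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double x = a[i];
            sum += Math.sqrt(Math.abs(x));
            sum -= Math.log1p(x * x);
            sum *= Math.cos(x) * 1e-3 + 1.0;
        }
        return sum;
    }
}
```

An optimization that speeds up tightKernel tells you almost nothing about real apps; one that speeds up the flatter mix is far more likely to be a genuine platform improvement.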
Some observers suggest benchmarks should evolve to be more difficult to boost, and that analysts should get smarter about detecting cheating. Certainly both will happen, but settling for that kind of arms race would, I think, be a fairly negative outcome.
Instead I think we should focus on making that virtuous circle work: Analyze and promote benchmarks that encourage generic optimization as the true measures of an Android platform’s performance and then implement those real optimizations to provide genuine differentiation. We can be better, can’t we?