The industry needs to put more effort into building better mobile benchmarks. The recent AndEBench-Pro is a step in the right direction, but more work is needed.
It's ironic that the issues that deserve the most industry cooperation tend to be the most contentious ones. Few issues have been more contentious than performance benchmarking. Slowly, we're moving in the right direction, but more work is needed.
Benchmarks are widely used for evaluating anything electronic. To get the best scores, silicon and system vendors aggressively "optimize" for their target benchmarks. Sometimes these optimizations are more like manipulations. The technical press is littered with stories of unfair benchmarking practices, and what is reported is only a small portion of the common practices.
Benchmarks face other limitations, too. The rapid pace of innovation often makes it a challenge to accurately test all functions of a system in ways that reflect real user experiences across a wide variety of platforms. Image capture and editing, for example, may be handled by a variety of chips and APIs, frustrating efforts to make meaningful comparisons across Android, iOS, and Windows phones.
A good benchmark has five elements, the first and most fundamental of which is transparency. Benchmarks can be little more than black boxes that return a numerical result with little visibility into the code or process being executed, the data used, or the method of scoring. Often it's not even clear what functions or standards are being tested.
Benchmark owners often justify their secrecy, saying the test itself is their intellectual property or uses their proprietary information. We believe benchmarks should be developed in a manner approved by a diverse industry council, and/or all code and scoring methods should be open for review. Also, benchmarks should include a checklist of the functions and standards being tested.
A second trait of a good benchmark is that it can be independently verified. Usually the benchmarking organization certifies its own results before they are published. Alternatively, users or vendors can run some benchmarks themselves and upload the results. Although system tweaks such as overclocking can bias these results, outliers usually get discounted by taking an average of scores from many users.
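To make the averaging point concrete, here is a minimal sketch in Python of how a pool of user-submitted scores might be summarized. The scores are hypothetical, not data from any real benchmark, and the `trimmed_mean` helper is our own illustration rather than any benchmarking organization's method. It shows that a plain average is still pulled upward by a single overclocked result, while a median or trimmed average discounts it more effectively.

```python
from statistics import mean, median


def trimmed_mean(scores, trim_fraction=0.1):
    """Average the scores after dropping the lowest and highest
    trim_fraction of submissions (hypothetical helper for illustration)."""
    if not 0 <= trim_fraction < 0.5:
        raise ValueError("trim_fraction must be in [0, 0.5)")
    ordered = sorted(scores)
    k = int(len(ordered) * trim_fraction)  # submissions dropped at each end
    return mean(ordered[k:len(ordered) - k])


# Hypothetical scores uploaded by ten users for the same phone model;
# the last one comes from an overclocked device.
scores = [1000, 1010, 990, 1005, 995, 1002, 998, 1003, 997, 1600]

print("plain average:  ", mean(scores))
print("median:         ", median(scores))
print("trimmed average:", trimmed_mean(scores, 0.1))
```

With only ten submissions, the one outlier lifts the plain average well above the typical score. The more users contribute, the smaller that effect becomes, which is why the size of the sample matters as much as the averaging method.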
Third, some form of oversight is required to ensure consistency in the benchmarking process. Optimizations should not be tailored to a particular benchmark, and all platforms should follow the same testing procedures.
Fourth, a true benchmark, especially a mobile one, should perform some form of system-level testing. Certain components -- such as the CPU, GPU, and memory -- are easy to single out and test. Others, such as wireless network connectivity, sensor performance, battery life, and display functions, are difficult to quantify. The ultimate test of any mobile device is the user experience, something a good benchmark must at least try to represent.
Fifth, good benchmarks get updated on a regular cadence, ideally annually. The industry should stop using benchmarks that do not meet these criteria.
Given the many thorny issues, it has become accepted as a matter of best practice to use a suite of benchmarks in evaluating any technology or platform. This often leads to a long list of benchmarks -- some better than others -- that show a variety of results. Users can get frustrated when presented with such a laundry list of results.