BDTI has released independent benchmark results for the Cortex-A8, ARM's highest-performance processor core, on the BDTI DSP Kernel Benchmarks™ and the BDTI Video Encoder and Decoder Benchmarks™. The results indicate that the Cortex-A8 is significantly faster than its predecessor, the ARM1176, giving it considerable horsepower for its targeted applications. Initially, the Cortex-A8 is being used in chips for high-performance cellular handsets; it also targets set-top boxes, printers, and automotive infotainment applications.
Because cellular handsets are highly cost- and energy-sensitive, the Cortex-A8 is intended to be implemented using either the conventional logic synthesis methodology (commonly used with licensable processor cores) or a semi-custom design style. Initial licensees creating highly optimized implementations of the Cortex-A8 are using hand-crafted library cells and other physical-level optimizations (as Texas Instruments has done with its OMAP3430 chip) to improve both frequency and power relative to traditional synthesis methodologies. For this reason, BDTI's benchmark results for the ARM Cortex-A8 do not include clock speed, silicon area, or power consumption data based on BDTI's standardized conditions for processor cores. Caution should therefore be used in interpreting the Cortex-A8 benchmark results and in comparing the Cortex-A8 to other BDTI-benchmarked cores. (All other BDTI benchmark results for licensable processor cores assume a TSMC CL013G process with the ARM Artisan Sage-X library and worst-case temperature, process, and voltage variations.)
The Cortex-A8 achieves a BDTIsimMark2000/MHz score of 7.6. By comparison, the ARM1176 achieves a BDTIsimMark2000™ score of 1200 at 335 MHz, or 3.6 BDTIsimMark2000/MHz. (A higher BDTIsimMark2000 score indicates a faster processor.) At an equivalent clock speed, then, the Cortex-A8 is roughly twice as fast as the ARM1176 on typical signal processing tasks. This boost in horsepower derives mainly from the NEON signal processing extensions, which allow the Cortex-A8 to perform up to four 16-bit multiply-accumulate operations per cycle (versus two for the ARM11). In addition, BDTI expects that, because licensees are using more advanced fabrication processes and hand-optimized layouts, typical Cortex-A8 implementations will achieve somewhat higher clock speeds than typical implementations of other licensable cores, further boosting Cortex-A8 performance on signal processing tasks relative to the ARM11 and other BDTI-benchmarked cores. (See ARM's and TI's estimates for Cortex-A8 clock speed.)
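The per-MHz comparison above is simple arithmetic, sketched below using only the figures quoted in this article. It assumes that a BDTIsimMark2000 score scales linearly with clock frequency, which is how a per-MHz figure is meant to be read. The helper names (`per_mhz`, `score_at`) are hypothetical, and the 335 MHz Cortex-A8 figure is an illustrative projection, not a BDTI result.

```python
# Per-MHz BDTIsimMark2000 comparison using the figures quoted in the article.
# Assumption: the score scales linearly with clock frequency.

ARM1176_SCORE = 1200      # BDTIsimMark2000 at 335 MHz (from the article)
ARM1176_MHZ = 335
CORTEX_A8_PER_MHZ = 7.6   # BDTIsimMark2000/MHz (from the article)


def per_mhz(score: float, mhz: float) -> float:
    """Normalize a BDTIsimMark2000 score to one MHz of clock (hypothetical helper)."""
    return score / mhz


def score_at(per_mhz_score: float, mhz: float) -> float:
    """Project a score at a given clock, assuming linear scaling (hypothetical helper)."""
    return per_mhz_score * mhz


arm1176_per_mhz = per_mhz(ARM1176_SCORE, ARM1176_MHZ)
# Ratio of per-MHz scores = relative speed at an equal clock frequency.
speedup = CORTEX_A8_PER_MHZ / arm1176_per_mhz
# Illustrative only: Cortex-A8 projected to the ARM1176's 335 MHz clock.
a8_at_335 = score_at(CORTEX_A8_PER_MHZ, ARM1176_MHZ)

print(f"ARM1176: {arm1176_per_mhz:.1f} BDTIsimMark2000/MHz")
print(f"Cortex-A8 vs. ARM1176 at equal clock: {speedup:.1f}x")
print(f"Cortex-A8 at 335 MHz (hypothetical projection): {a8_at_335:.0f}")
```

Running it reproduces the 3.6 BDTIsimMark2000/MHz figure for the ARM1176 and shows a ratio of about 2.1x in the Cortex-A8's favor. Actual scores at a given clock depend on each licensee's implementation, which is why BDTI does not publish standardized Cortex-A8 clock speeds.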
Table 1. Cortex-A8 performance on BDTI's Video Encoder and Decoder Benchmarks™
On the BDTI Video Encoder and Decoder Benchmarks (see Table 1 and detailed results), the Cortex-A8 requires less than half the processor loading of the ARM1176.
For more BDTI results and analysis of the Cortex-A8, see the full article at InsideDSP.