CUPERTINO, Calif. — Nvidia has opened the hood on its custom 64-bit ARM core, first announced in January 2011. "Denver" is an ARM processor that uses microcode to enable a novel execution optimizer.
Two Denver cores will ship this year in a tablet SoC that upgrades Nvidia's Tegra K1. The existing 32-bit Tegra K1 targets Android and is used in an Acer Chromebook, Google's Project Tango tablet, Xiaomi's MiPad, and Nvidia's own Shield tablet.
Nvidia claims the 64-bit Tegra K1 will deliver PC-class performance in mobile systems for gaming, business apps, and content creation. In benchmarks Nvidia showed, Denver was nearly on par with an Intel Haswell processor and outperformed an Apple A7-series SoC by 10 to 25%.
Nvidia showed benchmarks only against x86 and 32-bit ARM SoCs.
The company gave no comparisons with ARM's standard Cortex-A57 64-bit core. AMD has just begun sampling A57-based SoCs aimed at servers and networking gear, and Applied Micro has started sampling SoCs built on its own custom 64-bit ARM core.
Until benchmarks against standard and custom 64-bit ARM SoCs emerge, it's not clear whether Denver will help Nvidia improve its position in mobile systems, where it significantly trails leader Qualcomm.
Denver can execute as many as seven instructions per clock at clock rates up to 2.5 GHz. It packs 128 Kbytes of L1 instruction cache and 64 Kbytes of L1 data cache, plus a 2-Mbyte, 16-way set-associative L2 cache.
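As a rough illustration of what those figures imply, the back-of-envelope sketch below computes Denver's theoretical peak instruction rate and the number of sets in its L2 cache. The 64-byte cache-line size is an assumption; the line size was not part of the figures reported here, and the peak rate is a ceiling that real code never sustains.

```python
# Back-of-envelope arithmetic for the Denver figures quoted above.
# ASSUMPTION: 64-byte cache lines (line size was not given in the reported specs).

ISSUE_WIDTH = 7              # peak instructions per clock
CLOCK_HZ = 2.5e9             # up to 2.5 GHz
L2_BYTES = 2 * 1024 * 1024   # 2-Mbyte L2
L2_WAYS = 16                 # 16-way set associative
LINE_BYTES = 64              # assumed line size


def peak_instructions_per_second(width: int, clock_hz: float) -> float:
    """Theoretical ceiling: every issue slot filled on every cycle."""
    return width * clock_hz


def cache_sets(total_bytes: int, ways: int, line_bytes: int) -> int:
    """Sets = capacity / (associativity * line size)."""
    return total_bytes // (ways * line_bytes)


def index_bits(num_sets: int) -> int:
    """Address bits needed to pick a set (num_sets must be a power of two)."""
    return num_sets.bit_length() - 1


peak = peak_instructions_per_second(ISSUE_WIDTH, CLOCK_HZ)
sets = cache_sets(L2_BYTES, L2_WAYS, LINE_BYTES)
print(f"peak: {peak / 1e9:.1f} billion instructions/s")
print(f"L2: {sets} sets, {index_bits(sets)} index bits")
```

Under these assumptions the peak works out to 17.5 billion instructions per second, and the L2 has 2,048 sets selected by 11 address bits.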
The most novel aspect of Denver is its execution optimizer, which it uses in place of a full out-of-order design. The optimizer handles a variety of tasks, such as renaming registers, unrolling loops, breaking false dependencies in code, and removing unused computations.
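Nvidia did not detail how Denver's optimizer implements these transformations. The Python sketch below only illustrates two of the textbook techniques named above, register renaming (to break false dependencies) and dead-code elimination (to remove unused computations), on a toy instruction list. The `Instr` type, `rename_registers`, and `eliminate_dead_code` are hypothetical names for illustration, not Nvidia's design, and the sketch assumes straight-line code with no stores or calls.

```python
from dataclasses import dataclass


# HYPOTHETICAL toy IR: each instruction writes one register and reads others.
# Assumes straight-line code with no memory side effects (no stores or calls).
@dataclass(frozen=True)
class Instr:
    dest: str
    op: str
    srcs: tuple[str, ...]


def rename_registers(block: list[Instr]) -> tuple[list[Instr], dict[str, str]]:
    """Give every write a fresh name (p1, p2, ...) so a later write to the same
    architectural register no longer conflicts with earlier reads or writes."""
    latest: dict[str, str] = {}  # architectural name -> newest renamed name
    out: list[Instr] = []
    for n, ins in enumerate(block, start=1):
        srcs = tuple(latest.get(s, s) for s in ins.srcs)  # read newest versions
        new_dest = f"p{n}"
        latest[ins.dest] = new_dest
        out.append(Instr(new_dest, ins.op, srcs))
    return out, latest


def eliminate_dead_code(block: list[Instr], live_out: set[str]) -> list[Instr]:
    """Drop instructions whose results are never read, walking backward."""
    live = set(live_out)
    kept: list[Instr] = []
    for ins in reversed(block):
        if ins.dest in live:
            live.discard(ins.dest)  # this write satisfies the later readers
            live.update(ins.srcs)   # so its own inputs are now needed
            kept.append(ins)
        # else: nothing reads the result, so the computation is removed
    kept.reverse()
    return kept


block = [
    Instr("r1", "load", ("a",)),
    Instr("r2", "mul", ("r1", "r1")),  # result never used
    Instr("r1", "add", ("r1", "b")),   # reuses r1: a false (name-only) dependency
    Instr("r3", "add", ("r1", "c")),
]

renamed, latest = rename_registers(block)
optimized = eliminate_dead_code(renamed, {latest["r3"]})
for ins in optimized:
    print(ins.dest, "=", ins.op, *ins.srcs)
```

After renaming, the second write to `r1` becomes `p3`, so it no longer has to wait behind earlier uses of `r1`; the unused multiply is then dropped, and the sketch prints three instructions: `p1 = load a`, `p3 = add p1 b`, `p4 = add p3 c`.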
The optimizer chains related routines together and uses a 128-Mbyte region of main memory that is securely partitioned off before the operating system boots. "We see a 2x speed-up or better with optimized routines," said Darrell Boggs, chief architect on the project, in a talk at the annual Hot Chips conference here.
The new core marks the end of Nvidia's use of a companion core, something it pioneered with its early 32-bit ARM SoCs. ARM continues to pursue the approach with mixed 32- and 64-bit cores.
Among other techniques, Denver can reuse its memory pipelines for integer traffic, and it has a prefetcher to compensate for cache misses.
Denver is a microcoded, seven-wide superscalar 64-bit ARM core.
— Rick Merritt, Silicon Valley Bureau Chief, EE Times