To ease programming, the design is cache coherent across both graphics and traditional processor cores. Indeed, finding ways to program many-core processors is one of the chief challenges for today's computer scientists.
"We are about to see a sea change in programming models," said Dally. "In high-performance computing we went from vectorized Fortran to MPI and now we need a new programming model for the next decade or so."
"We think it should be an evolution of [Nvidia's] CUDA," he said. "But there are CUDA-like approaches such as OpenCL, OpenMP and [Microsoft's] DirectCompute or a whole new language."
All of these languages share similar ingredients. For example, each tries to build support for advanced memory-sharing mechanisms into its semantics.
Nvidia's Echelon system will compete with designs from teams at Intel, MIT and Sandia National Labs, each taking a different approach to building power-efficient exascale systems.
The Ubiquitous High Performance Computing program is sponsored by the Defense Advanced Research Projects Agency (DARPA). DARPA tasked the teams with building, by 2014, a prototype petaflop-class system that fits in a 57-kilowatt rack. Such systems could serve as building blocks for an exascale system to be built by 2018.
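To put the rack target in perspective, the arithmetic can be sketched in a few lines of Python. This is a rough estimate, not a figure from DARPA or Nvidia: it assumes "petaflop-class" means exactly one petaflop per second per rack, and that performance and power scale linearly with the number of racks, ignoring interconnect, cooling and storage overhead. The function names are illustrative only.

```python
# Back-of-the-envelope scaling from the 57-kilowatt rack target to exascale.
# Assumptions (not stated in the article): one rack delivers exactly
# 1 petaflop/s, and both performance and power scale linearly with rack count.

RACK_FLOPS = 10**15       # "petaflop-class" taken as 1 petaflop/s (assumption)
RACK_POWER_W = 57_000     # 57 kilowatts per rack (from the article)
EXASCALE_FLOPS = 10**18   # 1 exaflop/s


def racks_needed(target_flops: int, rack_flops: int) -> int:
    """Racks required to reach target_flops, rounded up (ceiling division)."""
    return -(-target_flops // rack_flops)


def total_power_megawatts(racks: int, rack_power_w: int) -> float:
    """Total power draw of the racks alone, in megawatts."""
    return racks * rack_power_w / 1e6


def gigaflops_per_watt(rack_flops: int, rack_power_w: int) -> float:
    """Energy efficiency implied by the rack target, in gigaflops per watt."""
    return rack_flops / rack_power_w / 1e9


racks = racks_needed(EXASCALE_FLOPS, RACK_FLOPS)
power_mw = total_power_megawatts(racks, RACK_POWER_W)
efficiency = gigaflops_per_watt(RACK_FLOPS, RACK_POWER_W)

print(f"racks for one exaflop: {racks}")            # 1000
print(f"power for those racks: {power_mw} MW")      # 57.0 MW
print(f"implied efficiency: {efficiency:.1f} GFLOPS/W")
```

Under those assumptions, an exascale machine would take about 1,000 such racks drawing roughly 57 megawatts, which corresponds to about 17.5 gigaflops per watt at the rack level.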
Nvidia's Echelon chip packs 1,000 cores in 128 blocks