Nvidia describes 10 teraflops processor
Rick Merritt
11/17/2010 10:47 AM EST
SAN JOSE, Calif. – Nvidia's chief scientist gave attendees at Supercomputing 2010 a sneak peek of a future graphics chip that will power an exascale computer. Nvidia is competing with three other teams to build such a system by 2018 in a program funded by the U.S. Department of Defense.
Nvidia's so-called Echelon system is just a paper design backed up by simulations, so it could change radically before it gets built. Elements of its chip designs ultimately are expected to show up across the company's portfolio of handheld to supercomputer graphics products.
"If you can do a really good job computing at one scale you can do it at another," said Bill Dally, Nvidia's chief scientist who is heading up the Echelon project. "Our focus at Nvidia is on performance per watt [across all products], and we are starting to reuse designs across the spectrum from Tegra to Tesla chips," he said.
In his talk, Dally described a graphics core that can process a floating-point operation using just 10 picojoules of energy, down from 200 picojoules on Nvidia's current Fermi chips. Eight of the cores would be packaged on a single streaming multiprocessor (SM) and 128 of the SMs would be packed into one chip.
The result would be a thousand-core graphics chip with each core capable of handling four double precision floating-point operations per clock cycle—the equivalent of 10 teraflops on a chip. A chip with just eight of the cores would someday power a handset, Dally said.
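The arithmetic behind these figures can be checked with a short sketch. The core counts, operations per cycle, teraflops target and energy per operation come from the article; the clock rate and the power totals are derived here and were not stated by Nvidia. The power figure assumes every operation costs exactly the quoted energy and ignores memory traffic and other overheads, and the function names are purely illustrative.

```python
# Back-of-envelope check of the figures quoted in the article.
# Derived values (clock rate, watts) are assumptions, not Nvidia's numbers.

CORES_PER_SM = 8            # from the article
SMS_PER_CHIP = 128          # from the article
FLOPS_PER_CORE_CYCLE = 4    # double precision, from the article
TARGET_FLOPS = 10e12        # "10 teraflops", from the article
PICOJOULE = 1e-12           # one picojoule expressed in joules


def total_cores(cores_per_sm, sms_per_chip):
    """Number of cores on one chip."""
    return cores_per_sm * sms_per_chip


def implied_clock_hz(target_flops, cores, flops_per_core_cycle):
    """Clock rate needed for the chip to reach target_flops."""
    return target_flops / (cores * flops_per_core_cycle)


def arithmetic_power_watts(flops_per_second, joules_per_flop):
    """Power spent on arithmetic alone (assumes no other overheads)."""
    return flops_per_second * joules_per_flop


if __name__ == "__main__":
    cores = total_cores(CORES_PER_SM, SMS_PER_CHIP)
    clock = implied_clock_hz(TARGET_FLOPS, cores, FLOPS_PER_CORE_CYCLE)
    echelon_w = arithmetic_power_watts(TARGET_FLOPS, 10 * PICOJOULE)
    fermi_w = arithmetic_power_watts(TARGET_FLOPS, 200 * PICOJOULE)
    print(f"cores per chip:        {cores}")
    print(f"implied clock:         {clock / 1e9:.2f} GHz")
    print(f"power at 10 pJ/flop:   {echelon_w:.0f} W")
    print(f"power at 200 pJ/flop:  {fermi_w:.0f} W")
```

Under these assumptions, 8 x 128 gives 1,024 cores, the 10-teraflops target implies a clock of roughly 2.4 GHz, and arithmetic alone would draw about 100 watts at 10 picojoules per operation, versus about 2 kilowatts at Fermi's 200 picojoules.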
The Echelon chip packs just twice as many cores as today's high-end Nvidia GPUs. However, today's cores handle just one double precision floating-point operation per cycle, compared to four for the Echelon chip.
Many of the advances in the chip come from its use of memory. The Echelon chip will use 256 Mbytes of SRAM that can be dynamically configured to meet the needs of an application.
For example, the SRAM could be broken up into as many as six levels of cache, each of a variable size. At the lowest level each core would have its own private cache.
The goal is to get data as close to processing elements as possible, because moving data around the chip wastes energy. Thus SMs would have a hierarchy of processor registers that could be matched to locations in cache levels. In addition, the chip would have broadcast mechanisms so that the results of one task could be shared with any nodes that needed that data.


ChuckLippmeier
11/17/2010 6:31 PM EST
Ya so what. I've been trying to get ahold of a sales person at the NVIDIA webstore site for two weeks and I've been sent on wild goose chases to PTC and Microsoft service organizations, neither of which knows anything about NVIDIA. I'm not interested in anything NVIDIA has to say anymore.
ChuckLippmeier
11/18/2010 2:47 PM EST
Well, I finally received a response with the information I requested. If this posting was the reason, thank you but it's a heck of a way to get into NVIDIA's sales/Tech support.
vivekv80
11/18/2010 1:23 PM EST
Awesome, is this Kepler or Maxwell? Hope they allow DMA and GPUs will make a mark in embedded processing :)
kjdsfkjdshfkdshfvc
12/3/2012 9:33 AM EST
Maxwell, with ARM 64-bit cores included, probably Neon SIMD?, and using the existing ARM interconnect with one terabit per second of usable system bandwidth
http://www.arm.com/products/system-ip/interconnect/corelink-ccn-504-cache-coherent-network.php
and it's also probably why you're only now seeing Intel talk about their proposed one-terabit non-cache-coherent interconnect in an upcoming paper, as they missed that ARM innovation to start with :)
I do find it a little odd that Intel are not making use of their in-house "Light Peak" optical fiber research here though. Didn't they manage to get it on-chip by now, and cheaper... if only for the higher-speed bus and not the ultra-low-power optical information processing yet.