Nvidia has been studying “use cases across the product line” for chip stacks, he said. It would make sense to test the technology first on a midrange GPU as one member of a more conventional family of components.
“We need to try it out in some way to hedge our bets a little bit,” Dally said. “You learn a lot when you put a new technology into a volume product, so I think we want to do it in a way that it adds a feature, but the mainstream product doesn’t depend on it,” he said.
At a recent annual conference, Nvidia chief executive Jen-Hsun Huang announced the company will ship in 2015 a next-generation graphics processor called Volta that uses stacked memories. However, he didn’t give any details about the device or the technology it will use.
The push for a 2.5-D stack on an organic substrate makes sense, said Rao Tummala, a researcher in the field at Georgia Institute of Technology. “We at Georgia Tech are doing memory stacking in organics too and planning to do 2.5-D as well,” he said.
A Georgia Tech researcher working on 3-D stacks with through-silicon vias was more skeptical.
“It seems organic interposers will win in terms of cost, yield, and reliability, and silicon interposers will win on interconnect size/pitch, performance, and power,” said Lim Sung Kyu. “If the target application calls for high memory bandwidth, I am not even sure if organic interposers can even meet the requirements,” he said.
Instead, Nvidia will implement a virtual memory capability in its Cuda programming environment. It will use pointers and page table exceptions to create a pool of virtual memory shared by graphics chips and host CPUs. Maxwell, its next-generation graphics chip shipping in 2014, will be the first to implement the approach.
The technique will become a key capability for Nvidia’s SoCs that use ARM cores as well as Cuda-enabled GPUs, starting with Tegra 5, which is expected to sample this year. AMD's approach will be used in future SoCs that build in its x86 and Radeon graphics cores using OpenCL.
“I have trouble thinking of any app that needs cache coherency,” Dally said. The approach “generates additional traffic on some interfaces that can become a bottleneck,” he added.
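The shared virtual memory pool described above can be pictured with a small toy model. The sketch below is an illustration only, not Nvidia's implementation: the class name `SharedVirtualMemory`, the page size, and the migrate-on-fault policy are all assumptions made for the example. It shows a single pointer being valid on both a CPU and a GPU, with a page table exception moving the page to whichever processor touches it.

```python
PAGE_SIZE = 4096  # bytes per page; an illustrative value, not a hardware figure


class SharedVirtualMemory:
    """Toy model (hypothetical): one address space, pages resident on 'cpu' or 'gpu'."""

    def __init__(self, num_pages):
        # Every page starts out resident in host (CPU) memory.
        self.residency = ["cpu"] * num_pages
        self.data = [bytearray(PAGE_SIZE) for _ in range(num_pages)]
        self.faults = 0  # number of page-table exceptions taken so far

    def _touch(self, processor, addr):
        page, offset = divmod(addr, PAGE_SIZE)
        if self.residency[page] != processor:
            # Page-table exception: migrate the page to the faulting processor.
            self.faults += 1
            self.residency[page] = processor
        return page, offset

    def write(self, processor, addr, value):
        page, offset = self._touch(processor, addr)
        self.data[page][offset] = value

    def read(self, processor, addr):
        page, offset = self._touch(processor, addr)
        return self.data[page][offset]


mem = SharedVirtualMemory(num_pages=4)
ptr = 2 * PAGE_SIZE + 10          # the same pointer is usable by both processors
mem.write("cpu", ptr, 42)         # page is already on the CPU, so no fault
value = mem.read("gpu", ptr)      # GPU access faults and the page migrates
```

In this model no hardware keeps two copies of a page in sync; data simply moves when the other processor faults on it, which is one way to share memory without the cache-coherency traffic Dally describes.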
It will be very interesting to see what impact integrated 22nm graphics have on external graphics in PCs.
However, Nvidia is already far ahead of Intel, with its Medfield SoCs, in the emerging smartphone and tablet markets.
I used to respect Dally as an academic, but I think he's drinking way too much Kool-Aid these days. 2.5-D is on everyone's roadmap. It's a no-brainer, not to say that it'll be trivial to accomplish. The only thing that's surprising here is that we're not seeing more early-adopter 2.5-D products (you know: products that are clearly flawed but intended to provide the vendor with lessons for rapid follow-up products).
Developing an I/O standard in a vacuum is not going to help unless it is for internal chip-to-chip links within Nvidia. If there are no standards, then there is no memory to stack. Once it is a standard, many will do it. Not sure how this will benefit... I may be missing something.