Nvidia has been studying “use cases across the product line” for chip stacks, he said. It would make sense to test the technology first on a midrange GPU as one member of a more conventional family of components.
“We need to try it out in some way to hedge our bets a little bit,” Dally said. “You learn a lot when you put a new technology into a volume product, so I think we want to do it in a way that it adds a feature, but the mainstream product doesn’t depend on it.”
At a recent annual conference, Nvidia chief executive Jen-Hsun Huang announced the company will ship in 2015 a next-generation graphics processor called Volta that uses stacked memories. However, he didn’t give any details about the device or the technology it will use.
The push for a 2.5-D stack on an organic substrate makes sense, said Rao Tummala, a researcher in the field at Georgia Institute of Technology. “We at Georgia Tech are doing memory stacking in organics too and planning to do 2.5-D as well,” he said.
A Georgia Tech researcher working on 3-D stacks with through-silicon vias was more skeptical.
“It seems organic interposers will win in terms of cost, yield, and reliability, and silicon interposers will win on interconnect size/pitch, performance, and power,” said Lim Sung Kyu. “If the target application calls for high memory bandwidth, I am not even sure if organic interposers can even meet the requirements,” he said.
Separately, Dally said SoCs that merge CPU and graphics cores do not need the kind of cache-coherent memory architecture rival AMD is helping develop as part of the Heterogeneous System Architecture alliance.
Instead, Nvidia will implement a virtual memory capability in its Cuda programming environment. It will use pointers and page table exceptions to create a pool of virtual memory shared by graphics chips and host CPUs. Maxwell, its next-generation graphics chip shipping in 2014, will be the first to implement the approach.
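The general idea of a shared pool built on pointers and page table exceptions can be illustrated with a toy model. The Python sketch below is not Nvidia's implementation and uses no Cuda API; the class `SharedVirtualMemory`, the 4,096-byte page size, and the processor labels `"cpu"` and `"gpu"` are all assumptions made for illustration. It shows one pointer value being valid on both processors, with a simulated fault moving a page to whichever side touches it.

```python
from dataclasses import dataclass, field

PAGE_SIZE = 4096  # bytes per page; an assumed value for illustration


@dataclass
class SharedVirtualMemory:
    """Toy model of one virtual address space shared by a CPU and a GPU."""

    # Virtual page number -> processor whose physical memory holds the page.
    page_owner: dict = field(default_factory=dict)
    # Virtual page number -> page contents.
    page_data: dict = field(default_factory=dict)
    # Number of simulated page table exceptions (faults).
    faults: int = 0

    def _resolve(self, processor, address):
        page, offset = divmod(address, PAGE_SIZE)
        if self.page_owner.get(page) != processor:
            # The page is unmapped or resident on the other processor:
            # count a fault, allocate on first touch, and move ownership.
            self.faults += 1
            self.page_data.setdefault(page, bytearray(PAGE_SIZE))
            self.page_owner[page] = processor
        return self.page_data[page], offset

    def write(self, processor, address, value):
        data, offset = self._resolve(processor, address)
        data[offset] = value

    def read(self, processor, address):
        data, offset = self._resolve(processor, address)
        return data[offset]


def demo():
    mem = SharedVirtualMemory()
    pointer = 3 * PAGE_SIZE + 16            # one pointer, valid on both sides
    mem.write("cpu", pointer, 42)           # first touch: fault, page lands on CPU
    seen_by_gpu = mem.read("gpu", pointer)  # fault: page migrates to the GPU
    again = mem.read("gpu", pointer)        # already resident: no fault
    return seen_by_gpu, again, mem.faults
```

In this model, `demo()` returns `(42, 42, 2)`: the GPU reads the value the CPU wrote through the same pointer, and only the two accesses that cross processors raise a fault. Nothing here keeps caches coherent between the two sides; data moves a page at a time, and only when a fault demands it.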
The technique will become a key capability for Nvidia’s SoCs that use ARM cores as well as Cuda-enabled GPUs, starting with Tegra 5, which is expected to sample this year. AMD’s approach will be used in future SoCs that build in its x86 and Radeon graphics cores and use OpenCL.
“I have trouble thinking of any app that needs cache coherency,” Dally said. The approach “generates additional traffic on some interfaces that can become a bottleneck,” he added.