Really an amazing interface. But it raises a question: optimized systems that use caches run around 10-50 floating-point instructions per memory word access. This means that a fitting CPU would have to deliver around 100-500 teraflops!
This is clearly impossible with any of today's architectures and the current state of Moore's law.
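The estimate above can be reproduced with a rough back-of-the-envelope sketch. The comment does not state the interface's bandwidth, so the figure of 10 tera-words per second used here is an assumption back-derived from the comment's own numbers (100 teraflops / 10 flops per word, and 500 teraflops / 50 flops per word, both give that value). The names `required_flops` and `IMPLIED_WORDS_PER_SECOND` are hypothetical, chosen only for illustration.

```python
def required_flops(words_per_second, flops_per_word):
    """Compute rate needed to keep pace with a given memory word rate."""
    return words_per_second * flops_per_word


# Assumption: back-derived from the comment's figures, not a published spec.
IMPLIED_WORDS_PER_SECOND = 10e12  # 10 tera-words per second

# The comment's range of 10-50 floating-point instructions per word access.
low = required_flops(IMPLIED_WORDS_PER_SECOND, 10)
high = required_flops(IMPLIED_WORDS_PER_SECOND, 50)

print(f"{low / 1e12:.0f}-{high / 1e12:.0f} teraflops")
```

Under that assumed word rate, the sketch lands on the same 100-500 teraflops range quoted in the comment. A different word size or bandwidth would shift the result proportionally.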
Alex, great comment. You are confirming that interposers (2.5D) and vertically stacked dice (3D) are ready to change the system-architecture paradigm. Until now, system architects have been limited by the MEMORY WALL, because current solutions either 1) don't allow enough memory to be placed in the SoC, right next to the CPU, because of cost, or 2) don't offer sufficient bandwidth or short enough latency to external memory, because of power and cooling constraints.
As your comment highlights, current architectures that adopt 2.5D or 3D technology will run into a CPU WALL, and designers can forget about the dreaded memory wall. Highly parallel architectures (I know of a design with a 10,000-bit-wide bus) will replace current architectures in high-performance and memory-intensive applications.
Alex, please let me know when and how I may talk to you more about this topic. I am looking for an experienced system designer to convey the above benefits clearly and compellingly to the system design community, in order to trigger new architectures that utilize the strengths of 2.5D and 3D ICs.