It seems that everyone is ignoring the fact that the memory cube will have significantly higher latency than DDR4. A read-modify-write (RMW) will stall the CPU for eons. This means that it cannot be used by a CPU as the main memory attached to the cache. It essentially brings a new tier into the memory hierarchy. It seems like a great idea that will bring much higher overall memory bandwidth, but the critical latency to the CPU is not solved.
Maybe the local DRAM will become a 4th level cache. Maybe someday the DRAM will be displaced by MRAM. In any case, I cannot see the DDR interface being simply replaced with a bunch of serial links.
It seems like the first niche for the memory cube would be in communications, where latency is not as big a deal and throughput is king... You could make an amazing switch with such a device.
The spec is open, so I am not sure what the technical risk is. HMC IP is available now from multiple vendors, and you can actually build an FPGA-based CPU with HMC memory controllers using Altera parts.
Since we are doing our own CPU, we are naturally building our own HMC controller, which, like the CPU, will be open sourced.
The key risk is not technical but rather that HMC does not take off, and so memory vendors may discontinue HMC parts. This is not very likely in the longer run since, for greater bandwidth, serial links are the only way to go. And once you go optical, SERDES-based systems are your only option. So in that sense, the future of DRAM is serial-link based.
Also, technically, HMC controllers are not DRAM controllers but rather simple protocol handlers. The DRAM controller is essentially getting shifted to the DRAM module. I suspect that, like our program, the ARM server community will shift first, followed reluctantly by Intel.
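To make the "simple protocol handler" point concrete: the host side mostly just packs requests into serial packets and unpacks responses; bank timing, refresh, and scheduling stay inside the cube. Here is a minimal sketch of packing a request header. The field names are loosely modeled on the public HMC spec, but the exact bit positions and the `CMD_RD64` opcode value here are illustrative assumptions, not the real layout.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical opcode value, for illustration only. */
#define CMD_RD64 0x30u

/* Pack cube ID, 34-bit address, tag and command into one 64-bit
   header word. Bit positions are assumptions, not the spec layout. */
static uint64_t pack_request(uint8_t cub, uint64_t adrs,
                             uint16_t tag, uint8_t cmd)
{
    uint64_t h = 0;
    h |= (uint64_t)(cmd & 0x3Fu);                 /* [5:0]   command */
    h |= (adrs & 0x3FFFFFFFFull) << 6;            /* [39:6]  address */
    h |= (uint64_t)(tag & 0x7FFu) << 40;          /* [50:40] tag     */
    h |= (uint64_t)(cub & 0x7u)   << 61;          /* [63:61] cube ID */
    return h;
}
```

Everything below this packing layer (row/column addressing, refresh, ECC scrubbing) lives in the cube's logic die, which is exactly why the host controller stays small.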
There is a lot of hush-hush about memory controller ownership in HMC. Intel of course wants to put all the ownership in its CPU, as would anyone who integrates an on-chip memory controller into the main processing unit. It's a big factor in chip design strategy. Designing with an HMC-based controller is actually a big risk.
Micron has a tight NDA for the HMCs, so prices are unlikely to leak in the near future. But in the long run, it should be cheaper than regular DDR (in terms of overall system cost) since there is a reduction in the logic component, a lower pin count, and reduced PCB size. We are spec'ing our server-class CPUs to use only HMCs (but then my target date is 2017) since I believe SERDES-based links are the way to go. A great advantage you get is sharing physical ports with I/O channels: you can treat a SERDES lane as memory or I/O and switch the protocol handler inside the processor.
Probably one SERDES bank will have to be dedicated to memory, but the others can be switched on the fly. I have presented this at a conference and have started host-side HMC silicon implementation as well. These will be released into open source.
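The lane-switching idea above can be sketched in a few lines. This is an assumed design, not a real driver API: each lane carries a mode flag, the dispatcher routes traffic to the matching protocol handler, and lane 0 is pinned to memory so the system always has a path to DRAM.

```c
#include <assert.h>

/* A SERDES lane can run either the memory protocol or an I/O protocol. */
typedef enum { LANE_MEMORY, LANE_IO } lane_mode_t;

typedef struct {
    int         id;
    lane_mode_t mode;   /* selected at boot or switched on the fly */
} serdes_lane_t;

/* Switch a lane's protocol handler. Returns 1 on success, 0 if refused.
   Lane 0 is assumed to be the dedicated memory bank and cannot switch. */
static int lane_set_mode(serdes_lane_t *lane, lane_mode_t mode)
{
    if (lane->id == 0 && mode != LANE_MEMORY)
        return 0;       /* the dedicated memory lane stays memory */
    lane->mode = mode;
    return 1;
}
```

The real silicon would do this by muxing the SERDES PHY between protocol handler blocks; the sketch only captures the policy decision.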
But we want to go one step further. It would make sense to shift the MMU to the HMC, so that an HMC block can provide a section of the total system memory to the CPUs attached to it. This works well in single-address-space OSs, which can have fully virtual caches.
So in a sense, the HMC boots first, sets up a virtual memory region, and then the CPUs attach to these regions. And once you have logic inside DRAM, you can try out all kinds of things inside memory: data prefetch, virus scanning... The list goes on.
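A minimal sketch of that boot-then-attach flow, with all names hypothetical: the cube-side MMU carves its own capacity into regions at boot, and each attaching CPU gets a virtual window that the cube (not the CPU) translates. The 4 GB cube size is an assumption for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define CUBE_SIZE (1ull << 32)   /* assumed 4 GB of capacity per cube */

static uint64_t next_free = 0;   /* simple bump allocator inside the cube */

/* At boot, carve out a region for one CPU. Returns base, or -1 if full. */
static int64_t hmc_region_alloc(uint64_t size)
{
    if (next_free + size > CUBE_SIZE)
        return -1;
    uint64_t base = next_free;
    next_free += size;
    return (int64_t)base;
}

/* Cube-side translation: map a CPU's virtual offset into its region.
   Returns the physical offset in the cube, or -1 on an out-of-window
   access (the fault is raised in the memory module, not the CPU). */
static int64_t hmc_translate(uint64_t region_base, uint64_t region_size,
                             uint64_t vaddr)
{
    if (vaddr >= region_size)
        return -1;
    return (int64_t)(region_base + vaddr);
}
```

The point of the sketch is where the logic lives: translation and protection checks ride on the cube's logic die, so CPUs with fully virtual caches can attach without carrying their own page-table walkers for this memory.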
I could keep going, but I hope this shows why a SERDES-based memory architecture can revolutionize system design. It is a lot more than optimizing DRAM or reducing power.
We are doing simulation runs to see if these architectures have any merit. I am also partly sceptical (even though it is my proposal!). But I guess the simulation results will answer all the questions.
If someone wants to have a more detailed discussion on these, send me an email.
Note for all you patent trolls out there: with this post, these ideas are hereby put into the public domain! And this definitely constitutes prior art. So no patenting.