
Memory Shifts Coming, Says Keynoter

1/30/2014 08:20 PM EST
chipmonk0
User Rank
CEO
With no specifics about their HMC at this late stage
chipmonk0   1/31/2014 12:07:42 PM
(e.g. at DesignCon), only the worst can be assumed. Unless of course I am missing something.

RGRU
User Rank
Manager
Re: With no specifics about their HMC at this late stage
RGRU   1/31/2014 12:42:32 PM
Engineering samples of HMC were out in September of last year.  There is a website called google.com where you can locate the specs, etc...

chipmonk0
User Rank
CEO
Re: With no specifics about their HMC at this late stage
chipmonk0   1/31/2014 12:52:05 PM
How about papers at recent technical conferences?

lister1
User Rank
Rookie
120GB/s ~ 160GB/s? I'm all for it...
lister1   1/31/2014 2:10:24 PM
If Micron could bring the HMC price down to the level of DDR3, that would be very welcome to the consumer market.  Higher speed, lower power, less heat, smaller footprint, it's all for the better.

rick merritt
User Rank
Blogger
Re: 120GB/s ~ 160GB/s? I'm all for it...
rick merritt   2/1/2014 4:58:58 PM
@Lister: I'd love to hear any HMC prices, if you know them.

GSMD
User Rank
Manager
Re: 120GB/s ~ 160GB/s? I'm all for it...
GSMD   2/1/2014 10:03:47 PM
Micron has a tight NDA for the HMCs, so prices are unlikely to leak in the near future. But in the long run it should be cheaper than regular DDR (in terms of overall system cost), since there is a reduction in the logic component, a lower pin count, and a smaller PCB. We are spec'ing our server-class CPUs to use only HMCs (but then my target date is 2017), since I believe SERDES-based links are the way to go. A great advantage you get is sharing physical ports with I/O channels: you can treat a SERDES lane as memory or I/O and switch the protocol handler inside the processor.

Probably one SERDES bank will have to be dedicated to memory, but the others can be switched on the fly. I have presented this at a conference and have also started a host-side HMC silicon implementation. These will be released into open source.

But we want to go one step further. It would make sense to shift the MMU to the HMC so that an HMC block can provide a section of the total system memory to the CPUs attached to it. This works well in single-address-space OSes, which can have fully virtual caches.

So in a sense, the HMC boots first and sets up a virtual memory region, and then the CPUs attach to these regions. And once you have logic inside the DRAM, you can try out all kinds of things inside memory: data prefetch, virus scanning... The list goes on.

I could keep going, but I hope this shows why a SERDES-based memory architecture can revolutionize system design. It is a lot more than optimizing DRAM or reducing power.

We are doing sim runs to see if these architectures have any merit. I am also partly sceptical (even though it is my proposal!). But I guess the sim results will answer all the questions.

If someone wants to have a more detailed discussion on these, send me an email. 

Note for all you patent trolls out there: with this post these ideas are hereby put into the public domain! And this definitely constitutes prior art. So no patenting.
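
As a rough illustration of the lane-switching idea above, here is a minimal Python sketch of a SERDES lane that can be rebound between a memory-protocol handler and an I/O-protocol handler at runtime. The class names, handler interface, and packet strings are invented for illustration; they are not taken from the HMC or RapidIO specifications.

class MemoryHandler:
    def handle(self, packet):
        return "memory access: " + packet

class IOHandler:
    def handle(self, packet):
        return "I/O transaction: " + packet

class SerdesLane:
    def __init__(self, lane_id, handler, switchable=True):
        self.lane_id = lane_id
        self.handler = handler        # protocol handler currently bound to this lane
        self.switchable = switchable  # a lane dedicated to memory cannot be retargeted

    def switch_protocol(self, new_handler):
        if not self.switchable:
            raise RuntimeError("lane %d is dedicated and cannot switch" % self.lane_id)
        self.handler = new_handler

    def receive(self, packet):
        return self.handler.handle(packet)

# One bank dedicated to memory, the others switchable between memory and I/O.
lanes = [SerdesLane(0, MemoryHandler(), switchable=False),
         SerdesLane(1, IOHandler()),
         SerdesLane(2, IOHandler())]

lanes[1].switch_protocol(MemoryHandler())   # retarget lane 1 to memory traffic
print(lanes[1].receive("RD 0x1000, 64B"))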

rcsiv
User Rank
Rookie
Re: 120GB/s ~ 160GB/s? I'm all for it...
rcsiv   2/3/2014 12:33:30 PM
GSMD, I would like to follow up, what is your email address or another way to get a hold of you?

GSMD
User Rank
Manager
Re: 120GB/s ~ 160GB/s? I'm all for it...
GSMD   2/3/2014 12:38:26 PM
Best way is my LinkedIn page, gsmadhusudan. It also has presentations on all this.

rick merritt
User Rank
Blogger
Re: 120GB/s ~ 160GB/s? I'm all for it...
rick merritt   2/3/2014 3:15:59 PM
@GSMD: Thanks for the good perspective! How about the question of latency being higher than DDR?

What is the latency vs. DDR? How do you handle that?

krisi
User Rank
CEO
spin-transfer torque RAM and phase-change
krisi   2/3/2014 5:55:21 PM
Any estimates on when spin-transfer torque RAM and phase-change might happen?

zewde yeraswork
User Rank
Blogger
DRAM alternatives
zewde yeraswork   1/31/2014 2:11:43 PM
It's interesting to look at the status of alternatives to DRAM among memory architectures. Up to this point they weren't getting much attention, but now that they have been whittled down to just a few, those alternatives are gaining visibility and credibility, as it is hard to deny what has been proven over the course of time.

Ron Neale
User Rank
Blogger
Re: DRAM alternatives Micron whittles the list
Ron Neale   1/31/2014 6:19:57 PM

It would be interesting to know in quantitative terms what "in low volume production" means, and even more intriguing, what "it has its place, it's its own thing" actually means in light of the following.

By late December 2013 both the 128Mb and the 1Gbit MCP had been quietly removed from the Micron product list on their web site. It was reported elsewhere* that Micron had indicated that their earlier generations of phase change memory were no longer available for new designs or for those wishing to evaluate the technology and the focus for PCM had moved to developing a new PCM process, in order to lower bit costs and power while at the same time improve performance. What then is the PCM device type that is in low volume production, why would low volume production be maintained for devices that are no longer available to potential customers and have the bit cost, power and performance limitations indicated? If PCM is not suitable for NAND or DRAM replacement, for what then is it suited?

Micron also has a paper co-authored with Sony at ISSCC 2014 that reports a 16Gbit ReRAM based on a 27nm process; one wonders why that did not get a mention along with STT-MRAM as one of the whittled-down list of emerging memory types with future potential on which Micron is working.

* http://electronics360.globalspec.com/article/3931/exclusive-micron-drops-phase-change-memory-for-now

resistion
User Rank
Manager
ULLtraDIMM already a DRAM threat from NAND
resistion   2/1/2014 2:03:43 AM
SanDisk's ULLtraDIMM seeks to replace some DRAM capacity on servers already.

resistion
User Rank
Manager
HMC's DRAM Controller
resistion   2/2/2014 12:27:59 AM
There is a lot of hush-hush about memory controller ownership in HMC. Intel of course wants to put all the ownership in its CPU, as would anyone who integrates an on-chip memory controller into the main processing unit. It's a big factor in chip design strategy. Designing with an HMC-based controller is actually a big risk.

GSMD
User Rank
Manager
Re: HMC's DRAM Controller
GSMD   2/2/2014 12:56:08 AM
The spec is open, so I am not sure what the technical risk is. HMC IP is available now from multiple vendors, and you can actually build an FPGA-based CPU with HMC memory controllers using Altera parts.

Since we are doing our own CPU, we are naturally building our own HMC controller, which, like the CPU, will be open sourced.

The key risk is not technical but rather that HMC does not take off and memory vendors discontinue HMC parts. This is not very likely in the longer run, since for greater bandwidth serial links are the only way to go. And once you go optical, SERDES-based systems are your only option. So in that sense, the future of DRAM is serial-link based.

Also, technically, HMC controllers are not DRAM controllers but rather simple protocol handlers; the DRAM controller is essentially getting shifted to the DRAM module. I suspect that, like our program, the ARM server community will shift first, followed reluctantly by Intel.
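
To make the protocol-handler vs. DRAM-controller distinction above concrete, here is a rough Python sketch: the host side of an HMC-style link only frames a request packet, while a conventional DDR controller has to sequence the DRAM itself. The packet fields, address split, and timing comments are simplified assumptions, not the actual HMC or DDR specifications.

def hmc_host_read(address, length_bytes):
    # Host side is just a protocol handler: frame a request packet and hand it
    # to the SERDES. Field names are illustrative, not the real HMC packet format.
    return {"cmd": "RD", "addr": address, "len": length_bytes, "tag": 1}

def ddr_host_read(address):
    # A conventional DDR controller must sequence the DRAM itself (simplified):
    # decode bank/row/column and issue timed commands. The address split here
    # is invented for illustration.
    bank = (address >> 13) & 0x7
    row = (address >> 16) & 0xFFFF
    col = (address >> 3) & 0x3FF
    return [("ACTIVATE", bank, row),   # open the row (tRCD applies)
            ("READ", bank, col),       # column access (CL applies)
            ("PRECHARGE", bank)]       # close the row (tRP applies)

print(hmc_host_read(0x20000, 64))
print(ddr_host_read(0x20000))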

resistion
User Rank
Manager
Re: HMC's DRAM Controller
resistion   2/2/2014 1:07:55 AM
@GSMD, not disagreeing, but the execution timing is going to be everything.

zewde yeraswork
User Rank
Blogger
Re: HMC's DRAM Controller
zewde yeraswork   2/3/2014 12:15:28 PM
It makes sense to build an open-source HMC controller, just as the CPU is open sourced.

DougInRB
User Rank
Manager
Re: HMC's DRAM Controller
DougInRB   2/3/2014 1:36:11 PM
It seems that everyone is ignoring the fact that the memory cube will have significantly higher latency than DDR4. An RMW will stall the CPU for eons. This means that it cannot be used by a CPU as the main memory attached to the cache; it essentially brings in a new tier to the memory hierarchy. It seems like a great idea that will bring much higher overall memory bandwidth, but the critical latency to the CPU is not solved.

Maybe the local DRAM will become a 4th-level cache. Maybe someday the DRAM will be displaced by MRAM. In any case, I cannot see the DDR interface being simply replaced with a bunch of serial links.

It seems like the first niche for the memory cube would be in comm, where latency is not as big a deal and throughput is king...  You could make an amazing switch with such a device.

resistion
User Rank
Manager
Re: HMC-CPU connection
resistion   2/3/2014 6:26:03 PM
Good point. I guess it's supposed to sit on top of the CPU with a TSV connection. This would also require CPU-maker buy-in.

DougInRB
User Rank
Manager
Re: HMC-CPU connection
DougInRB   2/3/2014 7:18:46 PM
Even if the memory cube is directly attached to the CPU (which is a very bad idea from a manufacturing yield perspective), the latency will be higher. To access a DRAM, you provide the row and column addresses and a few nanoseconds later a cache line is available. To use a serial interface, you need to create a command packet that says "read starting at this address and give me so many bytes." That command packet then needs to be serialized and sent to the memory cube controller. That has to be de-serialized and interpreted. If the command is not for that memory cube, it has to be passed along the chain to another cube. If it IS for that memory cube, the DRAM has to be read (same row/column read cycle, but at a higher frequency). The data needs to be read into a buffer, then a response packet needs to be generated, serialized, and finally sent to the CPU. Whichever thread of the CPU was trying to do the read has had to twiddle its proverbial thumbs this whole time while waiting for a cache fill to complete. This takes a few nanoseconds with DDR and will take tens or hundreds of nanoseconds with a memory cube.

That should drag just about any high performance CPU to its knees.  If the idea is good enough, the CPU makers might be willing to reinvent the whole multi-thread, cache, and memory management infrastructure, but I kind of doubt it :-).

Like I hinted in my earlier post, this may make a great main memory as long as there is a very large low latency RAM between it and the CPU (4th level cache) - and the cache hit rate of the 4th level cache is VERY high... 
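
A back-of-envelope way to see the argument above is to sum the steps of the packetized read path and compare the total with a parallel DDR access. The Python sketch below does exactly that; every per-step number is an illustrative placeholder, not a measured value.

# Per-step numbers are purely illustrative placeholders, not measured values.
hmc_read_steps_ns = [
    ("build command packet",          1.0),
    ("serialize + transmit request",  3.0),
    ("deserialize + decode at cube",  3.0),
    ("forward along chain (if any)",  0.0),   # zero when the first cube is the target
    ("DRAM row/column access",       10.0),   # same DRAM cycle as DDR, per the post above
    ("build response packet",         1.0),
    ("serialize + return response",   3.0),
    ("host deserialize + cache fill", 3.0),
]

ddr_read_ns = 10.0   # the directly attached row/column access, also illustrative

hmc_total_ns = sum(t for _, t in hmc_read_steps_ns)
print("packetized memory-cube read:  ~%.0f ns" % hmc_total_ns)
print("parallel DDR read:            ~%.0f ns" % ddr_read_ns)
print("added serialization overhead: ~%.0f ns" % (hmc_total_ns - ddr_read_ns))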

GSMD
User Rank
Manager
Re: HMC-CPU connection
GSMD   2/3/2014 10:06:04 PM
The issue of serial link latency is an interesting topic. At first glance it can appear to be pretty high, but actual experiments may show that the latency can be brought down if the protocol is simple.

We are implementing a Serial RapidIO 3.0 IP. If you are curious, the source is at bitbucket.org/casl (IIT Madras Comp. Arch and Systems Lab). The source published so far is the logical and transport layers. When synthesized in Synopsys DC targeted at a 65 nm library (still waiting for our 28 nm FD-SOI library), we are getting single-cycle processing for 64-bit packets and 1-1.3 cycles for 128-bit packets. This is at 2 GHz for 64-bit packets. We are still coding the physical layer, and the SERDES will be a standard part.

So best-case latency at 65 nm is 500 ps for the logical and transport layers. I have no idea what the physical layer with SERDES will add, but hopefully it will be below 10 ns (HT was lower, PCIe is higher). Interlaken IPs are claiming 13.5 ns without the SERDES but with all other layers included, and Interlaken is closer to HMC than PCIe in terms of traffic type. Inphi is claiming sub-15 ns for its 10/28G SERDES; I am not sure if this includes the PCS. Also, since general-purpose SERDES have to support a lot of protocols, a simpler SERDES could shave off a couple of ns.

The worst-case scenario is still looking like 17 ns, maybe 15 ns at 20 nm at 3 GHz.

If net latency is 10 ns, then HMC does not look too bad. You save a ns or two by having an integrated controller for all the banks.

There is a master's thesis from UC Berkeley on silicon photonic optical interconnects that has some data on latency:

http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.182.3362&rep=rep1&type=pdf

Please take the following into consideration:

1. The HMC protocol will be simpler compared to I/O protocols.

2. A less robust ECC will probably also suffice, since distances on the PCB will be minimal.

3. We plan to link our HMC interface directly to the cache logic to see if we can reduce latency. This can be done with HMC since the protocol is simpler, so there is no DDR controller going through AXI. Please also take controller access latency into consideration when you look at conventional DDR.

To sum up, the jury is still out on latency but it does not seem to be a deal breaker. I could be wrong, but I hope not. We will know in about 4-6 months since we will be able to synthesize our entire IP minus the SERDES.
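
For what it's worth, plugging the figures quoted above into a simple per-direction sum gives the same ballpark. These are the poster's estimates, not measurements, and the PHY+SERDES term is the open question.

# Figures quoted in the post above; all are estimates, not measurements.
logical_transport_ns = 0.5    # synthesized logical + transport layers, 65 nm at 2 GHz
phy_plus_serdes_ns = 10.0     # hoped-for figure; vendor claims range roughly 10-15 ns
worst_case_ns = 17.0          # worst-case estimate quoted above

best_case_link_ns = logical_transport_ns + phy_plus_serdes_ns
print("best-case link overhead:  ~%.1f ns" % best_case_link_ns)
print("worst-case link overhead: ~%.1f ns" % worst_case_ns)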

resistion
User Rank
Manager
Re: HMC-CPU connection
resistion   2/4/2014 3:26:17 AM
So no takers for HMC on CPU? The DRAM-CPU communication was supposed to be the main beneficiary of going to TSV technology.

GSMD
User Rank
Manager
Re: HMC's DRAM Controller
GSMD   2/3/2014 8:55:58 PM
Actually, if you look through the HMC architecture, the shared logic across multiple DRAM dies actually helps matters a bit. I have not studied this in detail, but functions like striping can be handled transparently. In these scenarios HMCs should improve latency. Micron itself claims lower latency; as per the HMC web page, "Reduced Latency – With vastly more responders built into HMC, we expect lower queue delays and higher bank availability, which will provide a substantial system latency reduction." The key here is lots of banks and lots of CPUs. If you just consider one core making an access to one bank, obviously latency will be higher. But in a real-life scenario where you have a 16-core monster with quad issue per core and hyper-aggressive prefetch with limited spatial locality, all standard latency models go haywire and an HMC will help.

Besides, as someone else pointed out, L4 caches have started appearing and they will absorb a lot of the latency hit. IBM, as usual, is first out of the gate; Intel is presumably next.

But before you start wringing your hands over latency, please consider all computing scenarios. Latency problems manifest differently in each case. In situations like shared-nothing databases or Hadoop, a lot of traffic will go over QPI (assuming an Intel box), which has higher latency than HMC. Besides, with HMC there is zero copy. So in these scenarios, HMC can be a big win. We plan to test Postgres with HMC to see if this pans out.

Basically, the question is whether remote DRAM access over QPI is better than local DRAM access over HMC.

Putting memory in the same package as the CPU is a bad idea; heat and packaging will create nasty issues. Besides, the whole idea is that the HMC can act as a fabric linking multiple CPUs together, avoiding the use of QPI-like interconnects. If we can embed some security/MMU logic in the HMC, then secure shared memory can be achieved. This will help avoid the use of a separate interconnect for low socket counts (2-4). This is my theory; let us see how it holds up in an actual implementation.

So what I am proposing is an adaptive system fabric that combines an HMC fabric with an I/O fabric like RapidIO. The CPU can transparently switch from the memory fabric to the I/O fabric depending on datapath availability, projected latency, and congestion. Ideally I want to share a low-level protocol between the two, in which case part of the HMC controller itself acts as an I/O fabric switch: a truly universal fabric. Latency can be an issue and has to be sorted out, but hey, that is why this is called research!

Seriously, why can't EE Times have a bigger webinar/article on this issue? This is something that will change systems architecture significantly, especially when these links become silicon photonic. Those, by the way, are really cool, especially the Altera FPGA part that has the optics integrated right on the FPGA. Samples are a pain to come by, though, but they solve the major headache of routing 28G or 56G traces on a PCB.
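
As a sketch of the adaptive-fabric idea above, the Python below picks between an HMC memory fabric and a RapidIO-style I/O fabric per transfer, based on availability and projected latency. The congestion model, latency numbers, and class names are all invented for illustration.

class FabricPath:
    def __init__(self, name, base_latency_ns, queue_depth, max_queue):
        self.name = name
        self.base_latency_ns = base_latency_ns
        self.queue_depth = queue_depth    # transfers currently queued on this path
        self.max_queue = max_queue

    def available(self):
        return self.queue_depth < self.max_queue

    def projected_latency_ns(self):
        # Crude congestion model: every queued transfer adds a fixed penalty.
        return self.base_latency_ns + 2.0 * self.queue_depth

def select_path(paths):
    usable = [p for p in paths if p.available()]
    if not usable:
        raise RuntimeError("no datapath available")
    return min(usable, key=lambda p: p.projected_latency_ns())

paths = [FabricPath("HMC memory fabric",  base_latency_ns=15.0, queue_depth=6, max_queue=16),
         FabricPath("RapidIO I/O fabric", base_latency_ns=25.0, queue_depth=1, max_queue=16)]
print("route this transfer via:", select_path(paths).name)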

DougInRB
User Rank
Manager
Re: HMC's DRAM Controller
DougInRB   2/4/2014 11:14:07 AM
Hmmm.  Now you have me thinking about this with a new perspective.  First of all, the FPGA based systems can definitely take advantage of this.  I've designed a DDR interface for an FPGA and it is not only a pain in the butt, it also wastes the bandwidth capability of the DRAM.  By using the HMC, very few pins are needed and the latency is not a problem.  Fan-out to logic that can inhale the data at full bandwidth could be a problem but it is easily solved with wide internal buses.  Then the memory can be shared amongst all of the hardware accelerators and embedded processors...

Hello Xilinx and Altera - can you please build me a big FPGA in a smaller package?  With PCIe and HMC, I don't need all of those pins!

My other thought about an application of the HMC is for an array of small low-power, lower frequency processors (remember the transputer?).  When scaled out, this could provide a lot more compute power per sq in than the monster heater CPUs we use today.

OK - maybe I'm not as skeptical now.  Even though it is still a bad fit for conventional CPUs, it might be a good fit for compute intensive workloads that can be parallelized.

I still think that a comm application with built-in packet inspection/routing/etc. would be a great place to start. The array of lightweight processors or FPGAs might even be the right infrastructure for this.

GSMD
User Rank
Manager
Re: HMC's DRAM Controller
GSMD   2/4/2014 11:49:42 AM
These discussions are fun, aren't they? I like the different perspectives I get from them.

My focus area in using HMCs is pretty much what you are talking about. I used to be the kernel guy at an RDBMS company, and we had tens of thousands of threads running simultaneously. For such a workload, a sea of processors was a great fit. We tried using an IBM SP/2, but it was a pain to use. There was a transputer-based system called the Meiko Computing Surface (you could change the fabric topology dynamically), but the transputer was too lightweight. Loved the transputer though. Our ISA will use transputer-style messaging instructions to send messages over Serial RapidIO (sendmg - coreid, data). Best homage I can think of!
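
Here is a behavioral guess at that transputer-style sendmg primitive, modeled in Python as blocking per-core mailboxes; the real ISA encoding and the mapping onto Serial RapidIO are not specified here, so treat this purely as a sketch of the intended semantics.

from queue import Queue

NUM_CORES = 4
mailboxes = [Queue() for _ in range(NUM_CORES)]   # one inbound message queue per core

def sendmg(core_id, data):
    # Send a message to another core's mailbox (in hardware this would ride
    # over the Serial RapidIO fabric).
    mailboxes[core_id].put(data)

def recvmg(core_id):
    # Blocking receive on this core's mailbox.
    return mailboxes[core_id].get()

sendmg(2, b"hello from core 0")
print(recvmg(2))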

FPGAs can also make use of HMCs the way you suggested. Once Altera sends me a sample, I plan to try a sea-of-cores design as a master's project next year.

I was just reviewing an SMT-based experimental processor design from one of my master's students: 8 simultaneous threads all stressing the fetch unit. Multiply this by 64 (it is a 64-core system, each core about as heavy as a Cortex-A7) and you get a sea-of-cores system that can really use an HMC.

I agree with your analysis. No one is claiming that HMC is a universal panacea, but in highly parallel applications it may be a great fit in spite of the higher latency. For RDBMS-type apps, I am planning a dedicated server processor with HMC and specialized functional units dedicated to RDBMS sub-system processing.


In real-life systems architecture, I think every system deserves its own dedicated architecture.

I am also asking our FPGA contacts to give us only SERDES-based parts. They would be great parts for our CPU prototyping. But I would prefer to junk PCIe, which I frankly think is an abomination as an interconnect!

DougInRB
User Rank
Manager
Re: HMC's DRAM Controller
DougInRB   2/4/2014 1:08:05 PM
In real-life systems architecture, I think every system deserves its own dedicated architecture.

As an engineer, I'd love to do it right from the bottom-up.  The reality is that drastic changes aren't possible.  Look at how long it took us to get multi-threaded CPUs fully supported.  First, the CPU guys had to implement it.  It took a long time after that before the compiler, OS, and application folks figured out how to take advantage of it.  This is one reason that the transputer never really got out of academia - nobody knew how to program it.  Maybe now with GPGPU architectures being embraced by the HPC folks, the time of the transputer has come - provided that somebody takes the time to generate a robust library of commonly used functions.

But I would prefer to junk PCIe, which I frankly think is an abomination as an interconnect!

Junking PCIe has the same problem as I cited above - it is everywhere, and people know how to use it.  Having said that, I would love it if I didn't have to pay certain IP vendors a small fortune to use their PCIe cores.

GSMD
User Rank
Manager
Re: HMC's DRAM Controller
GSMD   2/4/2014 7:54:27 PM
This was the very reason I shifted to academia after a long stint in industry. It's tough to refuse an offer when you get to design a whole family of CPUs from scratch, with no concession to backward compatibility, and a companion microkernel to go along with it!

Let me know if you want to try out our CPU cores; the low-end cores should be available in three months or so. We support the Xilinx tool flow for FPGAs.

In the same vein, we junked PCIe quite simply because it does not naturally support peer-to-peer (non-transparent bridging is a pain) and has no support for DSM. We use DSM to build a MESIF-based cache-coherent chip-to-chip interconnect. I used to run the Asian operations of a company that sold PCIe, SRIO, and HT cores, so I know the game well. I agree that these cores are ridiculously priced. Hopefully our open-source RTL will change the industry a bit and make technical merit the determinant of a standard's success rather than the marketing muscle of its backer.

The other team that is lucky enough to do everything from scratch is the BAE/UPenn/Harvard team doing the CRASH-SAFE program. See www.crash-safe.org. They are doing a secure CPU, an OS, two new languages, and an app framework from scratch.

resistion
User Rank
Manager
Internal disagreement?
resistion   2/5/2014 11:18:29 AM
"Micron's process technology experts have expressed "wild disagreement" about when a DRAM replacement will be needed. "The earliest points to 2015, and the latest points to far enough out you could call it never."

Seems inside Micron there are those who want DRAM forever, those who want MRAM, those who want PCM, those who want RRAM, those who want Flash...

Good for R&D to thrive, but bad for immediate product development.
