So Sylvie, a newbie, are you? I've been watching rapid CPU advances as an IC designer for almost 40 years. While I certainly don't expect a MIC-in-workstation in 2012, you can bet it'll get there at some point. 22nm is arriving in just a few months. While a supercomputer will use over a thousand MICs to generate 1 petaflop, you only need one in a Xeon-based workstation today to produce 1 teraflop. I think there are plenty of labs that could use this. While a single-MIC workstation isn't likely a big priority for Intel now, you can bet it's on the list for the next year or two. Adding a single co-processor wouldn't be hard, and it's already supported by software.
I'd be surprised if Intel didn't offer MIC in 2012: it'll probably be a card of similar size, price and power dissipation to an NVIDIA Tesla. Whether Intel offers a line similar to GeForce (that is, "desktop-priced" at $300 rather than $3000) remains to be seen.
I'm not sure why you'd swap a server for it though: it's not designed for server workloads. It's designed for compute-intensive stuff.
Let's see: a bunch of 386 cores with no DMA or onboard I/O except for PCIe. No separate buses to connect the cores (just shared memory). No cost amortization from the graphics business. Yeah, sounds like a real winner.
I couldn't care less about GPUs; I'm a HW guy. If the FPGA vendors would drop their prices, I could design cost-competitive accelerators that would run circles around these multicore heaters.
Well, I am a systems as well as a HW guy. Both MIC and GPU first bring raw data into memory, and it does not matter whether a core or a GPU processes it: it must be fetched from memory either way. The video refers to this data movement as a problem for GPUs only. This is typical sales hype. A better approach is to bring the raw data into LOCAL memory and do the processing in the GPU or some other PU, preferably one programmed in OpenCL. Then the only data movement is processed data into main memory. Yes, a work unit must be passed to a GPU along with the raw data, but since the same processing is applied to different data over and over, the code should reside in local memory, eliminating that memory transfer.
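To put rough numbers on that argument, here is a toy accounting sketch, not a measurement of any real MIC or GPU. The `Workload` fields, both function names and the byte counts in the usage note are hypothetical, and the model assumes that staging through main memory costs one write plus one read of the raw data and that the kernel code is re-sent with every work unit.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical workload description used only for this sketch."""
    units: int         # number of work units processed
    raw_bytes: int     # raw input bytes per work unit
    result_bytes: int  # processed output bytes per work unit
    code_bytes: int    # size of the kernel code


def main_memory_traffic_staged(w: Workload) -> int:
    """Bytes crossing main memory when raw data is staged there first.

    Assumption: raw data is written into main memory and then read back
    out by the processing unit, the code accompanies every work unit,
    and the results are written to main memory.
    """
    per_unit = 2 * w.raw_bytes + w.code_bytes + w.result_bytes
    return w.units * per_unit


def main_memory_traffic_local(w: Workload) -> int:
    """Bytes crossing main memory when raw data lands in local memory.

    Assumption: the code is loaded into local memory once, raw data never
    touches main memory, and only the results are written back.
    """
    return w.code_bytes + w.units * w.result_bytes


if __name__ == "__main__":
    w = Workload(units=1000, raw_bytes=1_000_000,
                 result_bytes=10_000, code_bytes=50_000)
    print("staged:", main_memory_traffic_staged(w))
    print("local: ", main_memory_traffic_local(w))
```

With those made-up sizes the staged scheme moves 2,060,000,000 bytes through main memory and the local-memory scheme moves 10,050,000. The ratio depends entirely on the assumed sizes, so treat it as an illustration of the reasoning rather than a benchmark.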