MOUNTAIN VIEW, Calif.--While graphics processors, or GPUs, are well known in the world of gaming and advanced graphics, their use to accelerate computation inside supercomputers is a much more recent development.
At the recent Supercomputing 2011 show in Seattle, keynote speaker Jen-Hsun Huang, CEO of Nvidia Corp., said GPU technology was an essential ingredient on the path to reaching exascale computing within a 20MW power envelope, but Intel has strongly disagreed.
Intel Corp. is pushing forward its own version of parallel architecture, in the form of Many Integrated Cores (MIC), which it says will be easier for programmers to use and for the industry to scale.
“We’ve built a very general-purpose device, one that’s designed for parallelism,” said James Reinders, an HPC software specialist at Intel.
Reinders acknowledged that while people seemed excited at the notion of accelerators, especially GPUs, being added on to supercomputers, that wasn’t necessarily the best approach.
Instead, Reinders posited, it may be better to use already widely deployed x86 cores that have been designed for data parallelism, which are much more programmable and “even more exciting on the performance side.”
Unlike Nvidia, Reinders said, Intel was not dedicating part of its design or performance to graphics. “When you look at a pure data parallelism workload, we have a huge advantage,” he claimed.
“It’s an x86 device; you can do anything on it, programming-wise, that you can do on any of our processors,” he added.
While Reinders conceded that 50 cores would not be “a snap to use” for every programmer, and would require something of a learning curve, he did say people were usually “amazed” at how easy it was to pick up.
Thus the question becomes not how to make a program scale to 50 cores, but whether it needs 50 cores to run in the first place.
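For a sense of what the programming model Reinders describes looks like in practice, here is a minimal sketch in plain C with OpenMP: the same source spreads a data-parallel loop across however many x86 cores are present, whether that is four or 50. The SAXPY kernel, array size, and compiler flags are illustrative assumptions for this article, not Intel’s MIC tools or benchmarks.

```c
/* Minimal data-parallel sketch: standard C plus OpenMP.
   The same code runs on a 4-core workstation or a 50-core part;
   the runtime decides how to split the loop across cores. */
#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

/* y = a*x + y over n elements; each core takes a chunk of the loop. */
static void saxpy(float a, const float *x, float *y, size_t n)
{
    #pragma omp parallel for
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}

int main(void)
{
    size_t n = 1 << 24;
    float *x = malloc(n * sizeof *x);
    float *y = malloc(n * sizeof *y);
    for (size_t i = 0; i < n; i++) { x[i] = 1.0f; y[i] = 2.0f; }

    saxpy(3.0f, x, y, n);
    printf("max OpenMP threads: %d, y[0] = %f\n", omp_get_max_threads(), y[0]);

    free(x);
    free(y);
    return 0;
}
```

Compiled with, say, gcc -O2 -fopenmp saxpy.c, the only parallel construct is a single pragma; there is no separate kernel language or device-side memory management to learn.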
Take a look at the video below and let us know your thoughts on Intel’s challenge to GPU acceleration.
So Sylvie, a newbie are you? I've been watching rapid CPU advances as an IC designer for almost 40 years. While I certainly don't expect a MIC-in-workstation in 2012, you can bet it'll get there at some point. 22nm is arriving in just a few months. While a supercomputer will use over a thousand MICs to generate 1 petaflop, you only need one in a Xeon-based workstation today to produce 1 teraflop. I think there are plenty of labs that could use this. While a single-MIC workstation isn't likely a big priority for Intel now, you can bet it's on the list in the next year or two. Adding a single co-processor wouldn't be hard, and it's already supported by software.
Well, I am a systems as well as a HW guy. Both MIC and GPU first bring raw data into memory, and whether a core or a GPU processes it, it must be fetched from memory. The video refers to this data movement as a problem for the GPU only. This is typical sales hype. A better approach is to bring the raw data into LOCAL memory and do the processing in the GPU or some other PU, preferably one programmed in OpenCL. Then the only data movement is processed data into main memory. Yes, a work unit must be passed to a GPU along with the raw data, but since the same processing is applied to different data over and over, the code should reside in local memory, eliminating that memory transfer.
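As an illustration of the pattern this comment describes (and not anything taken from the video), here is a rough OpenCL host-side sketch in C: the kernel is built once and stays resident, the raw data is copied to device memory a single time, the same kernel is re-run over it, and only the processed result is read back to main memory. The "scale" kernel, buffer size, and pass count are made up for the example, and error handling is omitted.

```c
/* Rough OpenCL host-side sketch of "keep data and code on the device,
   only move processed results back". Error checks omitted for brevity. */
#include <stdio.h>
#include <CL/cl.h>

static const char *src =
    "__kernel void scale(__global float *buf, float a) {"
    "    size_t i = get_global_id(0);"
    "    buf[i] = a * buf[i];"
    "}";

int main(void)
{
    enum { N = 1 << 20 };
    static float data[N];
    for (int i = 0; i < N; i++) data[i] = (float)i;

    cl_platform_id plat;  clGetPlatformIDs(1, &plat, NULL);
    cl_device_id dev;     clGetDeviceIDs(plat, CL_DEVICE_TYPE_GPU, 1, &dev, NULL);
    cl_context ctx = clCreateContext(NULL, 1, &dev, NULL, NULL, NULL);
    cl_command_queue q = clCreateCommandQueue(ctx, dev, 0, NULL);

    /* Build the kernel once; it stays resident on the device. */
    cl_program prog = clCreateProgramWithSource(ctx, 1, &src, NULL, NULL);
    clBuildProgram(prog, 1, &dev, NULL, NULL, NULL);
    cl_kernel k = clCreateKernel(prog, "scale", NULL);

    /* One transfer of the raw data into device memory. */
    cl_mem buf = clCreateBuffer(ctx, CL_MEM_READ_WRITE | CL_MEM_COPY_HOST_PTR,
                                sizeof data, data, NULL);

    /* Re-run the same kernel over the resident data; no per-pass upload. */
    size_t gsz = N;
    float a = 2.0f;
    clSetKernelArg(k, 0, sizeof buf, &buf);
    clSetKernelArg(k, 1, sizeof a, &a);
    for (int pass = 0; pass < 10; pass++)
        clEnqueueNDRangeKernel(q, k, 1, NULL, &gsz, NULL, 0, NULL, NULL);

    /* The only transfer back to main memory is the processed result. */
    clEnqueueReadBuffer(q, buf, CL_TRUE, 0, sizeof data, data, 0, NULL, NULL);
    printf("data[1] after 10 passes: %f\n", data[1]);
    return 0;
}
```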
Let's see - a bunch of 386 cores with no DMA or onboard I/O except for PCIe. No separate buses to connect the cores (just shared memory). No cost amortization from the graphics business. Yeah, sounds like a real winner.
I could care less about GPUs -- I'm a HW guy. If the FPGA vendors would drop their prices -- I could design cost competitive accelerators that would run circles around these multicore heaters.
I'd be surprised if Intel didn't offer MIC in 2012: it'll probably be a card of similar size, price, and power dissipation to Nvidia's Tesla. Whether Intel offers a line similar to GeForce (that is, "desktop-priced" at $300 rather than $3,000) remains to be seen.
I'm not sure why you'd swap a server for it, though: it's not designed for server workloads. It's designed for compute-intensive stuff.