CUPERTINO, Calif. – Intel provided the first look inside its Xeon Phi processor, aka Knights Corner, in a Hot Chips paper here. The chip packs more than 50 quad-threaded Pentium-class cores with 512-bit vector units and about 25 Mbytes of cache around a 512-bit, three-ring interconnect.
Xeon Phi is essentially an x86 symmetrical multiprocessing system on a chip. It runs popular programming environments used in large server clusters and supercomputers such as OpenMP, MPI, OpenCL, Pthreads and Intel’s existing tools.
The PC giant hopes the chip will displace the general-purpose graphics chips increasingly used as co-processors in high-performance computing (HPC). Nvidia’s GPUs, which use hundreds of smaller cores and a proprietary environment called Cuda, have been the most successful at winning such sockets to date.
Intel used Xeon Phi in an internal system called Discovery that delivers about 1,400 MFlops/watt, dissipating 72.5 kW and hitting number 150 on the latest version of the Top 500 supercomputers list. By contrast, one Nvidia-based system at number 177 on the list consumes 81.5 kW, Intel noted.
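As a rough sanity check on those figures, efficiency multiplied by power gives the sustained performance they imply. The sketch below uses only the rounded numbers quoted above; the function name is hypothetical and the result is an estimate, not a figure Intel reported.

```python
def implied_performance_tflops(mflops_per_watt: float, power_kw: float) -> float:
    """Return the sustained performance implied by an efficiency and a power draw."""
    watts = power_kw * 1_000             # kW -> W
    mflops = mflops_per_watt * watts     # (MFlops/W) * W = MFlops
    return mflops / 1_000_000            # MFlops -> TFlops


# Rounded figures quoted in the article for Intel's Discovery system.
discovery_tflops = implied_performance_tflops(1_400, 72.5)
print(f"Implied sustained performance: {discovery_tflops:.1f} TFlops")
```

Under these assumptions the quoted efficiency and power correspond to roughly 100 TFlops of sustained performance.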
“My big conclusion is performance efficiency does not have to come at the expense of programmability,” said George Chrysos, a lead designer of Xeon Phi, in an interview with EE Times before Hot Chips. “It’s a myth that you need specialized programming models to get to these performance levels--you can have your cake and eat it too,” he said.
The Hot Chips paper revealed aspects of the Xeon Phi architecture. Intel won’t disclose product details or a road map until the first chip is announced later this year.
The company is expected to roll out a family of products, eventually scaling to well beyond 50 cores. Cray said it will use Xeon Phi in its next supercomputer, called Cascade.
Each of the chip’s cores has one 512-bit vector unit, two scalar units and a private 512 Kbyte L2 cache. Intel hopes the large caches help propel the chip’s use in future exascale supercomputers. The wide vector units help crunch scientific workloads based on a variety of algorithms including FFTs and Monte Carlo simulations.
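The per-core numbers can be related to the chip-level ones with a little arithmetic. The sketch below assumes exactly 50 cores (the article says only "more than 50"), standard 32-bit and 64-bit floating-point elements, and that the chip's roughly 25 Mbytes of cache is the sum of the private L2 caches; none of these assumptions is confirmed by Intel.

```python
VECTOR_BITS = 512          # vector unit width per core, from the article
L2_KBYTES_PER_CORE = 512   # private L2 cache per core, from the article
ASSUMED_CORES = 50         # assumption: the article says "more than 50"


def lanes(vector_bits: int, element_bits: int) -> int:
    """Number of elements of a given width that fit in one vector register."""
    return vector_bits // element_bits


single_precision_lanes = lanes(VECTOR_BITS, 32)   # 32-bit floats per vector
double_precision_lanes = lanes(VECTOR_BITS, 64)   # 64-bit floats per vector

# Aggregate L2 across all cores, converted from Kbytes to Mbytes.
total_l2_mbytes = ASSUMED_CORES * L2_KBYTES_PER_CORE / 1024

print(single_precision_lanes, double_precision_lanes, total_l2_mbytes)
```

With 50 cores the private L2 caches add up to 25 Mbytes, which is consistent with the "about 25 Mbytes" quoted for the chip as a whole.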
Xeon Phi began its life as Larrabee, a graphics chip made out of x86 cores. Seeing a narrowing opportunity to compete with the likes of AMD and Nvidia in mainstream graphics, Intel shifted its strategy to target massively parallel HPC systems, where it hopes the chip will be easier to program than competing GPUs.
Xeon Phi (below) adds a wide vector unit and large cache to a quad-threaded Pentium core.
On the other hand, if this board came out as a PCIe-based card that ran OpenCL at an aggressive price point, I might buy a few to do BOINC distributed computing work. The video cards I currently use dissipate a lot of power but deliver good output, unlike the microprocessors that host them.
But their performance would have to be better than AMD's and Nvidia's current performance at a competitive cost/performance ratio.
Remember that Larrabee is still a graphics-based design. An OpenCL comparison would be truly useful, independent of the host CPU.
That will likely never happen.
Some of the numbers in the article are wrong based on the source listed:
Intel, United States: Discovery - Intel Cluster, Xeon E5-2670 8C 2.600GHz, Infiniband FDR, Intel MIC / 2012 Intel
Cores   Rmax (TFlops)   Rpeak (TFlops)   Power (kW)
9800    118.60          180.99           100.8
Barcelona Supercomputing Center, Spain: Bullx B505, Xeon E5649 6C 2.53GHz, Infiniband QDR, NVIDIA 2090 / 2011 Bull
Cores   Rmax (TFlops)   Rpeak (TFlops)   Power (kW)
5544    103.20          182.88           81.5
The power figure for the Barcelona system is dead on... but the list shows Intel's system at 100.8 kW, not the 72.5 kW they quote, unless they've made some changes since the test was performed in June. In which case the Barcelona system's power may be even lower too.
If you scale up the Barcelona system to 9800 cores, its Rmax is in the 182 range at an unknown power consumption.
I also don't know the relative performance difference of an E5-2670 versus a Xeon E5649.
I think there are too many differences to make a fair comparison.
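The scaling estimate and the efficiency gap in the comment above can be reproduced with a few lines of arithmetic. This is only a sketch: it assumes Rmax is in TFlops and power in kW, assumes performance scales linearly with core count (real clusters rarely do), and uses hypothetical function names.

```python
def scale_rmax(rmax_tflops: float, cores: int, target_cores: int) -> float:
    """Linearly scale Rmax to a different core count (an optimistic assumption)."""
    return rmax_tflops * target_cores / cores


def mflops_per_watt(rmax_tflops: float, power_kw: float) -> float:
    """Efficiency in MFlops/W from Rmax in TFlops and power in kW."""
    return (rmax_tflops * 1_000_000) / (power_kw * 1_000)


# Figures as listed in the comment above.
barcelona_scaled = scale_rmax(103.20, 5544, 9800)
discovery_listed = mflops_per_watt(118.60, 100.8)   # power as listed
discovery_quoted = mflops_per_watt(118.60, 72.5)    # power as quoted by Intel
barcelona_eff = mflops_per_watt(103.20, 81.5)

print(f"Barcelona scaled to 9800 cores: {barcelona_scaled:.1f} TFlops")
print(f"Discovery at 100.8 kW: {discovery_listed:.0f} MFlops/W")
print(f"Discovery at 72.5 kW:  {discovery_quoted:.0f} MFlops/W")
print(f"Barcelona at 81.5 kW:  {barcelona_eff:.0f} MFlops/W")
```

Which power figure is used for Discovery decides whether it comes out ahead of or behind the Barcelona system on efficiency, which is why the discrepancy matters.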
Any day I welcome a CPU that can show performance gains that beat the CPU+GPU combo... BUT I fully agree with what @jeffreyrdiamond is saying!! Don't pass the burden of learning new languages like CUDA / OpenCL to app developers; we have our own headaches to worry about. I too prefer a higher-level language, ideally the same one my app is developed with!
Xeon Phi is cool, but I'm unsure of Intel's claims here. 1.4 GF/watt isn't impressive these days, and if programming is done with SSE and Pthreads/OpenMP, it hardly seems any easier than CUDA or OpenCL. Does Intel have a higher-level programming language in mind that still gets good performance?