CUPERTINO, Calif. – Intel provided the first look inside its Xeon Phi processor, aka Knights Corner, in a Hot Chips paper here. The chip packs more than 50 quad-threaded Pentium-class cores with 512-bit vector units and about 25 Mbytes of cache around a 512-bit, three-ring interconnect.
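A back-of-the-envelope sketch of what those figures imply. It assumes exactly 50 cores (the paper says "more than 50," so treat the results as lower bounds) and the standard 32-bit and 64-bit IEEE 754 single- and double-precision formats:

```python
# Rough capacity estimate for a Knights Corner-class chip.
# Assumption: exactly 50 cores; the shipping part may have more.
CORES = 50
THREADS_PER_CORE = 4       # quad-threaded cores
VECTOR_BITS = 512          # width of each core's vector unit

hardware_threads = CORES * THREADS_PER_CORE

# How many values one 512-bit vector holds, per element type.
lanes_fp32 = VECTOR_BITS // 32   # single precision
lanes_fp64 = VECTOR_BITS // 64   # double precision

print(f"hardware threads: {hardware_threads}")
print(f"fp32 lanes per vector: {lanes_fp32}, fp64 lanes: {lanes_fp64}")
```

On those assumptions the chip exposes at least 200 hardware threads, and each vector instruction can operate on 16 single-precision or 8 double-precision values at once.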
Xeon Phi is essentially an x86 symmetric multiprocessing system on a chip. It runs popular programming environments used in large server clusters and supercomputers, such as OpenMP, MPI, OpenCL, Pthreads and Intel’s existing tools.
The PC giant hopes the chip will displace the general-purpose graphics chips increasingly used as co-processors in high-performance computing (HPC). Nvidia’s GPUs, which use hundreds of smaller cores and a proprietary programming environment called Cuda, have been the most successful at winning such sockets to date.
Intel used Xeon Phi in an internal system called Discovery that delivers about 1,400 MFlops/watt while dissipating 72.5 kW, placing it at number 150 on the latest Top 500 list of supercomputers. By contrast, one Nvidia-based system at number 177 on the list consumes 81.5 kW, Intel noted.
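Multiplying Discovery's two quoted figures gives a rough sense of its sustained throughput. This is a sketch that assumes the efficiency figure and the 72.5 kW power draw describe the same Top 500 benchmark run:

```python
# Implied throughput of Intel's Discovery system from the quoted figures.
# Assumption: the MFlops/watt figure and the 72.5 kW draw come from the same run.
MFLOPS_PER_WATT = 1_400
POWER_WATTS = 72.5e3       # 72.5 kW

mflops = MFLOPS_PER_WATT * POWER_WATTS
tflops = mflops / 1e6      # 1 TFlops = 1e6 MFlops

print(f"~{tflops:.1f} TFlops")
```

Because the efficiency number is itself approximate ("about 1,400"), the result is best read as roughly 100 TFlops rather than a precise rating.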
“My big conclusion is performance efficiency does not have to come at the expense of programmability,” said George Chrysos, a lead designer of Xeon Phi, in an interview with EE Times before Hot Chips. “It’s a myth that you need specialized programming models to get to these performance levels. You can have your cake and eat it too,” he said.
The Hot Chips paper revealed aspects of the Xeon Phi architecture. Intel won’t disclose product details or a road map until the first chip is announced later this year.
The company is expected to roll out a family of products, eventually scaling to well beyond 50 cores. Cray said it will use Xeon Phi in its next supercomputer, called Cascade.
Each core has one 512-bit vector unit, two scalar units and a private 512-Kbyte L2 cache. Intel hopes the large caches will help propel the chip into future exascale supercomputers. The wide vector units help crunch scientific workloads based on a variety of algorithms, including FFTs and Monte Carlo simulations.
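The per-core L2 size squares with the roughly 25 Mbytes of total cache cited at the top of the story. A quick check, assuming 50 cores and counting only the L2 caches (a simplification, since the article does not give L1 sizes):

```python
# Aggregate L2 capacity across the chip.
# Assumption: 50 cores, each with a private 512-Kbyte L2 cache.
CORES = 50
L2_PER_CORE_KB = 512

total_l2_kb = CORES * L2_PER_CORE_KB
total_l2_mb = total_l2_kb / 1024   # 1 Mbyte = 1,024 Kbytes

print(f"{total_l2_kb} KB = {total_l2_mb} MB")
```

Fifty 512-Kbyte slices add up to 25,600 Kbytes, or 25 Mbytes, so nearly all of the chip's headline cache capacity sits in the distributed L2.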
Xeon Phi began life as Larrabee, a graphics chip built from x86 cores. Seeing a narrowing opportunity to compete with the likes of AMD and Nvidia in mainstream graphics, Intel shifted its strategy to target massively parallel HPC systems, where it hopes Xeon Phi will prove easier to use than competing GPUs.
Xeon Phi (below) adds a wide vector unit and large cache to a quad-threaded Pentium core.