News & Analysis
Intel unveils 1 TFLOP/s Knight's Corner
Sylvie Barak
11/16/2011 1:51 AM EST
SEATTLE—At the SUPERCOMPUTING 2011 show here Tuesday (Nov. 15), Intel Corp. showed off its 22-nm "Knight's Corner" co-processor compute accelerator, boasting over 50 cores and delivering a purported one teraflops of double precision floating point performance, outstripping Nvidia’s Tesla 2090 accelerator.
Showing off the very first silicon available, Intel’s Rajeeb Hazra, general manager of technical computing at the Intel datacenter and connected systems group, said the single 1 TFLOP/s chip was the equivalent of the entire ASCI Red system built back in 1997, consisting of 9298 Pentium II Xeon processors.
That system filled 72 cabinets.

“That was a very proud day for us, but today we can get a teraflop of sustained double precision performance in one 3-D tri-gate 22-nm chip running Linux,” said Hazra, adding, “It’s not on PowerPoint, it’s a real chip, we have it in our labs and it’s working.”
Intel later showed off a makeshift system running the chip to select press, though very few additional specs were given out.
“We don’t know of any other mainstream architecture chip with this kind of performance,” said Hazra, explaining that the main benefits lay in the extreme programmability of the chip.
“The programming model story is clear to us,” he said, noting that all the software tools used on Xeons today would be able to scale to Knight’s Corner with minimal effort, giving Intel an advantage over rivals like Nvidia, whose GPUs require code to be adapted and ported before it can be accelerated.
Intel’s MIC architecture also has the advantage of having been specifically designed to process highly parallel workloads, said Hazra.
“It’s a significant day. We are so excited about taking this architecture to market,” he said.
In addition, Hazra spoke briefly about Intel’s exascale efforts, saying the company had set itself a firm goal of reaching the target by 2018, within a 20 MW power envelope. Doing so, however, would require a large amount of investment and partnership, he said.
“It’s not just a question of money, it’s a question of getting the right brains and eyes looking at solving the issues,” he said.
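The figures quoted above invite some back-of-envelope arithmetic. The sketch below uses only numbers from the article (1 TFLOP/s per chip, 9298 processors in the 1997 system, a 20 MW exascale envelope). It rests on simplifying assumptions that Intel did not state: that every chip sustains exactly 1 TFLOP/s, and that the whole power budget goes to the compute chips, ignoring memory, interconnect and cooling. The function names are illustrative only.

```python
# Back-of-envelope arithmetic from the figures quoted in the article.
# Assumptions (not from Intel): each chip sustains exactly 1 TFLOP/s, and the
# full 20 MW envelope is spent on compute chips alone.

TERA = 10**12
EXA = 10**18

CHIP_FLOPS = 1 * TERA               # claimed double-precision rate of one chip
OLD_SYSTEM_PROCESSORS = 9298        # processor count quoted for the 1997 system
EXASCALE_POWER_WATTS = 20 * 10**6   # 20 MW target envelope


def per_processor_flops(total_flops, processors):
    """Average share of the total rate contributed by each processor."""
    return total_flops / processors


def chips_for_target(target_flops, flops_per_chip):
    """Number of chips needed to reach the target (ceiling division)."""
    return -(-target_flops // flops_per_chip)


def power_per_chip(total_watts, chips):
    """Power budget available to each chip if the envelope is split evenly."""
    return total_watts / chips


if __name__ == "__main__":
    share = per_processor_flops(CHIP_FLOPS, OLD_SYSTEM_PROCESSORS)
    chips = chips_for_target(EXA, CHIP_FLOPS)
    watts = power_per_chip(EXASCALE_POWER_WATTS, chips)
    print(f"Per-processor share in the 1997 system: {share / 1e6:.1f} MFLOP/s")
    print(f"Chips needed for 1 EFLOP/s: {chips:,}")   # 1,000,000
    print(f"Power budget per chip: {watts:.0f} W")    # 20 W
```

Under these assumptions an exascale machine would need a million such chips, each limited to about 20 W, which gives a sense of how much efficiency would still have to improve to meet the 2018 target.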


bobbytsai
11/16/2011 11:15 AM EST
So Intel finally made it to 1 TFLOPS. We've been able to buy this level of performance for 2 or 3 years in a single-slot PCIe card. What are the flops/watt, flops/$ installed (floor space, cooling ...), memory bandwidth, and max concurrent threads?
askubel
11/16/2011 2:20 PM EST
Not a fair comparison. Knight's Corner achieves 1 TFLOP running x86 instructions.
bobbytsai
11/16/2011 5:20 PM EST
Who cares anymore what instruction set is used in HPC? When programming in C / C++ / CUDA / OpenCL / Java / Perl / and a few hundred other programming languages ... all that is abstracted away. Code will have to be ported in either case (from Intel single-thread to Intel multi-thread, MIC or OpenCL/CUDA). If you are going to go through the effort of porting, ISA is not that important a factor. Install cost, operating cost, tools availability, feature support, and perf/$$ are more important. Don't buy the Intel hype. MIC is still just a research project at Intel. You can actually buy AMD and NVIDIA products with third-party tools support.
drewm1980
11/16/2011 6:39 PM EST
For HPC, very little has been "abstracted away" since C was invented. Only the first four languages you list (plus Fortran) are actually used for HPC. Intel is promising the ability to use one (mature!) language/compiler/toolchain for both CPU and GPU/MIC code. This wouldn't be practical if the hardware wasn't x86(ish) ISA.
wilber_xbox
11/16/2011 1:47 PM EST
This is really a success for the 3D tri-gate technology. It opens up real possibilities for nodes below 22 nm.
p_g
11/16/2011 7:03 PM EST
Agreed. This is a good milestone, establishing yieldable silicon at such a large size as well as bringing the 3D transistor to production-worthy silicon.
goafrit
11/16/2011 7:30 PM EST
It simply means Intel owns the next decade. Too bad for AMD.
wilber_xbox
11/17/2011 7:15 AM EST
Intel might face some competition, but in my opinion they have simply gone too far ahead to be caught in terms of technology. Now the question is how they will capture the imagination of consumers to gain control of the tablet/ultrabook segments.
resistion
11/17/2011 8:21 AM EST
I think such high performance is not really targeted at consumers, but at enterprise servers. And since power was not mentioned, I expect it to be a typical ~200 W.
BobsUrUncle
11/16/2011 7:24 PM EST
These are old Pentium III cores. No instruction level parallelism, no out of order execution, etc. Only I/O interface is PCIe. Only advantage this has is the tool chain and that you can prototype your code on normal multicore x86 workstations and move to MIC later. Plus, the cores can work independently. GPU cores can't really work on separate processes. They are too interconnected.
resistion
11/16/2011 7:44 PM EST
"within a 20MW power envelope" per chip? No thanks.
jaybus
11/17/2011 8:30 AM EST
I think you misunderstood the statement. He meant racks of servers using many chips, not a single chip. Their goal is a machine with exaflop (i.e., 1 million teraflops) performance.
resistion
11/17/2011 8:38 AM EST
Of course, it has to be lots of hot chips requiring lots of cooling. I mentioned a reference of ~200 W/chip just a while ago.
KB3001
11/17/2011 8:46 AM EST
And what's the performance per $ and performance per watt? Intel can't keep banking on x86 forever!
snowboard9
11/18/2011 9:34 AM EST
They can, they have, and they should.
palf
12/14/2011 3:10 PM EST
My engineering career began in 1970 and I was using the 8086 in 1976. It's always somewhat of an odd feeling to still see the x86 label being referenced. I would never have imagined it back then. Kudos to Intel for sustaining the product line.
timemerchant
11/17/2011 7:18 PM EST
Impressed with the wine rack in the background. Is this in a restaurant or a nicely fitted out exec's office? Seeing a chip in the air with a silkscreened logo but no decent specs is like deciding which bottle to pull from the rack without reading a label. I suppose "actual silicon with specs to follow" is better than "two years of specs with silicon to follow" from Xilinx, Altera etc. At least the wine will improve by the time it hits mainstream. Initial specs are impressive, even this sceptic must admit. Well done Intel, particularly on that tired x86 architecture. Heat dissipation? Must be less than what can be pulled out through a heatsink to prevent solder reflow, so to the 20MW per chip quip above, unlikely. Even 200 Watts, look at those city nightlines and switch off two light bulbs, then get back to work.
SylvieBarak
11/17/2011 8:09 PM EST
wow, good eye, TimeMerchant! Not his office, no, it was the wine cellar of a restaurant in Seattle where the briefing was being held.
He wouldn't give out any details about the wattage per chip, but it certainly wasn't 20MW! That is the theoretical aim for an exascale system running these chips....
He also wasn't specific on the number of cores, sticking to the "more than 50" party line... but my bet is 64 cores with some being deactivated in order to improve yields.
resistion
11/17/2011 9:05 PM EST
If they need a million such chips to achieve the ultimate goal anyway, it really takes away from the meaning of replacing ~10000 old processors with a single new one.
snowboard9
11/18/2011 9:37 AM EST
Here Tuesday's and Thursday's?
TFCSD
11/17/2011 11:33 PM EST
I can't wait to see how this will improve solitaire on my wintel box! Exaflop is now within reach, sweet. I worked on weather forecasting software and can use the JIT forecasting this will allow.
Charles.Desassure
11/20/2011 10:07 AM EST
Hats off to you Intel...another success story.
Lee Harrison
11/21/2011 9:00 PM EST
The ratio of brand advocacy to informed commentary seems extraordinarily high in many of the sentiments above.
Knight's Corner gets most of those flops from a very wide SIMD micro-architecture. I happen to really LIKE SIMD micro-arches, and have done quite a lot of programming for them, and from what I see of LRBni (that's what the instruction set was called when the device was "Larrabee") it appears to be a very well thought-out SIMD ISA, far better than SSE.
But the claim that ordinary "scalar" procedural programs written in C, Fortran etc are automatically going to be accelerated to Tflops ... simply isn't so.
If you can't exploit the SIMD width efficiently ... it's a 2-issue x86 core which isn't all that different from Atom. It's the SIMD extensions that make this design "powerful." Auto-vectorizing compilers haven't lived up to the hype so far (for any microarch ... GPGPUs included).
AMD advocacy is misplaced here, because so far as I know, AMD isn't trying to compete in specialized HPC processors and/or adjunct accelerators. The competition is IBM with its spectrum of Cell/Power7/BlueGeneQ processors, and to some extent the nVidia Kepler+ARM initiative.
It's going to be an interesting competition ... I wouldn't make any predictions of success. Folks should remember that both Cell and Power7 have successively not "conquered the HPC world," and for those thinking that Intel has avoided such experiences ... remember Itanium? Or, for that matter, that Knight's Corner is an updated Larrabee?