News & Analysis
Nvidia: ARM supercomputer to be more efficient than x86
Sylvie Barak
12/6/2011 12:07 PM EST
SAN FRANCISCO--An ARM CPU is inherently more efficient than an x86 CPU and therefore best suited to the high-performance computing needs of the future, according to Nvidia Corp.
In a recent interview, Nvidia’s Sumit Gupta, director of Tesla marketing, said the only real advantage to x86 systems was that they could run operating systems like Microsoft Windows faster, but that when it came to needing maximum performance on minimum power, ARM was the future, and therefore a better option for supercomputing.
ARM architecture, explained Gupta, emerged out of the embedded space, where power limitations were prevalent and where less than a watt of power was considered a norm. All performance was therefore constrained from the conceptual phase of the chip’s design, forcing engineers to be especially creative about power efficiencies.
Intel and AMD’s x86 architecture, on the other hand, had been designed with PCs in mind, and came from a world in which machines were typically plugged in to wall sockets and faced no real power limitations.
“The number one consideration for x86 has always been to make operating systems like Windows run much faster and to be able to respond to unpredictable tasks, such as a mouse-click or a keyboard entry,” said Gupta, noting that the need for branch prediction and speculative execution was the reason x86 processors had such sizeable caches.
“It’s a terrific processor for everyday computing, not the right device as we go towards high performance computing,” he maintained.
Nvidia is already helping the Barcelona Supercomputing Center (BSC) to develop a hybrid supercomputer based on its Tegra ARM CPUs, accelerated by CUDA-supporting Tesla GPUs, with hopes of reaching exascale performance in a European project known as “Mont-Blanc”.
The hybrid will be the world's first ARM-based CPU/GPU supercomputing combination, and researchers at BSC have said they hope to achieve a short term goal of a two to five times improvement in energy efficiency compared with today's most efficient systems, with an ultimate goal of reaching exascale at 15 to 30 times less power.
Should the proof of concept work, Nvidia may well prove its point, but success seems a few years away. In the meantime, Nvidia said it will continue working on a development board for the HPC community, which the firm hopes will kick-start the software ecosystem around the ARM architecture for the supercomputers of the future.
chanj
12/6/2011 1:01 PM EST
“It’s a terrific processor for everyday computing, not the right device as we go towards high performance computing,”
His statement seems to redefine high performance computing as energy efficient computing.
Patrick Van Oosterwijck
12/6/2011 2:39 PM EST
High performance computing IS energy efficient computing. At the scale we're talking nowadays, the best way to allow supercomputers to be faster is by reducing their power consumption and heat dissipation. Those are the factors limiting you from throwing in more computing resources.
the lavender fan
12/6/2011 4:14 PM EST
I'm not so sure. It all boils down to the performance per Watt of energy consumed. Is there any data for a fair comparison between ARM and x86?
polylith
12/6/2011 1:23 PM EST
Yeah, out of order execution, branch prediction, etc are just for users clicking their mice. You don't need those for REAL high performance computing :)
bobbytsai
12/6/2011 4:09 PM EST
all cortex a9 arm processors have out of order execution and branch prediction already. cortex a15 will also be super scalar. cortex a8 is dual in-order instruction issue. most vendors also include SIMD units in their arm offerings. not much an intel processor has on these except 5-10x perf/watt
bobbytsai
12/6/2011 4:20 PM EST
correction : intel has 1/5-1/10 perf/watt
the lavender fan
12/6/2011 8:14 PM EST
Where do these numbers come from?
markhahn
12/6/2011 10:51 PM EST
gpus manage 1-2 Tf for about 300W, or ~3-6 Gf/W. dedicated HPC chips like in the K machine or BG/q are about the same (say 2-3 Gf/W). current x86 processors manage .5-1.5 Gf/W. (numbers are a bit fuzzy - chip vs system dissipation, etc)
the recent Calxeda ARM chips seem to be about 3 Gf/W, too. (assuming 1.5W/core, 1.2 GHz and 4 flops/cycle. might be half that, can't tell.)
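The arithmetic behind these estimates can be checked with a short sketch. It only reuses the figures quoted in the comment above (1-2 Tflop/s at about 300 W for GPUs; 1.2 GHz, 4 flops/cycle and 1.5 W per core for Calxeda). The function names are illustrative, and the inputs are the commenter's rough estimates, not measured data.

```python
def gflops_per_watt(gflops: float, watts: float) -> float:
    """Energy efficiency: Gflop/s delivered per watt consumed."""
    return gflops / watts


def peak_gflops(clock_ghz: float, flops_per_cycle: float) -> float:
    """Peak Gflop/s of one core: clock rate in GHz times flops per cycle."""
    return clock_ghz * flops_per_cycle


# GPU figures quoted in the comment: 1-2 Tflop/s at about 300 W.
gpu_low = gflops_per_watt(1000.0, 300.0)
gpu_high = gflops_per_watt(2000.0, 300.0)

# Calxeda estimate quoted in the comment: 1.2 GHz, 4 flops/cycle, 1.5 W/core.
calxeda = gflops_per_watt(peak_gflops(1.2, 4.0), 1.5)

# Pessimistic case from the comment ("might be half that").
calxeda_half = calxeda / 2

print(f"GPU: {gpu_low:.1f}-{gpu_high:.1f} Gf/W")
print(f"Calxeda: {calxeda:.1f} Gf/W (or {calxeda_half:.1f} if half)")
```

With those inputs the GPU range works out to roughly 3.3-6.7 Gf/W and the Calxeda estimate to about 3.2 Gf/W (1.6 in the pessimistic case), consistent with the rounded numbers in the comment. As the comment notes, chip-level versus system-level power makes all of these figures fuzzy.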
iniewski
12/6/2011 2:14 PM EST
Is there any supercomputer built using ARM processors? Kris
SylvieBarak
12/6/2011 3:04 PM EST
There is one being built at the moment, yes. By the Barcelona Supercomputing Center. But it's not built yet.
http://www.eetimes.com/electronics-news/4230570/Spain-Nvidia-plan-ARM-based-supercomputer
iniewski
12/6/2011 6:22 PM EST
thank you Sylvie...we would be interested in having a talk on the Barcelona design at the emerging technologies conference in Vancouver in 2012 (www.cmoset.com). would you be interested by any chance? Kris
SylvieBarak
12/6/2011 7:16 PM EST
Of course! Drop me an email to my first name, dot, last name at UBM dot com
Bert22306
12/6/2011 5:38 PM EST
Very interesting. Since supercomputers are all about mega-multicores, it would seem that there is a tradeoff between designing in more energy efficient cores, vs perhaps fewer cores that are better able to manage unpredictable tasks.
markhahn
12/6/2011 10:00 PM EST
where did you get that idea? supercomputers are traditionally about _balance_, which tends to run against extreme core counts.
in fact, the push for many, lower-powered cores is precisely motivated by power considerations, and works _against_ unpredictable workloads.
Bert22306
12/7/2011 3:27 PM EST
Well, let's see. The supercomputer used by NASA to discover Earth-like planets has 50,000 cores. I'm assuming it helps if unpredictable tasks can be managed more easily, in this sort of architecture. That core count sounds pretty extreme to me, although I suppose "extreme" is a relative term.
If extreme core counts are not involved then you'd still expect there to be a tradeoff between fewer, higher performing cores, as opposed to more, lower performing, but also lower power consuming cores.
But in general, I'm seeing a lot of arm waving going on here, me included. No one is offering specifics about the differences between the ARM and x86 architectures. So I'm speculating based only on popular press reports and common sense.
markhahn
12/7/2011 4:35 PM EST
sorry, I thought you meant core-counts-per-chip - that is, that HPC was pushing to more cores per node. sure, large clusters have lots of cores, since they have lots of nodes. it's not like this is optional: ambitious computing has necessitated it for decades.
mike655mm
12/6/2011 6:07 PM EST
I'm siding with Intel on this one. They've been successful for over 40 years and they keep evolving and adapting. 22nm process technology is going to be a big winner with a lot less power and a much smaller chip size (cheaper). ARM will lose most of the power advantage they used to have and as technology continues to march towards 16nm, 10nm, etc, it'll no longer be a factor. It'll be about features, ease-of-use and performance
iniewski
12/6/2011 6:16 PM EST
I am not sure I agree Mike...yes, process technology has always helped Intel, and so will the 22nm process...but architecturally ARM is superior...it is an open question who will prevail 2-3 years from now. right now Intel is increasing their market share and revenue growth is really impressive! Kris
Steven_Wu
12/6/2011 7:07 PM EST
I don't know if it is safe to say architecturally ARM is superior. Is there any fundamental architectural difference between ARM and other RISCs?
The ISA doesn't matter. Basically, the business model is what counts.
y_sasaki
12/6/2011 7:48 PM EST
Because of their PC-based business model, x86 processors are bound to binary compatibility. Intel has to design processors that can run binary code written 2 or 3 generations earlier - not just "able" to run it, but fast and efficiently, because PC users will evaluate a new processor's performance with an older generation of benchmark code.
I believe Intel can produce a highly optimized high-performance processor, perhaps even better than the ARM guys, but pressure from its mainstream PC market will not easily allow it to do so.
markhahn
12/6/2011 10:12 PM EST
in reality, the ISA has shifted with each generation. yes, adding to an ISA is messier than starting from scratch each time, but ARM is not pure and fresh, either. GPUs are probably the winner by this metric, since with, eg, cuda, apps are insulated by intermediate PTX code.
markhahn
12/6/2011 10:05 PM EST
you're right: ARM is a fairly conventional ISA, though it's cleaner than x86. there must be some power savings in decode, but the processors have to eventually _do_ almost the same thing. (this argument doesn't hold as well comparing to GPUs, since their programming model restructures the code significantly.)
iniewski
12/6/2011 7:25 PM EST
Intel has better technology, Steven, always one or two generations ahead. Intel has better marketing and a much larger budget than any other processor maker. So why would ARM exist at all if it didn't have a better architecture? Kris
rbarraud
12/6/2011 8:31 PM EST
So to extend your logic, any non-x86 architecture that exists, is superior to x86? ;-)
iniewski
12/6/2011 8:44 PM EST
Obviously Intel has to maintain x86 backwards compatibility which limits its ability to innovate going forward...so every non-x86 architecture has a chance to be better but that is not guaranteed...in case of ARM I believe that the market has spoken clearly, just check where ARM was 5 or 10 years ago...Kris
the lavender fan
12/6/2011 8:13 PM EST
Based on what criteria are you claiming that ARM architecture is superior? Intel and ARM make different tradeoffs when designing their processor cores. Usually Intel is more aggressive with performance, while ARM is more aggressive on power efficiency.
panzerboy
12/12/2011 1:43 AM EST
I recently looked at some x86-64 code and was shocked at the number of push and pop instructions. Intel still only have 4 general purpose registers. All 16 registers on ARM are general purpose though you'd be silly to use r13-r15 (stack, link, program counter). That means more stuff in registers, less pushing and popping. Just one example of how ARM is a more efficient design.
digital_dreamer
12/7/2011 6:06 AM EST
I know the Cortex M3 and M4 MCUs are fabbed at 90nm and they still have excellent power savings. I can only imagine what power efficiency they would have at 22nm, even with leakage becoming a more dominant factor.
What are the ARM A9 and coming A15 being fabbed at, anyone?
MAJ
bobbytsai
12/7/2011 3:28 PM EST
http://www.eetimes.com/electronics-news/4231043/Samsung-samples-dual-core-A15-processor
most A9 are in 40/45nm.
Patrick Van Oosterwijck
12/7/2011 9:43 AM EST
Intel will always have the disadvantage of having to translate its vintage x86 CISC instructions into pipeline-able micro-ops. This is something ARM does not have to do, since its RISC instructions are pipeline ready.
Think about it, every x86 processor in the world sits there, continuously translating the same instructions, over and over, every second they are running. How inefficient! Someone might not care if they're running one processor in their PC, but someone designing a supercomputer that has thousands of processors in it will surely notice the difference in their energy bill, cooling requirements, etc.
iniewski
12/7/2011 9:48 AM EST
I agree Patrick, this is the main message regardless of all these details about which processor was fabricated at which process node and similar noise...Intel could design a processor for supercomputing that is not x86 compatible, so why are they not doing that? market too small? Kris
ogdenj
12/7/2011 10:44 AM EST
iniewski: They did it! "Intel Paragon" supercomputer was built using i860 which is not x86.
iniewski
12/7/2011 10:59 AM EST
thank you @Ogdenj, that was a while back, what happened to it? Kris
woohoo
12/7/2011 2:16 PM EST
let me tell you whats news.. "ARM supercomputer to be more faster than x86"
luting
12/8/2011 3:51 PM EST
To be a serious player, I believe ARM needs to deliver its 64-bit core first. Then we will see who the winner will be. But there is no doubt that ARM has a better chance of moving up to grab market share from Intel than Intel has of moving down to grab share from ARM, because Intel is fighting this war by itself while ARM has an entire ARMY around it. This army includes almost every semiconductor company except Intel, plus an even larger set of software and tools partners. If Intel wins, the only company that benefits is Intel. If ARM wins, there is a long list of companies you could name, starting with Apple, Google, Qualcomm, Samsung, etc. Even traditional PC/server companies such as HP & Dell could benefit from having an alternative choice for their products. I cannot imagine how Intel could win this war.
I think Intel should seriously consider building ARM products as well. If ARM fails, good news. If ARM succeeds, Intel could get its share as well.
MikeSmith2011
12/9/2011 5:53 PM EST
They already did. A company called AppliedMicro announced a working 64b ARM CPU designed for cloud computing called X-Gene. Not sure if they are targeting supercomputing but I don't see why not.
Sanjib.Acharya
12/8/2011 10:03 PM EST
Thanks to all of you for providing many important inputs on this topic. I see the majority is voting for ARM, and I see very valid justifications behind that opinion.
Other than performance, the next thing that comes to my mind is reliability. What is your opinion about ARM vs. Intel?
MikeSmith2011
12/9/2011 5:56 PM EST
Reliability is a function of the implementation. So it would depend on the companies designing ARM cores and the RAS features they decide to put in. Intel Itanium, e.g., has a lot more reliability features than the x86 Xeons, which in turn have more reliability features than the x86 Core i5s used in desktop parts.
t.alex
12/25/2011 6:49 PM EST
He should be a marketing guy :)