SAN FRANCISCO--An ARM CPU is inherently more efficient than an x86 CPU and therefore better suited to the high performance computing needs of the future, according to Nvidia Corp.
In a recent interview, Nvidia’s Sumit Gupta, director of Tesla marketing, said the only real advantage to x86 systems was that they could run operating systems like Microsoft Windows faster, but that when it came to needing maximum performance on minimum power, ARM was the future, and therefore a better option for supercomputing.
ARM architecture, explained Gupta, emerged out of the embedded space, where power limitations were prevalent and where less than a watt of power was considered a norm. All performance was therefore constrained from the conceptual phase of the chip’s design, forcing engineers to be especially creative about power efficiencies.
Intel and AMD’s x86 architecture, on the other hand, had been designed with PCs in mind, and came from a world in which machines were typically plugged into wall sockets and faced no real power limitations.
“The number one consideration for x86 has always been to make operating systems like Windows run much faster and to be able to respond to unpredictable tasks, such as a mouse-click or a keyboard entry,” said Gupta, noting that the need for branch prediction and speculative execution was the reason x86 processors had such sizeable caches.
“It’s a terrific processor for everyday computing, not the right device as we go towards high performance computing,” he maintained.
Nvidia is already helping the Barcelona Supercomputing Center (BSC) to develop a hybrid supercomputer based on its Tegra ARM CPUs, accelerated by CUDA-supporting Tesla GPUs, with hopes of reaching exascale performance in a European project known as “Mont-Blanc”.
The hybrid will be the world's first ARM-based CPU/GPU supercomputing combination, and researchers at BSC have said they hope to achieve a short term goal of a two to five times improvement in energy efficiency compared with today's most efficient systems, with an ultimate goal of reaching exascale at 15 to 30 times less power.
Should the proof of concept work, Nvidia may well prove its point, but success seems a few years away. In the meantime, Nvidia said it will continue working on a development board for the HPC community, which the firm hopes will kickstart the software ecosystem around the ARM architecture for the supercomputers of the future.
“It’s a terrific processor for everyday computing, not the right device as we go towards high performance computing,”
His statement seems to redefine high performance computing as energy efficient computing.
High performance computing IS energy efficient computing. At the scale we're talking nowadays, the best way to allow supercomputers to be faster is by reducing their power consumption and heat dissipation. Those are the factors limiting you from throwing in more computing resources.
all cortex a9 arm processors have out of order execution and branch prediction already. cortex a15 will also be superscalar. cortex a8 is dual in-order instruction issue. most vendors also include SIMD units in their arm offerings. not much an intel processor has on these except 5-10x perf/watt
gpus manage 1-2 Tf for about 300W, or ~3-6 Gf/W. dedicated HPC chips like in the K machine or BG/q are about the same (say 2-3 Gf/W). current x86 processors manage .5-1.5 Gf/W. (numbers are a bit fuzzy - chip vs system dissipation, etc)
the recent Calxeda ARM chips seem to be about 3 Gf/W, too. (assuming 1.5W/core, 1.2 GHz and 4 flops/cycle. might be half that, can't tell.)
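The arithmetic behind these figures can be checked with a rough sketch like the one below. It uses only the numbers quoted in this comment (1-2 Tf at about 300W for GPUs; 1.5W/core, 1.2 GHz and 4 flops/cycle for Calxeda). The function names are made up for illustration, and the results are peak figures under those assumptions, not measurements.

```python
def peak_gflops(clock_ghz: float, flops_per_cycle: float, cores: int = 1) -> float:
    """Theoretical peak in GFLOPS: clock (GHz) x flops per cycle x core count."""
    return clock_ghz * flops_per_cycle * cores


def gflops_per_watt(gflops: float, watts: float) -> float:
    """Energy efficiency in GFLOPS per watt."""
    return gflops / watts


# GPU range quoted above: 1-2 Tf (1000-2000 Gf) for about 300 W.
gpu_low = gflops_per_watt(1000.0, 300.0)
gpu_high = gflops_per_watt(2000.0, 300.0)

# Calxeda estimate, using the assumed per-core figures from the comment.
calxeda_core_gflops = peak_gflops(clock_ghz=1.2, flops_per_cycle=4)
calxeda = gflops_per_watt(calxeda_core_gflops, watts=1.5)
calxeda_pessimistic = calxeda / 2  # the "might be half that" case

print(f"GPU:     {gpu_low:.1f} to {gpu_high:.1f} Gf/W")
print(f"Calxeda: {calxeda:.1f} Gf/W (or {calxeda_pessimistic:.1f} if half)")
```

With these inputs the GPU range works out to roughly 3.3 to 6.7 Gf/W and the Calxeda estimate to about 3.2 Gf/W, which matches the "~3-6" and "about 3" figures above. As noted, chip versus system dissipation can shift all of these numbers.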
Very interesting. Since supercomputers are all about mega-multicores, it would seem that there is a tradeoff between designing in more energy efficient cores, vs perhaps fewer cores that are better able to manage unpredictable tasks.
where did you get that idea? supercomputers are traditionally about _balance_, which tends to run against extreme core counts.
in fact, the push for many, lower-powered cores is precisely motivated by power considerations, and works _against_ unpredictable workloads.
Well, let's see. The supercomputer used by NASA to discover Earth-like planets has 50,000 cores. I'm assuming it helps if unpredictable tasks can be managed more easily, in this sort of architecture. That core count sounds pretty extreme to me, although I suppose "extreme" is a relative term.
If extreme core counts are not involved then you'd still expect there to be a tradeoff between fewer, higher performing cores, as opposed to more, lower performing, but also lower power consuming cores.
But in general, I'm seeing a lot of arm waving going on here, me included. No one is offering specifics about the difference in the ARM vs x86 architecture. So I'm speculating only based on the popular press reports and common sense.
sorry, I thought you meant core-counts-per-chip - that is, that HPC was pushing to more cores per node. sure, large clusters have lots of cores, since they have lots of nodes. it's not like this is optional: ambitious computing has necessitated it for decades.
I'm siding with Intel on this one. They've been successful for over 40 years and they keep evolving and adapting. 22nm process technology is going to be a big winner with a lot less power and a much smaller chip size (cheaper). ARM will lose most of the power advantage they used to have and as technology continues to march towards 16nm, 10nm, etc, it'll no longer be a factor. It'll be about features, ease-of-use and performance
I am not sure I agree, Mike...yes, process technology has always helped Intel, and so will the 22nm process...but architecturally ARM is superior...it is an open question who will prevail 2-3 years from now; right now Intel is increasing its market share and its revenue growth is really impressive! Kris
Because of Intel's PC-based business model, x86 processors are bound to binary compatibility. Intel has to design processors that can run binary code written for 2 or 3 generations before - not just "able" to run it, but fast and efficiently, because PC users will evaluate new processor performance with older generations of benchmark code.
I believe Intel can produce a highly optimized high-performance processor, perhaps even better than the ARM guys, but pressure from its mainstream PC market will not easily allow it to do so.
in reality, the ISA has shifted with each generation. yes, adding to an ISA is messier than starting from scratch each time, but ARM is not pure and fresh, either. GPUs are probably the winner by this metric, since with, eg, cuda, apps are insulated by intermediate PTX code.
you're right: ARM is a fairly conventional ISA, though it's cleaner than x86. there must be some power savings in decode, but the processors have to eventually _do_ almost the same thing. (this argument doesn't hold as well comparing to GPUs, since their programming model restructures the code significantly.)
Intel has better technology, Steven, always one or two generations ahead. Intel has better marketing and a much larger budget than any other processor maker. So why would ARM exist at all if it didn't have a better architecture? Kris
Obviously Intel has to maintain x86 backwards compatibility which limits its ability to innovate going forward...so every non-x86 architecture has a chance to be better but that is not guaranteed...in case of ARM I believe that the market has spoken clearly, just check where ARM was 5 or 10 years ago...Kris
Based on what criteria are you claiming that ARM architecture is superior? Intel and ARM make different tradeoffs when designing their processor cores. Usually Intel is more aggressive with performance, while ARM is more aggressive on power efficiency.
I recently looked at some x86-64 code and was shocked at the number of push and pop instructions. Intel still only have 4 general purpose registers. All 16 registers on ARM are general purpose, though you'd be silly to use r13-r15 (stack, link, program counter). That means more stuff in registers and less pushing and popping. Just one example of how ARM is a more efficient design.
I know the Cortex M3 and M4 MCUs are fabbed at 90nm and they still have excellent power savings. I can only imagine what power efficiency they would have at 22nm, even with leakage becoming a more dominant factor.
What are the ARM A9 and coming A15 being fabbed at, anyone?
Intel will always have the disadvantage of having to translate its vintage x86 CISC instructions into pipeline-able micro-ops. This is something ARM does not have to do, since its RISC instructions are pipeline ready.
Think about it, every x86 processor in the world sits there, continuously translating the same instructions, over and over, every second they are running. How inefficient! Someone might not care if they're running one processor in their PC, but someone designing a supercomputer that has thousands of processors in it will surely notice the difference in their energy bill, cooling requirements, etc.
I agree, Patrick, this is the main message, regardless of all the details about which processor was fabricated at which process node and similar noise...Intel could design a processor for supercomputing that is not x86 compatible, so why are they not doing that? market too small? Kris
To be a serious player, I believe ARM needs to deliver its 64-bit core first. Then we will see who the winner will be. But there is no doubt that ARM has a better chance of moving up to grab market share from Intel than Intel has of moving down to grab share from ARM, because Intel is fighting this war by itself while ARM has an entire ARMY around it. This army includes almost every semiconductor company except Intel, plus an even larger set of software and tools partners. If Intel wins, the only company that benefits is Intel. If ARM wins, there is a long list of companies you could name, starting with Apple, Google, Qualcomm, Samsung, etc. Even traditional PC/server companies such as HP & Dell could benefit from having an alternative choice for their products. I cannot imagine how Intel could win this war.
I think Intel should seriously consider building ARM products as well. If ARM fails, good news. If ARM is successful, Intel could get its share as well.
Thanks to all of you for providing many important inputs on this topic. I see that the majority is voting for ARM, and I see very valid justifications behind that opinion.
Other than performance, the next thing that comes to my mind is reliability. What is your opinion about ARM vs. Intel?
Reliability is a function of the implementation. So it would depend on the companies designing ARM cores and the RAS features they decide to put in. Intel Itanium, e.g., has a lot more reliability features than the x86 Xeons, which in turn have more reliability features than the x86 Core i5s used in desktop parts.