SAN FRANCISCO--An ARM CPU is inherently more efficient than an x86 CPU and therefore better suited to the high performance computing needs of the future, according to Nvidia Corp.
In a recent interview, Nvidia’s Sumit Gupta, director of Tesla marketing, said the only real advantage to x86 systems was that they could run operating systems like Microsoft Windows faster, but that when it came to needing maximum performance on minimum power, ARM was the future, and therefore a better option for supercomputing.
ARM architecture, explained Gupta, emerged out of the embedded space, where power limitations were prevalent and where less than a watt of power was considered a norm. All performance was therefore constrained from the conceptual phase of the chip’s design, forcing engineers to be especially creative about power efficiencies.
Intel and AMD’s x86 architecture, on the other hand, had been designed with PCs in mind, and came from a world in which machines were typically plugged into wall sockets and faced no real power limitations.
“The number one consideration for x86 has always been to make operating systems like Windows run much faster and to be able to respond to unpredictable tasks, such as a mouse-click or a keyboard entry,” said Gupta, noting that the need for branch prediction and speculative execution was the reason x86 processors had such sizeable caches.
“It’s a terrific processor for everyday computing, not the right device as we go towards high performance computing,” he maintained.
Nvidia is already helping the Barcelona Supercomputing Center (BSC) to develop a hybrid supercomputer based on its Tegra ARM CPUs, accelerated by CUDA-supporting Tesla GPUs, with hopes of reaching exascale performance in a European project known as “Mont-Blanc”.
The hybrid will be the world's first ARM-based CPU/GPU supercomputing combination, and researchers at BSC have said they hope to achieve a short term goal of a two to five times improvement in energy efficiency compared with today's most efficient systems, with an ultimate goal of reaching exascale at 15 to 30 times less power.
Should the proof of concept work, Nvidia may well prove its point, but success seems a few years away. In the meantime, Nvidia said it will continue working on a development board for the HPC community, which the firm hopes will kickstart the software ecosystem around the ARM architecture for the supercomputers of the future.
I agree, Patrick, this is the main message, regardless of all these details about which processors were fabricated at which process nodes and similar noise...Intel could design a processor for supercomputing that is not x86 compatible, so why are they not doing that? Market too small? Kris
Intel will always have the disadvantage of having to translate its vintage x86 CISC instructions into pipeline-able micro-ops. This is something ARM does not have to do, since its RISC instructions are pipeline ready.
Think about it, every x86 processor in the world sits there, continuously translating the same instructions, over and over, every second they are running. How inefficient! Someone might not care if they're running one processor in their PC, but someone designing a supercomputer that has thousands of processors in it will surely notice the difference in their energy bill, cooling requirements, etc.
I know the Cortex M3 and M4 MCUs are fabbed at 90nm and they still have excellent power savings. I can only imagine what power efficiency they would have at 22nm, even with leakage becoming a more dominant factor.
What are the ARM A9 and coming A15 being fabbed at, anyone?
gpus manage 1-2 Tf for about 300W, or ~3-6 Gf/W. dedicated HPC chips like in the K machine or BG/q are about the same (say 2-3 Gf/W). current x86 processors manage .5-1.5 Gf/W. (numbers are a bit fuzzy - chip vs system dissipation, etc)
the recent Calxeda ARM chips seem to be about 3 Gf/W, too. (assuming 1.5W/core, 1.2 GHz and 4 flops/cycle. might be half that, can't tell.)
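As a rough check on the arithmetic in the two comments above, here is a minimal Python sketch. The inputs are the commenter's own approximate figures (1-2 Tf at about 300 W for GPUs; 1.5 W per core, 1.2 GHz and 4 flops/cycle assumed for Calxeda). The function names are illustrative only, and the results are peak-rate estimates, not measurements.

```python
def peak_gflops(ghz: float, flops_per_cycle: float) -> float:
    """Peak rate in Gflop/s for one core: clock (GHz) times flops per cycle."""
    return ghz * flops_per_cycle


def gflops_per_watt(gflops: float, watts: float) -> float:
    """Energy efficiency in Gflop/s per watt."""
    if watts <= 0:
        raise ValueError("watts must be positive")
    return gflops / watts


# GPU figures quoted in the comment: 1-2 Tf (1000-2000 Gf) at about 300 W.
gpu_low = gflops_per_watt(1000.0, 300.0)
gpu_high = gflops_per_watt(2000.0, 300.0)

# Calxeda ARM core, using the comment's assumed values (not vendor data).
arm_core = gflops_per_watt(peak_gflops(1.2, 4.0), 1.5)

print(f"GPU: {gpu_low:.1f}-{gpu_high:.1f} Gf/W")
print(f"ARM core: {arm_core:.1f} Gf/W, or {arm_core / 2:.1f} if flops/cycle is half that")
```

With those inputs the GPU range comes out at roughly 3.3-6.7 Gf/W and the ARM core at about 3.2 Gf/W, which matches the "~3-6" and "about 3" figures quoted above.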
in reality, the ISA has shifted with each generation. yes, adding to an ISA is messier than starting from scratch each time, but ARM is not pure and fresh, either. GPUs are probably the winner by this metric, since with, eg, cuda, apps are insulated by intermediate PTX code.
you're right: ARM is a fairly conventional ISA, though it's cleaner than x86. there must be some power savings in decode, but the processors have to eventually _do_ almost the same thing. (this argument doesn't hold as well comparing to GPUs, since their programming model restructures the code significantly.)
where did you get that idea? supercomputers are traditionally about _balance_, which tends to run against extreme core counts.
in fact, the push for many, lower-powered cores is motivated precisely by power considerations, and it works _against_ unpredictable workloads.
Obviously Intel has to maintain x86 backwards compatibility which limits its ability to innovate going forward...so every non-x86 architecture has a chance to be better but that is not guaranteed...in case of ARM I believe that the market has spoken clearly, just check where ARM was 5 or 10 years ago...Kris