MOUNTAIN VIEW, Calif. -- While Intel Corp. and Nvidia Corp. engage in a war of words over which architecture best suits high-performance computing, graphics cards or many integrated cores (MIC), rival chip designer Advanced Micro Devices (AMD) is offering up its own Fusion architecture for consideration, and some vendors are already taking the bait.
Penguin Computing Inc., for instance, recently installed the world’s first HPC Accelerated Processing Unit (APU) cluster based on AMD’s architecture at Sandia National Labs in Albuquerque, New Mexico.
The experimental system, known as Altus 2A00, sports 104 nodes interconnected using QDR InfiniBand fabric and can purportedly reach a peak performance of 59.6 TFLOPs.
APUs are something of a hybrid between CPUs and GPUs, combining multi-core x86 processing, memory controllers, a massively parallel GPU and a PCI-E interface, all on one piece of silicon.
“APUs are interesting for a different class of problems; problems that accelerate well with a GPU,” explained Penguin CTO Phil Pokorny, when he spoke to EE Times recently at SC11 in Seattle, Washington.
With the CPU and GPU fused together on one chip and sharing a common memory controller, said Pokorny, applications running on an APU need not be limited to the two, four or six gigabytes of memory on a GPU card and can instead access all host memory. On the Sandia system, this means applications have access to some 16GB of RAM.
APUs boast some 400 parallel processing cores, which can be programmed using the OpenCL framework. Pokorny said OpenCL is simpler than some other programming models, adding that the APU’s fused chip structure also avoids a lot of bottlenecks and duplication of data.
Interesting as the option may be, Penguin Computing is not staking all of its hopes on APUs, despite its status as an “elite partner” in the AMD Fusion partner program.
The firm, which started out in the market for Linux servers a decade ago, has also delivered HPC clusters based on AMD’s new Interlagos chips to market.
The new Opteron 6200 and 4200 series, which can boast up to 16 cores per processor, are in Penguin’s refreshed Altus server line, as well as in a couple of clusters delivered to Georgia Tech and the University of Delaware.
Delaware’s cluster is made up of 200 compute servers interconnected through QDR InfiniBand, with a potential peak performance of 49.3 TFLOPs and an aggregate memory capacity of 13.5TB. The integrated quad-channel DDR3 memory controllers allow memory clock speeds of up to 1866MHz, more than capable of supporting the high core density.
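For a rough sense of scale, the quoted cluster totals can be divided down to per-node averages. The sketch below uses only the figures reported for the Sandia and Delaware systems. It assumes decimal units (1 TFLOPS = 1,000 GFLOPS, 1TB = 1,000GB) and treats each Delaware compute server as one node; the function names are illustrative, and the results are averages rather than vendor specifications.

```python
# Figures as quoted in the article; everything derived below is an average.
SANDIA_NODES, SANDIA_PEAK_TFLOPS = 104, 59.6
DELAWARE_NODES, DELAWARE_PEAK_TFLOPS, DELAWARE_MEMORY_TB = 200, 49.3, 13.5


def per_node_gflops(peak_tflops, nodes):
    """Average peak per node in GFLOPS, assuming 1 TFLOPS = 1000 GFLOPS."""
    return peak_tflops * 1000 / nodes


def per_node_memory_gb(memory_tb, nodes):
    """Average memory per node in GB, assuming 1 TB = 1000 GB."""
    return memory_tb * 1000 / nodes


sandia = per_node_gflops(SANDIA_PEAK_TFLOPS, SANDIA_NODES)
delaware = per_node_gflops(DELAWARE_PEAK_TFLOPS, DELAWARE_NODES)
delaware_mem = per_node_memory_gb(DELAWARE_MEMORY_TB, DELAWARE_NODES)

print(f"Sandia APU cluster:  {sandia:.1f} GFLOPS per node")    # about 573.1
print(f"Delaware cluster:    {delaware:.1f} GFLOPS per node")  # about 246.5
print(f"Delaware cluster:    {delaware_mem:.1f} GB per node")  # 67.5
```

On these numbers, the average APU node carries a little more than twice the peak floating-point rating of the average Interlagos server, although peak figures say nothing about sustained performance on real applications.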
Pokorny explained that AMD had also added FMA4 instructions to the Bulldozer architecture, on which the new Opterons are based, meaning that the chips can perform twice as many floating-point operations per clock cycle.
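The factor of two comes from how fused multiply-add works: it computes a*b + c as a single instruction, so each instruction does the work of one multiply and one add. The sketch below simply counts instructions for a dot product under that simplified model. The function name is illustrative, and the model assumes one instruction per operation; it does not describe Bulldozer's actual pipeline or measured throughput.

```python
def dot_instruction_counts(n):
    """Count floating-point work for an n-element dot product.

    Simplified model (an assumption, not a hardware measurement):
    each multiply, add, or fused multiply-add is one instruction.
    Returns (flops, instructions_without_fma, instructions_with_fma).
    """
    flops = 2 * n     # n multiplies plus n adds
    separate = 2 * n  # one MUL and one ADD instruction per element
    fused = n         # one FMA (acc = a * b + acc) per element
    return flops, separate, fused


flops, separate, fused = dot_instruction_counts(1024)
print(flops / separate)  # 1.0 flop per instruction without FMA
print(flops / fused)     # 2.0 flops per instruction with FMA
```

If a floating-point unit issues one such instruction per cycle, the fused form therefore completes two operations per cycle where separate multiplies and adds complete one.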
You can hear more of Pokorny’s interview with EE Times in the video.