LAKE WALES, Fla. — The latest Top500 list of the world’s fastest supercomputers turns the spotlight on China, which overtook the United States in the total number of ranked systems and which scored the top two fastest installations on the list. Announcements from IBM, Intel, and Advanced Micro Devices, however, position the U.S. industry for a comeback. Rather than target systems that test well on the Top500’s distributed-memory version of the Linpack benchmarks (High Performance Linpack), the companies aim to render those measurements irrelevant on their way to beating China to exascale computing.
China captured not only first and second place in the ranking of the fastest installed systems, but also won the majority share of ranked installations and took the aggregate performance lead, according to the November 2017 Top500 list, which was the 50th one to be published since the ranking debuted in June 1993. According to the Top500 organization, “There is no system from the USA under the Top3. #1 and #2 are installed in China ... the USA decreased to a new record low of 143 [installed Top500-ranked systems] from 169 six months ago. The number of systems installed in China increased to a new record high of 202, compared to 160 on the last list. China now clearly shows a substantially larger number of installations than the USA. China now is also pulling ahead of the USA in the performance category, with China holding 35.4% of the overall installed performance, while the USA is second, with 29.6%.”
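For readers who want to check the installation counts, the short sketch below converts the figures quoted above into shares of the 500-system list and list-over-list changes. The counts come from the Top500 statement; the variable and function names are illustrative only, not part of any Top500 tooling.

```python
# Counts quoted from the November 2017 Top500 statement: (previous list, current list).
TOP500_SIZE = 500  # the list always ranks 500 systems
systems = {"China": (160, 202), "USA": (169, 143)}


def share_percent(count, total=TOP500_SIZE):
    """Return a system count as a percentage of the full list."""
    return 100.0 * count / total


def change(previous, current):
    """Return the list-over-list change in system count."""
    return current - previous


for country, (previous, current) in systems.items():
    print(
        f"{country}: {current} systems "
        f"({share_percent(current):.1f}% of the list), "
        f"change {change(previous, current):+d}"
    )
```

Note that these are shares of system count, which differ from the 35.4% and 29.6% shares of installed performance cited in the same statement.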
“The high-performance computing landscape is evolving at a furious pace that some are describing as an important inflection point,” Dave Turek, IBM’s vice president for high-performance computing (HPC) and OpenPOWER, wrote in a recent blog. “Realizing that these demands could only be addressed by an open ecosystem, IBM partnered with industry leaders Google, Mellanox, Nvidia, and others to form the OpenPOWER Foundation, dedicated to stewarding the Power CPU architecture into the next generation.”
IBM’s Power9 processor will have up to 24 cores and up to 8 billion transistors, and will use 14-nm FinFETs with 120 Mbytes of shared L3 cache, eight-way simultaneous multithreading, and 230 Gbyte/s bandwidth to memory.
IBM’s silicon contribution will be its Power9 processor (see photo), housing up to 24 cores with up to 8 billion FinFET transistors cast in 14-nanometer CMOS, 120 megabytes of shared level-three cache, eight-way simultaneous multithreading, and 230 gigabytes/second of bandwidth to memory. IBM’s architecture, to be showcased at Oak Ridge and Lawrence Livermore National Labs, will pack thousands of Nvidia Volta graphics processing units (GPUs) aimed at boosting overall performance beyond that of China’s home-brewed Sunway CPUs.
IBM is banking mostly on its supercomputer data-centric architecture, which spreads out the processing power by embedding the processors at the locations where the data resides. This approach, according to Turek, yields a speedup of 5 to 10 times for the hardest applications: analytics; modeling; visualization; simulation; and artificial intelligence (AI), especially deep learning.
To address the specific architectural needs of AI, IBM has redesigned the data flow of its new Power9 processor to dovetail with massive numbers of GPUs and Nervana coprocessors. By scaling TensorFlow and Caffe across 256 Nvidia Tesla GPUs, IBM has been able to reduce deep-learning training times from 16 days to seven hours. The company aims to scale this strategy to as many as 100 times more GPUs spread across 50,000 nodes by 2021, thereby achieving exascale computing (a billion billion calculations per second) before China does.
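As a rough check on the figures above, the sketch below works out the speedup implied by going from 16 days to seven hours, and spells out what "a billion billion" means numerically. It uses only the numbers quoted in the article; the variable names are illustrative, and the article does not state the hardware used for the 16-day baseline, so no per-GPU scaling efficiency is computed.

```python
HOURS_PER_DAY = 24

# Training times quoted in the article.
baseline_hours = 16 * HOURS_PER_DAY  # 16 days expressed in hours
scaled_hours = 7                     # time reported on 256 GPUs

# Speedup is the ratio of the old time to the new time.
speedup = baseline_hours / scaled_hours

# Exascale: a billion billion (10^9 * 10^9) calculations per second.
BILLION = 10**9
exascale_ops_per_second = BILLION * BILLION

print(f"Baseline: {baseline_hours} hours")
print(f"Speedup: about {speedup:.1f}x")
print(f"Exascale: {exascale_ops_per_second:.0e} calculations per second")
```

The ratio comes out to roughly 55 times faster, which is the scale of improvement IBM is pointing to.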
Intel’s specialized AI accelerator, the Nervana coprocessor (shown), together with on-chip FPGAs and the company’s scalable system architecture, is intended to boost performance where it counts: in analytics, AI, and deep learning.
“Power9 is loaded with industry-leading new technologies designed for AI to thrive,” IBM Fellow Brad McCredie, vice president of cognitive systems development, wrote in his blog. “With Power9, we’re moving to a new, off-chip era, with advanced accelerators like GPUs and FPGAs [field-programmable gate arrays] driving modern workloads, including AI.”