EE Times

Google and Nvidia Post New AI Benchmarks

By Karl Freund  07.10.2019

Over forty companies and eight research institutions comprising the nascent artificial intelligence (AI) industry have defined a set of standardized benchmarks, called MLPerf, to enable comparisons of the various chips used to accelerate machine learning (ML) training and inference. In the second slate of training results (v0.6) released today, both Nvidia and Google demonstrated their ability to cut the compute time needed to train the underlying deep neural networks used in common AI applications from days to hours.
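The days-to-hours claim is, at bottom, a scaling argument: adding accelerators divides the wall-clock time, but never perfectly. A minimal sketch of that arithmetic, using made-up numbers and an assumed scaling-efficiency factor (none of these figures come from the MLPerf results):

```python
# Hypothetical illustration of training-time scaling; the baseline hours
# and efficiency factor are assumptions, NOT published MLPerf v0.6 data.

def scaled_training_hours(single_chip_hours, num_chips, efficiency=0.75):
    """Estimate wall-clock training time when a job is spread across
    num_chips accelerators at the given scaling efficiency
    (1.0 would be perfect linear scaling)."""
    effective_speedup = 1 + (num_chips - 1) * efficiency
    return single_chip_hours / effective_speedup

# A model that takes ~4 days (96 h) on one accelerator:
baseline = 96.0
for chips in (16, 256, 1024):
    print(f"{chips:>5} chips: {scaled_training_hours(baseline, chips):6.2f} h")
```

Even at well under ideal efficiency, a thousand-accelerator system turns a multi-day job into minutes, which is exactly the regime the "Max Scale" submissions play in.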

However, the cost of delivering these impressive results remains mind-boggling: note that the Nvidia DGX2h SuperPod used to perform these training jobs has an estimated retail price of some $38 million. Consequently, Google seeks to exploit their advantage as the only major public cloud provider to deliver AI supercomputing as a service to researchers and AI developers, all using their in-house developed Tensor Processing Units (TPUs) as their alternative to Nvidia GPUs.

The new results are truly impressive. Both Nvidia and Google claim #1 performance spots in three of the six “Max Scale” benchmarks. Nvidia was able to reduce their run-times dramatically (up to 80%) using the identical V100 TensorCore accelerator in the DGX2h building block. Many silicon startups are now probably explaining to their investors why their anticipated performance advantage over Nvidia has suddenly diminished, all due to Nvidia’s software prowess and ecosystem.

The first question, of course, is “where is everyone else?” While there are over 40 companies around the world developing AI-specific accelerators, most are developing chips for “inference,” not model training, where Nvidia enjoys a massive share of the multi-billion-dollar market. For these companies, the MLPerf organization plans to release results in early September, just prior to the second AI HW Summit event in Silicon Valley. Even for companies building silicon for training, the staggering costs of competing in this marathon will preclude most if not all startups from participation. Intel should be in the mix, however, once they finish development of their highly anticipated Nervana NNP-T later this year.

So, who “won” and does it matter? Since the companies ran the benchmarks on a massive configuration that maximizes the results with the shortest training time, being #1 may mean that the team was able to gang over a thousand accelerators to train the network, a herculean software endeavor. Since both companies sell 16-chip configurations, and provided those results to MLPerf, I have also included a normalized 16-accelerator performance figure.
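The normalization just described can be sketched as follows. The idea is to convert each measured training time into a per-accelerator throughput and rescale it to a common 16-chip system, assuming linear scaling; the run times below are made-up placeholders, not the published MLPerf v0.6 results:

```python
# Hypothetical sketch of the 16-accelerator normalization described above.
# Run times are placeholders, NOT the published MLPerf v0.6 numbers.

def normalized_throughput(train_minutes, num_chips, target_chips=16):
    """Convert a measured training time at num_chips into a relative
    throughput for a target_chips-sized system, assuming linear scaling,
    so results from different system sizes land on one comparable axis."""
    jobs_per_minute = 1.0 / train_minutes      # throughput of the whole system
    per_chip = jobs_per_minute / num_chips     # per-accelerator throughput
    return per_chip * target_chips             # rescaled to a 16-chip system

# Example: a 1,024-chip run finishing in 2 minutes vs. a 16-chip run in 80 minutes
big = normalized_throughput(2.0, 1024)    # 16 / (2 * 1024) = 0.0078125
small = normalized_throughput(80.0, 16)   # 16 / (80 * 16)  = 0.0125
```

Note how the smaller system can come out ahead once normalized: the max-scale winner bought its short run time with many more chips running at lower per-chip efficiency.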


Figure 1: MLPerf 0.6 results for Nvidia and Google, along with normalized results
to provide relative 16-accelerator performance. (Source: Moor Insights & Strategy)

I find it interesting that Nvidia’s best absolute performance is on the more complex neural network models (reinforcement learning and heavy-weight object detection with Mask R-CNN), perhaps showing that their hardware programmability and flexibility helps them keep pace with the development of newer, more complex and deeper models. I would also note that Google has wisely decided to cast a wider net to capture TPU users, working now to support the popular PyTorch AI framework in addition to Google’s TensorFlow tool set. This will remove one of the two largest barriers to adoption, the other being the exclusivity of TPU to the Google Cloud Platform (GCP).

In the end, the answer to the “does it matter” question may come down to the deployment model. Google TPU is a force to be reckoned with in public-cloud-hosted training on GCP, while Nvidia continues to provide excellent performance for in-house infrastructure and non-GCP public cloud services, where its flexibility helps cloud providers amortize their cost over a very wide range of workloads (>500). I would say that Google TPU continues to improve with the now-beta TPU v3 Pod, and Nvidia remains able to hold its broad leadership position.

Karl Freund, senior analyst, machine learning and HPC, Moor Insights & Strategy

Disclosure: Moor Insights & Strategy, like all research and analyst firms, provides or has provided research, analysis, advising and/or consulting to many high-tech companies in the industry mentioned in this article, including Microsoft, Nvidia, NovuMind, Intel, Wave, Xilinx, and many others.
