SAN JOSE, Calif. — Deep neural networks are like a tsunami on the distant horizon.
Given their still-evolving algorithms and applications, it’s unclear what changes deep neural nets (DNNs) ultimately will bring. But their successes thus far in translating text and recognizing images and speech make it clear they will reshape computer design, and the changes are coming at a time of equally profound disruptions in how semiconductors are designed and manufactured.
The first merchant chips tailored for training DNNs will ship this year. As it can take weeks or months to train a new neural-net model, the chips likely will be some of the largest, and thus most expensive, chunks of commercial silicon made to date.
The industry this year may see a microprocessor ship from startup Graphcore that uses no DRAM and one from rival Cerebras Systems that pioneers wafer-level integration. The hefty 2.5-D Nervana chip acquired by Intel is already sampling, and a dozen other processors are in the works. Meanwhile, chip companies from ARM to Western Digital are working on cores to accelerate the inference part of deep neural nets.
“I think [this year] will be a coming-out party. We are just starting to see a bunch of ideas being evaluated from lots of companies,” said David Patterson, professor emeritus of the University of California, Berkeley.
The trend is so significant that Patterson and co-author John Hennessy devoted a new chapter to it in the latest version of their seminal text on computing, which was published last month. The authors provide deep insights into in-house designs such as Google’s Tensor Processing Unit (TPU), to which Patterson contributed, as well as Microsoft’s Catapult FPGA and inference blocks in the latest Apple and Google smartphone chips.
“This is a renaissance of computer architecture and packaging. We will see much more interesting computers in the next year than we did in the past decade,” Patterson said.
The rise of deep neural nets brought venture capital money back to semiconductors over the last few years. EE Times’ latest Silicon 60 lists seven startups working on some form of neural-networking chips, including two lesser-known names: Cambricon Technologies (Beijing) and Mythic Inc. (Austin, Texas).
“We’re seeing an explosion of new startups with new architectures. I’m tracking 15 to 20 myself ... We haven’t had 15 silicon companies [emerge] in one segment for 10 to 15 years,” said serial entrepreneur Chris Rowen, who left Cadence Design Systems to form Cognite Ventures, a company focused on neural-networking software.
“Nvidia will be tough to compete with for training in high-end servers because of its strong software position, and you’d be crazy to go after the cellphone because you have to be good at so many things there, but there may be opportunities at the high and low ends” of the smartphone market, Rowen said.
Nvidia did a great job with its latest GPU, Volta, tweaking it to speed training of DNNs, said Linley Gwennap, principal of market watcher The Linley Group. “But I certainly don’t think it’s the best possible design,” Gwennap said.
Graphcore (Bristol, U.K.) and Cerebras (Los Altos, Calif.) are the top two startups to watch in training chips because they have raised the most money and seem to have the best teams, Gwennap said. Startup Groq, founded by former Google chip designers, claims it will have an inference chip in 2018 that beats rivals by a factor of four in both total operations and inferences per second.
Intel’s Nervana is a large linear algebra accelerator on a silicon interposer next to four 8-Gbyte HBM2 memory stacks. Source: Hennessy and Patterson, “Computer Architecture: A Quantitative Approach”
Intel’s Nervana, called Lake Crest (above), is one of the most-watched custom designs. It executes 16-bit matrix operations with data sharing a single 5-bit exponent provided in the instruction set.
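The shared-exponent idea can be illustrated with a generic block floating-point sketch in Python. This is not Intel's implementation: the function names, the exponent bias of 16, and the rounding and clamping rules are assumptions made for illustration, and the sketch derives the exponent from the data, whereas the article says Lake Crest takes it from the instruction. Only the 16-bit and 5-bit widths come from the description above.

```python
import math

MANTISSA_BITS = 16   # signed 16-bit integer per element (width from the article)
EXPONENT_BITS = 5    # one exponent shared by the whole block (width from the article)
EXPONENT_BIAS = 16   # assumption: maps the 5-bit field 0..31 to scales 2**-16 .. 2**15

MAX_MANTISSA = 2 ** (MANTISSA_BITS - 1) - 1  # 32767


def encode_block(values):
    """Quantize floats to integer mantissas that share a single exponent field."""
    max_mag = max(abs(v) for v in values)
    if max_mag == 0.0:
        exp = -EXPONENT_BIAS
    else:
        # Smallest power-of-two scale that keeps the largest value in range.
        exp = math.ceil(math.log2(max_mag / MAX_MANTISSA))
    # Clamp the exponent to what a 5-bit biased field can represent.
    exp = max(-EXPONENT_BIAS, min(exp, 2 ** EXPONENT_BITS - 1 - EXPONENT_BIAS))
    scale = 2.0 ** exp
    mantissas = [
        max(-MAX_MANTISSA, min(MAX_MANTISSA, round(v / scale))) for v in values
    ]
    return mantissas, exp + EXPONENT_BIAS


def decode_block(mantissas, exp_field):
    """Recover approximate floats from the mantissas and the shared exponent."""
    scale = 2.0 ** (exp_field - EXPONENT_BIAS)
    return [m * scale for m in mantissas]


def dot(a_mant, a_exp, b_mant, b_exp):
    """Integer multiply-accumulate; the two exponents are applied once at the end."""
    acc = sum(x * y for x, y in zip(a_mant, b_mant))
    return acc * 2.0 ** ((a_exp - EXPONENT_BIAS) + (b_exp - EXPONENT_BIAS))


if __name__ == "__main__":
    mant, exp_field = encode_block([0.5, -1.25, 3.0])
    print(mant, exp_field)
    print(decode_block(mant, exp_field))
```

The appeal of such a format is visible in `dot`: the inner loop is pure integer arithmetic, and the exponents are handled once per block rather than once per element, which is why the multiply-accumulate hardware can be simpler than full floating point.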
As in Nvidia’s Volta, the Lake Crest logic sits on a TSMC CoWoS (chip-on-wafer-on-substrate) interposer next to four HBM2 high-bandwidth memory stacks. The chips are designed to work as a mesh, delivering five to 10 times the performance of Volta.
While Microsoft last year drew praise for its use of FPGAs for DNNs, Patterson remains skeptical of that approach. “You pay a lot for [FPGAs’] flexibility; the programming is really hard,” he said.
DSPs also will play a role, Gwennap noted in an analysis late last year. Cadence, Ceva, and Synopsys are all providing DSP cores geared for neural nets, he said.