A well-attended session on embedded processors for deep learning included only research efforts, but they showed an explosion of creative architectures for packing neural-network processing into constrained chips at the edge of the network.
STMicroelectronics was the only vendor presenting at the session. Speaker Giuseppe Desoli admitted that his chip was a prototype rushed to completion to make the deadlines for ISSCC. “We wanted to let people know our capabilities,” he said in a brief discussion after his talk.
The chip accelerates the AlexNet imaging algorithm, chosen in part because it is widely known. “We will use whatever algorithm the user wants,” he said.
Desoli and other presenters made it clear that interest is high in bringing neural-network processing to a wide range of embedded systems. Future variants of Amazon Alexa or Google Home products will compete in the accuracy of their predictions and understanding of spoken commands.
“We believe that [neural nets] can become a key component of intelligent IoT networks, enabling true-edge computing, propagating [to the cloud] semantic information back from sensors,” he said in his talk. “The catch is the growing complexity of these algorithms” and the massive data streams they generate.
For example, AlexNet required about a billion operations per second to handle 60 million parameters in seven layers in 2012. By 2015, the state of the art had jumped to 150 million parameters in 152 layers, requiring 10 to 20 GOPS. While the algorithms grow increasingly complex, the market demands devices that are, ideally, battery-operated.
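To see where numbers like these come from, here is a rough back-of-the-envelope count of the multiply-accumulate operations in AlexNet's five convolution layers. The layer shapes are the commonly published ones; the helper function is ours for illustration, not anything the presenters showed.

```python
# Back-of-the-envelope MAC count for a stack of convolution layers.
# Layer shapes are the commonly cited AlexNet conv shapes; the helper
# itself is illustrative, not from ST's talk.

def conv_macs(out_h, out_w, out_ch, in_ch, k_h, k_w):
    """Multiply-accumulates for one conv layer: each output pixel
    sums over a k_h x k_w window across all input channels."""
    return out_h * out_w * out_ch * in_ch * k_h * k_w

# (out_h, out_w, out_ch, in_ch, k_h, k_w) for AlexNet's five conv layers
alexnet_convs = [
    (55, 55, 96, 3, 11, 11),
    (27, 27, 256, 48, 5, 5),    # grouped conv: 48 input channels per group
    (13, 13, 384, 256, 3, 3),
    (13, 13, 384, 192, 3, 3),   # grouped
    (13, 13, 256, 192, 3, 3),   # grouped
]

total = sum(conv_macs(*shape) for shape in alexnet_convs)
# ~0.67 GMACs per frame, i.e. ~1.3 G operations at 2 ops per MAC --
# roughly the billion-operation figure cited for AlexNet.
print(f"~{total / 1e9:.2f} GMACs per frame")
```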
The emerging solution is SoCs packed with arrays of smart DSPs and custom accelerators. Much of the art lies in how engineers prune neural networks and map them onto that hardware.
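Pruning takes many forms. Below is a minimal sketch of one common approach, magnitude pruning, which zeroes out the smallest weights so that sparse layers fit into constrained accelerators. It is illustrative Python only, not a method any presenter described.

```python
# Minimal magnitude-pruning sketch: zero out the smallest-magnitude
# fraction of a layer's weights. Illustrative only.
import numpy as np

def prune_by_magnitude(weights, sparsity=0.5):
    """Return (pruned_weights, mask); `sparsity` is the fraction removed."""
    threshold = np.quantile(np.abs(weights).ravel(), sparsity)
    mask = np.abs(weights) >= threshold
    return weights * mask, mask

rng = np.random.default_rng(0)
w = rng.standard_normal((256, 384))        # stand-in for one layer's weights
pruned, mask = prune_by_magnitude(w, 0.9)  # keep only the largest 10%
print(f"nonzero weights remaining: {mask.mean():.0%}")
```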
ST described a reconfigurable accelerator framework using eight convolution accelerators and eight DSPs on a 64-bit crossbar switch with 16 DMA engines. In the key metric of energy efficiency, it delivers 2.9 TOPS/W.
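To put that figure in perspective, energy per inference follows directly from the operation count divided by the efficiency. A quick estimate, assuming an AlexNet-scale workload of roughly 1.4 billion operations per frame (our assumption, not a figure from ST):

```python
# What 2.9 TOPS/W buys: energy per frame = ops / efficiency.
# The efficiency is ST's reported figure; the per-frame op count is an
# assumed AlexNet-scale round number for illustration.

EFFICIENCY_TOPS_PER_W = 2.9    # ST's reported energy efficiency
ops_per_frame = 1.4e9          # assumed AlexNet-scale workload

joules_per_frame = ops_per_frame / (EFFICIENCY_TOPS_PER_W * 1e12)
print(f"{joules_per_frame * 1e3:.2f} mJ per frame")        # ~0.48 mJ

# At 30 frames/s that is ~14.5 mW for the compute alone -- the kind of
# budget that makes battery-operated devices plausible.
print(f"{joules_per_frame * 30 * 1e3:.1f} mW at 30 fps")
```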