Since the 1800s, people have predicted a revolution in machine intelligence. In that time we have gone from fearing that robots would take our jobs to worrying about the cloud processing data from our devices with deep-learning algorithms. For the industry, the key technical question is how to overcome the challenges of latency, bandwidth, and compute power.
The efforts of Google and Facebook in deep neural networks (DNNs) are well known. More recently, Dr. Ren Wu of Baidu’s Cupertino-based Institute for Deep Learning showed in a presentation how ARM-based server farms have allowed him to achieve massive improvements in speech-recognition and image-recognition tasks. His approach reduced error rates in speech recognition and optical character recognition by 25 to 30% and achieved 94% accuracy in face detection.
Baidu has deployed five DNN applications, with more in the pipeline. The scale of its training, with runs lasting weeks or months at a time, is truly awe-inspiring: it draws on hundreds of millions of images, OCR and click-through-rate data, and tens of billions of speech samples. What’s more, these data sets are projected to grow 1,000% per annum over the near term. Interestingly, DNNs trained in the Baidu cloud are now being deployed in a mobile app that leverages mobile GPUs and OpenCL.
While the main DNN players are focused on server-based solutions with only the beginnings of a mobile strategy, Max Versace, director of the Boston University Neuromorphics Lab and CEO of Neurala, takes a different approach. He asserts that today’s machines would require a nuclear power plant to deliver processing power equivalent to the human brain.
Indeed, power efficiency is a huge issue. According to Mark Horowitz of Stanford University, processing in the cloud can require up to a million times more energy per operation than processing locally on a device. Looking even further out, Eugenio Culurciello of Teradeep advocates a hardware approach to deep learning aimed at maximizing the power efficiency of DNNs in mobile devices.
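To make that gap concrete, here is a back-of-envelope sketch in Python. The per-bit radio energy and per-operation compute energy below are illustrative assumptions, not figures from Horowitz or Baidu, and the comparison ignores the further cost of network hops and data-center processing, which only widens the gap.

```python
# Back-of-envelope illustration of the energy gap between shipping data to the
# cloud and computing on it locally. All numbers below are assumptions chosen
# for the sake of arithmetic, not measured figures.

LOCAL_OP_ENERGY_PJ = 10        # assumed energy for one arithmetic op on a low-power device (picojoules)
RADIO_ENERGY_NJ_PER_BIT = 100  # assumed energy to push one bit over a cellular/Wi-Fi uplink (nanojoules)
WORD_BITS = 32                 # transmit or operate on one 32-bit word

# Energy to move one word off the device versus energy to process it in place.
transmit_pj = RADIO_ENERGY_NJ_PER_BIT * 1000 * WORD_BITS   # nJ -> pJ
local_pj = LOCAL_OP_ENERGY_PJ

print(f"Transmit one 32-bit word: {transmit_pj / 1e6:.2f} uJ")
print(f"Process it locally:       {local_pj / 1e6:.6f} uJ")
print(f"Ratio:                    {transmit_pj / local_pj:,.0f}x")
```

Under these assumptions, merely radioing a word of data off the device costs several hundred thousand times more energy than operating on it locally, which is the right order of magnitude for the disparity Horowitz describes.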
Another question for interactive and safety-critical services such as self-driving cars is latency. It is already a challenge for Web search, with Google issuing a call to action to the semiconductor industry at ISSCC in February 2014. On this front, it's interesting to note that Neurala's software for NASA's next-generation Mars rover must cope with a round-trip communication latency of 28 minutes. Latency is an issue for the human brain, too, which is why distributed processing powers our reflexes without involving the frontal cortex.
Cloud-based services only work when they can scale to millions of simultaneous users, but how are we to stream video from mobile devices when bandwidth is already a major limitation even for services like Netflix? Princeton University’s SignalGuru project shows how local video pre-processing can cut bandwidth requirements 1,000-fold in a cloud-based application.
As we have seen, a monolithic cloud model doesn’t really stack up in terms of power efficiency, latency, or bandwidth scalability. Given that, we must ask how to overcome the limitations of today’s AI model so that it can enrich our lives and provide enduring value.
There is an opportunity for the industry to distribute computation and build systems that can “think locally and act globally” by preprocessing data within personal devices using low-power processors. In this model, data is processed as close as possible to the sensor, much as the brain does it. Only metadata, rather than video, is streamed to the cloud, which resolves the power-efficiency, latency, and security issues while requiring radically less expensive infrastructure.
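As a rough illustration of this model, consider the sketch below. The detector, metadata fields, and uplink function are hypothetical placeholders rather than any specific product's API; the point is that the device analyzes each frame next to the sensor and streams only a few hundred bytes of metadata, never the multi-megabit frame itself.

```python
# Hypothetical edge pipeline: analyze each frame on the device and stream only
# compact metadata to the cloud; the raw video never leaves the sensor node.
import json
import time
from dataclasses import dataclass, asdict
from typing import Callable, Iterable, List

@dataclass
class Detection:
    label: str          # e.g. "face" or "pedestrian"
    confidence: float   # 0.0 .. 1.0
    bbox: tuple         # (x, y, width, height) in pixels

def detect_objects(frame) -> List[Detection]:
    """Placeholder for an on-device DNN running on a low-power vision processor."""
    raise NotImplementedError("plug the local inference engine in here")

def frame_to_metadata(frame, camera_id: str) -> bytes:
    """Reduce a multi-megabit frame to a few hundred bytes of metadata."""
    record = {
        "camera": camera_id,
        "timestamp": time.time(),
        "detections": [asdict(d) for d in detect_objects(frame)],
    }
    return json.dumps(record).encode("utf-8")

def stream_metadata(frames: Iterable, camera_id: str, send: Callable[[bytes], None]):
    """send() is whatever uplink the application uses (MQTT, HTTPS, ...)."""
    for frame in frames:
        send(frame_to_metadata(frame, camera_id))  # metadata only, not pixels
```

Because only the JSON record crosses the network, the bandwidth, energy, and privacy cost of each frame scales with the number of detections rather than the number of pixels.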
— David Moloney is Senior Vice President and Chief Technology Officer at Movidius, a vendor of vision processors.