HUNTSVILLE, Ala.—Traditional computer architectures are great when it comes to tasks that involve a lot of raw number crunching, such as modeling the evolution of a galaxy containing hundreds of billions of stars. However, they typically perform poorly at many tasks that humans excel at, such as identifying one of man's best friends.
The human brain is based on networks of neurons that perform all of our cognitive processing, including audio and visual processing. These networks develop over time as data is collected and analyzed. For many years, researchers have strived to mimic the way the brain works by creating artificial neural networks.
We start with a training phase in which the network is exposed to a large amount of data (potentially hundreds of thousands of samples). This training phase establishes the weight values associated with the paths and nodes forming the network; these weights determine how input data is mapped to output data. Once "trained," the neural network can be used to analyze, classify, and identify new data.
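To make this train-then-infer idea a bit more concrete, here's a minimal sketch in Python/NumPy (purely illustrative, not anything from Caffe or CEVA): a tiny two-layer network repeatedly adjusts its weights against known examples, and those trained weights are then all that's needed to process new inputs.

```python
# Purely illustrative sketch: train a tiny two-layer network on the XOR
# problem, then reuse the learned weights for inference.
import numpy as np

rng = np.random.default_rng(0)

# Toy training data: inputs and their target outputs (XOR).
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Randomly initialized weights and biases for the hidden and output layers.
W1, b1 = rng.normal(size=(2, 8)), np.zeros((1, 8))
W2, b2 = rng.normal(size=(8, 1)), np.zeros((1, 1))

# Training phase: nudge the weights to reduce the output error (backpropagation).
for _ in range(10000):
    h = sigmoid(X @ W1 + b1)            # hidden-layer activations
    out = sigmoid(h @ W2 + b2)          # network outputs
    grad_out = (out - y) * out * (1 - out)
    grad_h = (grad_out @ W2.T) * h * (1 - h)
    W2 -= 0.5 * h.T @ grad_out
    b2 -= 0.5 * grad_out.sum(axis=0, keepdims=True)
    W1 -= 0.5 * X.T @ grad_h
    b1 -= 0.5 * grad_h.sum(axis=0, keepdims=True)

# Inference phase: the trained weights alone are used to classify data.
print(np.round(sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2), 2))
```

Real deep learning frameworks do essentially the same thing, just with many more layers, millions of weights, and far larger training sets.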
In the not-so-distant past, we used to be happy to be able to get a network with just a few layers to work. More recently, it's become possible to train "deeper" networks containing more layers, which has led to the concept of "deep learning" neural networks. In reality, there is no formal definition as to what constitutes "deep," but it's generally understood to encompass at least four layers, and some networks may boast 30 or more layers.
Deep learning neural networks could prove beneficial for a wide variety of tasks, ranging from object recognition, vision analytics, and advanced driver assistance systems (ADAS) all the way up to artificial intelligence (AI). Consider, for example, the benefits of one's automobile being able to recognize and respond to warning signs, even if those signs were obscured in some way.
Convolutional neural networks (CNNs) are currently the most popular deep learning approach because they offer the best recognition quality compared with alternative recognition algorithms; also, they can be retrained without requiring changes to the code. The most popular open-source deep learning software framework used to build, train, and run neural networks is known as Caffe. Developed by the Berkeley Vision and Learning Center (BVLC) and by community contributors, the Caffe deep learning framework was created with expression, speed, and modularity in mind (no pun intended).
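To give a feel for what using Caffe looks like in practice, here's a hedged sketch using Caffe's Python interface (pycaffe) to run an already-trained classification network on a single image. The file names are placeholders for whatever model you have trained, and the "data"/"prob" blob names follow the convention used by the BVLC reference models.

```python
import caffe

# Run Caffe in CPU mode for this sketch.
caffe.set_mode_cpu()

# Load a trained network: a deploy definition plus its learned weights.
# (These file names are placeholders, not files shipped with Caffe.)
net = caffe.Net('deploy.prototxt', 'weights.caffemodel', caffe.TEST)

# Preprocess an image into the shape the network's "data" blob expects.
image = caffe.io.load_image('dog.jpg')                # H x W x 3, floats in [0, 1]
transformer = caffe.io.Transformer({'data': net.blobs['data'].data.shape})
transformer.set_transpose('data', (2, 0, 1))           # HWC -> CHW
transformer.set_raw_scale('data', 255)                 # [0, 1] -> [0, 255]
transformer.set_channel_swap('data', (2, 1, 0))        # RGB -> BGR

net.blobs['data'].data[...] = transformer.preprocess('data', image)

# Forward pass: in the reference models, the "prob" blob holds class probabilities.
output = net.forward()
print('Predicted class:', output['prob'][0].argmax())
```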
Until now, neural network researchers have been limited by a lack of computing horsepower, by power constraints, and by the quality of available algorithms. If we wish to deploy deep learning neural-network-based embedded vision systems in portable products, for example, we need to be talking about consuming milliwatts of power as opposed to the tens or hundreds of watts consumed by conventional implementations.
In order to address these issues, CEVA has just introduced its CEVA Deep Neural Network (CDNN), which harnesses the power of the CEVA-XM4 imaging and vision DSP core to provide a low-power, low-memory-bandwidth, deep neural network solution that accelerates deep learning application deployment.
Consider the following illustration of a CNN usage flow based on Caffe and CDNN:
The network structure is defined and trained offline using Caffe. The CEVA Network Generator is then used to automatically convert the original network structure and weights into a slim, customized, real-time network model. Once the customized, embedded-ready network has been generated, it runs on the CEVA-XM4 imaging and vision DSP using fully optimized CNN layers, software libraries, and APIs.
As an example, consider a Caffe open source implementation of the AlexNet CNN. Running on the CEVA-XM4, the AlexNet implementation achieved 3X faster processing while consuming 1/30th the power of a GPU-based system; it also required 1/15th the memory of a typical implementation.
Of particular interest is the fact that the fixed-point network created by the CEVA Network Generator running on the CEVA-XM4 exhibited less than 1% degradation in accuracy compared to the original floating-point implementation of the network running on a PC.
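CEVA hasn't published the internals of its Network Generator, but the basic idea behind a float-to-fixed-point conversion step can be sketched as follows (a hypothetical NumPy illustration, not CEVA's actual flow): choose a scale factor for each weight tensor, round the weights to integers, and then check how much error the rounding introduces.

```python
# Hypothetical sketch of fixed-point weight conversion (not CEVA's tool flow):
# pick a per-tensor scale, round to signed integers, measure the rounding error.
import numpy as np

def to_fixed_point(weights, num_bits=8):
    """Quantize a float array to signed fixed point with num_bits bits."""
    max_int = 2 ** (num_bits - 1) - 1
    scale = np.abs(weights).max() / max_int          # one scale per tensor
    q = np.clip(np.round(weights / scale), -max_int - 1, max_int)
    return q.astype(np.int8), scale                   # int8 because num_bits is 8 here

def from_fixed_point(q, scale):
    """Reconstruct approximate float weights from the fixed-point values."""
    return q.astype(np.float32) * scale

# Example: quantize a random weight matrix and measure the error it introduces.
w = np.random.default_rng(1).normal(scale=0.1, size=(256, 128)).astype(np.float32)
q, scale = to_fixed_point(w)
w_hat = from_fixed_point(q, scale)

rel_err = np.abs(w - w_hat).mean() / np.abs(w).mean()
print(f"Mean relative weight error after 8-bit quantization: {rel_err:.4%}")
```

In a real flow, the quantized network would then be re-validated on test data to confirm that the end-to-end accuracy, not just the raw weight error, stays within the acceptable bound — which is exactly the kind of result the <1% degradation figure above refers to.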
Caffe running on the NVIDIA Jetson TK1 mobile board provides inference at 35ms per image while consuming 10 watts of power (left) versus Caffe running on the CEVA-XM4 providing inference at 10ms per image while consuming <30 milliwatts of power (right).
One question you might ask regarding the image above is: "If Caffe running on the CEVA-XM4 development board consumes <30 milliwatts of power, why do we need a cooling fan?" That's a good question and I'm glad you asked it (it's the same question I posed to the guys and gals at CEVA). The answer is that the <30 milliwatts value refers to a CEVA-XM4 IP core implemented in an SoC; however, the image above shows an FPGA-based CEVA-XM4 development board, in which the XM4 IP core is implemented in a power-guzzling FPGA.
Phi Algorithm Solutions, a member of CEVA’s CEVAnet partner program, has used CDNN to implement a CNN-based Universal Object Detector algorithm for the CEVA-XM4 DSP. This is now available for application developers and OEMs to run a variety of applications, including pedestrian detection and face detection for security, ADAS, and other embedded devices based around low-power camera-enabled systems.
As illustrated above, the Phi Algorithm Solutions real-time pedestrian detection application utilizing CDNN and optimized for the CEVA-XM4 DSP consumes <30mW while processing 1080p video at 30 frames-per-second.
All of this is taking us one step closer to systems that boast embedded speech, embedded vision, and true artificial intelligence. It's going to be very interesting to observe ongoing developments over the next few years. What's your take on this? Are you excited or are you scared (robot apocalypse, anyone)?
— Max Maxfield, Editor of All Things Fun & Interesting