Next year Qualcomm will release a suite of software tools that work with an FPGA emulator, for developers to use when creating applications for its NPU. Regarding automotive, Purdue University professor Eugenio Culurciello has already shown that roadside scenes can be classified in real time into pedestrians, vehicles, buildings, etc., but I suspect it will take a year or two for developers to start making good use of this type of information in collision avoidance and similar automotive applications.
When I first saw this I thought, "Rats! Why can't all of the fantastic and very similar work that Eugenio Culurciello has been doing for years under Office of Naval Research (ONR) funding get this kind of press...?" This also shows why one should read figure captions. From said caption on the familiar-looking first figure I finally realized that this _is_ the next phase for Dr Culurciello's amazing chips! Since shutting down the government seems to be the theme-du-jour, it's worth pointing out just how huge a role federal funding plays in giving folks like Dr Culurciello a chance to move the ball on some wild new idea to the point where a large company like Qualcomm can see the potential and catch the pass. That's the kind of teaming where everyone benefits.
More hype than substance, I'd say, at least when it comes to the claims about a neural network processing unit (NPU) in hardware.
Looking at the Brain Corporation jobs page, http://braincorporation.com/index.php/category/opportunities/, these guys are hosted inside Qualcomm and are focused on building robots, robotics algorithms and tools.
They appear to be leveraging Qualcomm's GPUs using OpenGL and OpenCL, as well as C/C++, Python and Matlab, to do the NN processing rather than some kind of specialised NN processor.
I'd say most of the NN work is being done in Matlab, with some kind of back-end generation of C/C++ code or potentially OpenGL/OpenCL ES code. The smart approach would be to have libraries for the GPU which are optimised at a low level, and to be able to call identical libraries from within Matlab or Python for rapid development, rather than designing an esoteric compiler.
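To make the "identical libraries" idea a bit more concrete, here is a minimal Python sketch of what I mean. It is purely illustrative and is not Qualcomm's or Brain Corporation's actual tooling: the names `dense`, `register_backend` and the `"gpu"` backend are all hypothetical, and the "gpu" path is only a placeholder that reuses the reference maths where a real build would call a low-level optimised kernel.

```python
import math

# Registry mapping a backend name to its implementation of the kernel.
# All names here are hypothetical, chosen only for illustration.
_BACKENDS = {}


def register_backend(name, dense_fn):
    """Register an implementation of the dense-layer kernel under a name."""
    _BACKENDS[name] = dense_fn


def _dense_reference(x, weights, bias):
    """Plain-Python dense layer with tanh activation: y = tanh(W x + b)."""
    return [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(weights, bias)
    ]


def _dense_accelerated(x, weights, bias):
    # Placeholder (assumption): a real build would dispatch to a hand-tuned
    # GPU kernel here. This stand-in simply reuses the reference maths so
    # the sketch stays runnable without any GPU.
    return _dense_reference(x, weights, bias)


register_backend("reference", _dense_reference)
register_backend("gpu", _dense_accelerated)


def dense(x, weights, bias, backend="reference"):
    """Single entry point: the caller's code is the same for every backend."""
    try:
        fn = _BACKENDS[backend]
    except KeyError:
        raise ValueError(f"unknown backend: {backend!r}") from None
    return fn(x, weights, bias)
```

The point of the design is that prototype code written against `dense(...)` does not change when the backend does, so the rapid-development side and the optimised side share one interface and no special compiler is needed.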
I'd guess this may lead to NPU support being added to the ISA of future GPUs at some point, if they really need it.