LONDON – A software simulation of a large-scale neural network distributed across 16,000 processor cores in Google's data centers has been used to investigate the difference between learning from labeled data and self-taught learning. Researchers from Stanford University (Palo Alto, Calif.) and Google Inc. (Mountain View, Calif.) trained models with more than 1 billion connections and found that, among other things, the network learned to identify a cat after a week of watching YouTube videos.
Google, best known for its search engine, said the advantage of self-taught neural networks is that they don't need deliberately labeled data to work with. Adding labels to data, for example tagging images that contain cats, takes time and effort and makes training networks expensive.
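In schematic form, self-taught learning of this kind can be sketched as a network trained only to reconstruct its unlabeled inputs, with a sparsity pressure that forces it to discover reusable features. The following is a minimal illustrative sketch, not the researchers' code; the actual system was vastly larger, and every size and hyperparameter below is a placeholder.

```python
# Minimal sketch of unsupervised (self-taught) feature learning:
# a tiny sparse autoencoder trained on unlabeled image patches.
# All shapes and hyperparameters are hypothetical.
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, n_inputs=64, n_hidden=32):
        super().__init__()
        self.encoder = nn.Linear(n_inputs, n_hidden)
        self.decoder = nn.Linear(n_hidden, n_inputs)

    def forward(self, x):
        h = torch.sigmoid(self.encoder(x))   # hidden features
        return self.decoder(h), h            # reconstruction + code

model = SparseAutoencoder()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

# Unlabeled data: random stand-in for 8x8 grayscale patches from video stills.
patches = torch.rand(1024, 64)

for epoch in range(10):
    recon, code = model(patches)
    # Reconstruction error plus an L1 sparsity penalty on the hidden code:
    # no labels anywhere -- the network learns features from the data alone.
    loss = nn.functional.mse_loss(recon, patches) + 1e-3 * code.abs().mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```

The point of the sparsity penalty is that each hidden unit ends up specializing on a recurring visual pattern in the data, which is how a unit can come to respond to something as specific as a cat face without ever being told about cats.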
The research is expected to have applications outside of image recognition, including speech recognition and natural language modeling, Google said.
After a training period, one neuron in the network had learned to respond strongly to cats. Source: Google.
"Our hypothesis was that it [the neural network] would learn to recognize common objects in those videos. Indeed, to our amusement, one of our artificial neurons learned to respond strongly to pictures of cats. Remember that this network had never been told what a cat was, nor was it given even a single image labeled as a cat. Instead, it discovered what a cat looked like by itself from only unlabeled YouTube stills," said Google Fellow Jeff Dean in a posting at Google's website.
In addition, using this relatively large-scale neural network, Google achieved a 70 percent relative improvement over the previous state-of-the-art accuracy on a standard image classification test by combining freely available unlabeled images from the internet with a limited set of labeled data.
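That semi-supervised recipe, learn features from plentiful unlabeled data, then train a classifier on a small labeled set on top of them, can be sketched as follows. Everything here (layer sizes, class count, data) is a placeholder, not Google's actual pipeline.

```python
# Sketch of the semi-supervised recipe described above: frozen features
# learned without labels, plus a small classifier trained on limited labels.
import torch
import torch.nn as nn

# Pretend this encoder was already trained on unlabeled images (as earlier).
encoder = nn.Sequential(nn.Linear(64, 32), nn.Sigmoid())
for p in encoder.parameters():
    p.requires_grad_(False)             # freeze the unsupervised features

classifier = nn.Linear(32, 10)          # small head for 10 hypothetical classes
opt = torch.optim.Adam(classifier.parameters(), lr=1e-2)

# A limited labeled set: far smaller than the unlabeled corpus.
x_labeled = torch.rand(100, 64)
y_labeled = torch.randint(0, 10, (100,))

for epoch in range(50):
    logits = classifier(encoder(x_labeled))
    loss = nn.functional.cross_entropy(logits, y_labeled)
    opt.zero_grad()
    loss.backward()
    opt.step()
```

The design choice that makes this cheap is that only the small classifier head is trained on labeled examples; the expensive feature learning already happened, for free, on unlabeled data.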
Google researchers want to increase the size of the network further to see whether performance keeps improving with scale. Whereas the current model supports about a billion connections, the human brain has around 100 trillion, Dean said in his blog post.
Google researchers are presenting a paper on the neural network learning at the International Conference on Machine Learning (ICML 2012) being held in Edinburgh, Scotland, June 26 to July 1.
Image recognition insights are often buried in the unreported details of detection thresholds, false positives and false negatives. First (detection thresholds): how does the neuron's sensitivity compare with a human's? Second (false positives): does the neuron ever fire on an image in which a human viewer cannot see a cat, and no cat is believed to be present? If so, are there any cues about what makes that image seem catlike, and what mimics a cat well enough to confuse the neuron? Finally (false negatives): does the neuron ever miss cats that should be within its normal detection sensitivity? Those misses could hint at parameters the neuron depends on that we don't rely on as heavily. These are the cats that are "camouflaged" from the neuron.
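The kind of analysis being asked for here could be sketched as a simple threshold sweep over the neuron's activations, counting disagreements with human labels. Everything below is synthetic stand-in data, not measurements from the Google system.

```python
# Illustrative sketch: sweep a detection threshold over a neuron's
# activations and count false positives / false negatives against
# human judgments. All numbers here are synthetic.
import torch

torch.manual_seed(0)
activation = torch.rand(1000)                  # neuron's response per image
is_cat = torch.rand(1000) < 0.2                # what a human says is a cat

for threshold in (0.5, 0.7, 0.9):
    fires = activation > threshold
    false_pos = (fires & ~is_cat).sum().item() # fires, but no cat visible
    false_neg = (~fires & is_cat).sum().item() # cat present, neuron silent
    print(f"thr={threshold}: FP={false_pos}, FN={false_neg}")
```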
I was also thinking of Skynet, but feel we don't have to worry until the Google network can *herd* cats. If that happens, it's game over for us carbon-based units (to borrow a phrase from another sci-fi dynasty). ;-)
All kidding aside, this is an interesting development in cognitive computing and I expect we'll see many more to come from the likes of Google.
My understanding is that the network does not know the word "cat."
But if you show the network any image that YOU think is of a cat, one neuron, or pattern of neurons, always fires in response. And if you show it an image of anything that YOU think is not a cat, that neuron, or sub-network, does not respond.
Nonetheless it is learning by itself, like a neuromorphic system. And the experiment could be extended to cross-matching the images with spoken words.
Interesting! Learning by itself, what a cat is!
It's like a baby learning and relating images with words. I suppose the neural system was able to "hear" the audio of the videos and relate "cat" to the images of cats? If no tags were used... how was the relation established?
It does seem we have entered the era of cognitive computing.
I think Google makes the point that most people have built networks with tens of millions of connections, while they have created one that supports a billion.
It may be that scale is important in neuromorphic systems and the good stuff only starts to happen when you get above 1 billion connections.
Neural networks trained on a data set identify a pattern! There is nothing radically new here; it's just that Google could afford a large 16,000-core cluster to support the huge number of neural connections.