PORTLAND, Ore. -- First, computers beat the best of us at chess, then at poker, and finally at Jeopardy. The next hurdle was image recognition; surely a computer can't do that as well as a human. Check that one off the list, too: Microsoft has programmed the first computer to beat humans at image recognition.
The competition is fierce, with the ImageNet Large Scale Visual Recognition Challenge judging the 2015 championship on December 17. Between now and then, expect a stream of papers claiming to have one-upped humans too. For instance, only five days after Microsoft announced that its neural network had beaten the human benchmark of 5.1% error with a 4.94% error rate, Google announced it had one-upped Microsoft by 0.04 percentage points.
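The error rates quoted above are, by ImageNet convention, top-5 errors: a prediction counts as correct if the true label appears among the model's five best guesses. A minimal sketch of that calculation follows; the function name and the toy labels are hypothetical and do not come from any contestant's code.

```python
def top5_error(ranked_predictions, true_labels):
    """Fraction of images whose true label is absent from the five best guesses.

    ranked_predictions: one list of labels per image, best guess first.
    true_labels: the correct label for each image, in the same order.
    """
    misses = sum(
        1
        for guesses, truth in zip(ranked_predictions, true_labels)
        if truth not in guesses[:5]
    )
    return misses / len(true_labels)


# Hypothetical toy data: four images, each with six ranked guesses.
predictions = [
    ["tabby", "tiger cat", "lynx", "fox", "dog", "wolf"],
    ["beagle", "basset", "foxhound", "harrier", "pug", "boxer"],
    ["canoe", "kayak", "paddle", "raft", "yawl", "dock"],
    ["teapot", "kettle", "mug", "pitcher", "vase", "urn"],
]
labels = ["lynx", "beagle", "yawl", "urn"]

# Only the last image is a miss: "urn" is the sixth guess, outside the top five.
error = top5_error(predictions, labels)
```

On the real benchmark the same ratio would be taken over the full test set rather than four images.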
The top row shows representatives of the categories that Microsoft's algorithm found in the database; the columns of images below them are examples that fit each category.
ImageNet, with hundreds of object categories and millions of example images, has run the competition since 2010 with about 50 institutions competing, but this is the first year that a computer will take the crown from the best human score. All the contestants use what are today called deep learning algorithms, which derive from various versions of artificial neural networks that mimic, to varying degrees, the way the human brain works. Most of the contestants freely publish papers describing their algorithms in great detail (in the spirit of open source, though without providing the exact code) and explaining why those algorithms worked so well. Here, Microsoft revealed that it was using deep convolutional neural networks (CNNs) with 30 weight layers. Google revealed its batch normalization technique, which keeps neurons from saturating during initialization.
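To make the batch normalization idea concrete, here is a minimal sketch assuming the usual formulation: each unit's inputs are standardized over a mini-batch, then rescaled by a learnable scale (gamma) and shift (beta). The function name, the fixed gamma and beta, and the sample values are illustrative assumptions, not Google's code.

```python
import math


def batch_norm(values, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one unit's pre-activations across a mini-batch.

    values: the unit's pre-activation for each example in the batch.
    gamma, beta: scale and shift, learned in practice but fixed here.
    eps: small constant that guards against division by zero.
    """
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n  # biased batch variance
    return [gamma * (v - mean) / math.sqrt(var + eps) + beta for v in values]


# Pre-activations large enough to push a sigmoid unit deep into saturation.
batch = [40.0, 50.0, 60.0, 70.0]
normalized = batch_norm(batch)
```

After normalization the batch has roughly zero mean and unit variance, so inputs that would have saturated a sigmoid land back in its responsive range.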
"In previous work, the neural units were hand-designed and fixed during training. In contrast, we make the units smarter by allowing them to take a more flexible form," Jian Sun, principal researcher for the Visual Computing Group, Microsoft Research Asia told EE Times. "More importantly, the particular form of each unit is learned by end-to-end training. We observed that introducing smarter units can considerably improve the model."
When asked why the team's current neural network was the first to beat the human experts, Sun cited details of its deep learning algorithm. Such an algorithm usually initializes by training on 1.2 million training images, then verifies on 50,000 validation images, and finally applies what it learned to 100,000 test images in the main image database. Microsoft, however, took a slightly different tack.
"A robust initialization method, as a part of training algorithm, was needed since training very deep neural networks is difficult. Previous work either resorts to pre-training or adding auxiliary training tasks. In our work, we derive a theoretically sound initialization method which allows us to freely exploit more powerful -- deeper and wider -- neural networks," Sun told EE Times.
Nvidia is a sponsor of the annual ImageNet Challenge and supplies access to arrays of its graphics processing units (GPUs) to all contestants. Microsoft did use Nvidia GPUs, but it bought and configured its own supercomputer, using the GPUs to simulate parametric rectified linear neural units and become the "1st to beat a human" at image classification.
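The parametric rectified linear units named above, and the initialization Sun describes, can be illustrated with a small sketch. It assumes the common formulation in which a unit passes positive inputs unchanged and scales negative inputs by a slope a, and in which weights are drawn with standard deviation sqrt(2 / ((1 + a^2) * fan_in)) so that signal strength is roughly preserved from layer to layer. The slope is held fixed here, whereas in training it would be learned along with the weights. All names, the layer width, and the fully connected layers are illustrative assumptions, not Microsoft's implementation.

```python
import math
import random


def prelu(x, a):
    """Parametric ReLU: pass positive inputs through, scale negative ones by a."""
    return x if x > 0 else a * x


def scaled_std(fan_in, a):
    """Weight standard deviation sqrt(2 / ((1 + a^2) * fan_in)).

    With a = 0 this reduces to sqrt(2 / fan_in), the plain ReLU case.
    """
    return math.sqrt(2.0 / ((1.0 + a * a) * fan_in))


def init_layer(fan_in, fan_out, std, rng):
    """Return a fan_out x fan_in weight matrix drawn from N(0, std^2)."""
    return [[rng.gauss(0.0, std) for _ in range(fan_in)] for _ in range(fan_out)]


def forward(x, layers, a):
    """Apply each layer as a matrix-vector product followed by PReLU."""
    for weights in layers:
        x = [prelu(sum(w * v for w, v in zip(row, x)), a) for row in weights]
    return x


def mean_square(x):
    """Average squared activation, a rough measure of signal strength."""
    return sum(v * v for v in x) / len(x)


rng = random.Random(0)  # fixed seed so the run is repeatable
# depth echoes the article's 30 weight layers; width and slope are illustrative.
width, depth, slope = 64, 30, 0.25
x0 = [rng.gauss(0.0, 1.0) for _ in range(width)]

scaled = [init_layer(width, width, scaled_std(width, slope), rng) for _ in range(depth)]
naive = [init_layer(width, width, 0.01, rng) for _ in range(depth)]

scaled_signal = mean_square(forward(x0, scaled, slope))
naive_signal = mean_square(forward(x0, naive, slope))
```

In this toy run the scaled initialization keeps the signal at a usable magnitude after 30 layers, while the fixed small standard deviation lets it shrink toward zero, which is the kind of difficulty that makes very deep networks hard to train from scratch.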
The team's results are already being applied to Microsoft's Bing image search and OneDrive. Sun's team consisted of Kaiming He, at Microsoft Research Asia’s Visual Computing Group, and two academic interns, Xiangyu Zhang of Xi’an Jiaotong University and Shaoqing Ren of the University of Science and Technology of China.
— R. Colin Johnson, Advanced Technology Editor, EE Times