Laying a foundation for machine learning
SAN JOSE, Calif. — Facebook is pouring a new software foundation for machine learning and hammering on top of it the early 2x4s of augmented and virtual reality. Its annual developer conference here showed a rich mix of creative and sometimes crude work toward its many ambitious aims.
Under the hood, Facebook and its cloud computing rivals are essentially giant global data farms where harvesting the good stuff depends on machine learning. Facebook’s engine is unique in part for how it aims to leverage both its global warehouses of servers and its users’ smartphones.
Facebook’s new Caffe 2 framework takes a page -- and a key developer -- from rival Google’s Tensorflow. Caffe 2 is a significantly upgraded version of the machine-learning framework originally created at UC Berkeley by Yangqing Jia. After graduating, he spent two years at Google working on Tensorflow and other projects before Facebook hired him in February 2016 as engineering lead for its AI platform.
“We needed more flexibility, so we made [Caffe] more modular and friendly to different hardware back ends…mobile is a major interest,” said Jia in an interview with EE Times on the F8 show floor.
Smartphones will use Caffe 2 models to recognize and enhance objects in photos, creating AR effects similar to Pokemon Go and Snapchat filters. The so-called “style transfers” mark Facebook’s first strategic steps to draw consumers into AR.
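Style transfers of this sort are typically run as a single feed-forward pass of a small convolutional network over the image. A toy sketch of the basic building block (the kernel and function names are illustrative, not Facebook's actual models):

```python
# Toy illustration of one feed-forward "style" layer: a single 3x3
# convolution over a grayscale image. Real mobile style-transfer models
# stack dozens of such layers with learned weights.

def conv3x3(image, kernel):
    """Apply a 3x3 kernel to a 2D image (list of lists), zero-padded at edges."""
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += image[yy][xx] * kernel[dy + 1][dx + 1]
            out[y][x] = acc
    return out

# An edge-enhancing kernel stands in here for learned style weights.
EDGE = [[0, -1, 0],
        [-1, 5, -1],
        [0, -1, 0]]

flat = [[1.0] * 4 for _ in range(4)]   # uniform image: interior pixels unchanged
styled = conv3x3(flat, EDGE)
```

Because such networks are just stacks of convolutions, they map naturally onto the GPU and DSP accelerators described below.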
On the show floor, Qualcomm showed neural-network software handling image recognition at 50 frames/second on the Adreno GPU cores in its high-end Snapdragon SoCs. That’s much faster than the 12 f/s the SoC’s Kryo CPU delivers by default. This summer Qualcomm will release a version of the software that also uses its Hexagon DSP as an accelerator.
Facebook is clearly betting on such capabilities being widely available in the future. For today it will have to deal with a widely fragmented handset market that lacks such support.
Processor vendors will want to support all broadly used machine learning frameworks, but their efforts will take time. For example, ARM and Ceva, whose cores are widely used in smartphones, have so far expressed support for Tensorflow, not Caffe 2. Intel, Nvidia and Qualcomm were quick to say they will support Caffe 2. Facebook already uses Nvidia’s GPUs on its training servers.
Jia suggested the door is open for Caffe 2 inference accelerators on its servers, similar to the TPUs Google started deploying in 2015.
“We see [potential for] a lot of hardware optimizations…There are quite a few computation patterns stabilized enough for hardware to use, and more and more computation patterns will stabilize,” he said.
However, all the frameworks are still evolving to support a wider range of neural networks with higher performance. “It’s a healthy competition, in general we’re all searching for a better solution for A.I.—it’s like the evolution of programming languages,” he said.
For its part, Facebook is replacing a mix of frameworks including Torch with Caffe 2. Facebook made the framework open source and will use it on all its machine learning jobs including computer vision and machine translation on servers. It also is working with rivals Amazon and Microsoft to make it easier for business users to tap into their AI services.
“It’s about setting up development environments easily, similar to using pre-installed software,” Jia said.
For its part, Google’s Tensorflow is being used by Airbnb, Dropbox, SAP, Twitter, Uber and Xiaomi.
Overall, Facebook is behind its Web rivals in machine learning, according to Richard Windsor, analyst at Edison Investment Research. “AI remains essential to Facebook’s long-term growth as it is sitting upon a mountain of data but still is not in a position to really make the most of the insights and automation that it can provide,” Windsor wrote in a research note that cited the company’s progress in image recognition, where it has made significant hires.
Qualcomm showed gains running Caffe 2 on its Adreno GPU. (Image: EE Times)
“Facebook had to move beyond Torch, which they use internally for research, since Torch is based on the Lua language,” said Karl Freund, a senior analyst at Moor Insights and Strategy. “Most AI programmers use and prefer Python, which is native for many other frameworks, including Caffe,” he said.
“The largest cloud and Internet sites have developed and promoted their own AI frameworks to keep a competitive edge and develop a loyal ecosystem,” Freund said. “They cannot use someone else’s open source versions as these always lag the code used internally by as much as a year. Each company will have their specific area of focus, for example, Facebook focuses on imaging and Amazon focuses on natural language processing,” he added.
Indeed, Facebook AI specialist Joaquin Candela showed how its Mask R-CNN algorithms can now tightly detect and classify people and objects—even when they are moving in video or have parts blocked by other objects.
Candela claimed the enhancements in Caffe 2 provided 100x speedups in inference jobs run on smartphones. Its use in Facebook’s image and texting apps — Instagram and Messenger — “makes it the largest A.I. deployment ever,” he said in a keynote.
Fresh eyeballs on virtual reality
Facebook’s Surround 360 x6 and x24 3D cameras (above) were the most interesting among the few hardware projects described at the F8 event. Designed with imaging specialist Flir Systems, they deliver six degrees of freedom, outputting images with 3x and 4x pixel overlap, respectively.
The ruggedized systems provide per-pixel depth information. Image stitching and rendering software on Facebook’s servers turns the cameras’ output into 3D 360-degree content. The Web giant aims to license the camera designs to companies that will make, sell and rent them by the end of the year.
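Per-pixel depth is what enables six-degrees-of-freedom rendering: each pixel can be back-projected into a 3D point that a renderer can re-view from a shifted position. A minimal sketch of that step, assuming a simple pinhole camera model (the focal length and principal point below are illustrative, not the Surround 360's actual calibration):

```python
# Back-project a pixel with known depth into a 3D camera-space point
# using a pinhole model: x = (u - cx) * z / fx, y = (v - cy) * z / fy.
def backproject(u, v, depth, fx, fy, cx, cy):
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)

# A pixel at the principal point lands straight down the optical axis.
pt = backproject(u=320, v=240, depth=2.0, fx=500.0, fy=500.0, cx=320.0, cy=240.0)
```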
Facebook rallied software companies including Adobe, Otoy, Foundry, Mettle, DXO, Here Be Dragons, Framestore, Magnopus and The Mill to support the camera. They will deliver by the end of the year a suite of tools to enable developers and production companies to create more detailed virtual environments.
The camera and related tools are key to spawning the high-quality, professional content that is sorely needed for VR. As one producer at the event noted, virtual reality is still waiting for its “House of Cards” moment, referring to the Netflix series that put Internet video on the map.
Separately, a new 360 Capture SDK aims to enlist users in creating their own images and video for virtual reality environments using traditional stitching and cube-mapping techniques. The resulting content also targets viewing on PCs and handsets. Separately, Facebook described a handful of resolution-enhancement techniques it developed.
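Cube mapping, one of the techniques the SDK relies on, assigns every 3D view direction to one of six cube faces based on its dominant axis. A minimal sketch of that face-selection step (the face labels are a common convention, not the SDK's own API):

```python
# Map a 3D view direction to the cube face its dominant axis hits,
# the first step in cube-mapping a 360-degree capture.
def cube_face(x, y, z):
    ax, ay, az = abs(x), abs(y), abs(z)
    if ax >= ay and ax >= az:
        return "+x" if x > 0 else "-x"
    if ay >= ax and ay >= az:
        return "+y" if y > 0 else "-y"
    return "+z" if z > 0 else "-z"
```

Once the face is known, the direction is projected onto that face to get 2D texture coordinates, which is how six flat images stand in for a full sphere of video.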
Welcome to Cartoonland
A new VR social networking site was less than compelling. (Images: Facebook)
Facebook was quick to jump on the VR bandwagon, buying headset maker Oculus for $2 billion in 2014 and striking a collaboration with Samsung on its smartphone-based Gear VR. These days the industry is recovering from the hangover after the hype.
The lines were still somewhat long to try out headsets at multiple Oculus and Gear VR stations on the show floor. But in some cases the visuals were crude.
Facebook Spaces (above), the first home for VR on the social network, had all the attraction of Microsoft Bob, the Windows virtual world that flopped more than 20 years ago. Demos of Oculus Rooms (below) on the Gear VR didn’t look significantly more compelling.
Executives and technologists, from Facebook founder Mark Zuckerberg on down, repeated a message the latest environments made clear at a glance: good AR and VR may be a decade away.
One producer noted that last year’s excitement and requests for proposals around 360-degree videos have died down. More people are taking a wait-and-see attitude, hoping someone else will create killer content that points the way forward.
With a Gear VR headset, Oculus Rooms was only marginally more inviting.
Another producer pointed to many flowers blooming in China, including the operator of a theatre chain that has produced its own proprietary headset and SDK, said to be making money. I could not find it, but did see this review of five low-cost VR headsets in China.
Amid the gloom, more eyes are turning back to the PC and the WebVR format Oculus is now promoting in its VR-first browser, released last month. The effort aims to move beyond relatively arcane game engines and a small pool of headset owners, opening the door to a wider set of developers and users on desktops and handsets.
Google, Microsoft, Mozilla and Samsung have joined Facebook to drive WebVR forward. An Oculus engineer even described emerging techniques and standards to transform existing Web pages into VR experiences, echoing failed efforts to create tools that would automatically convert old movies into 3D versions.
Starting the long trek to AR
Last year Facebook added camera capabilities across its applications. At this year’s event it launched AR Studio (above), a point-and-click tool for building the kinds of augmented-reality experiences that Pokemon Go and Snapchat filters have made popular (below).
The net effect is Facebook now operates “the largest AI camera ever,” quipped machine-learning specialist Joaquin Candela, who noted its ability to recognize faces and other objects.
“The camera needs to be more central than the text box,” said Zuckerberg in a keynote. “Eventually we want glasses or contact lenses that show digital objects overlaid on the real world,” he said, noting a digital TV might someday be a $1 app rather than a $5,000 chunk of hardware.
None of this will come soon, but it will come, said Michael Abrash, chief scientist at Oculus Research and a veteran games and graphics programmer.
Stylish, see-through “AR glasses may take 5-10 years or longer to arrive, but they will become a vital part of our lives,” he said, adding that the set of technologies to build them “does not exist yet.”
For Facebook, AR is at least in part the new eye candy. “The aim here is to get users to spend more time within the Facebook ecosystem thereby increasing potential for monetisation,” wrote Richard Windsor of Edison Investment Research.
Facebook adds face recognition to Snapchat's ability to add styles.
360 Cameras for the masses
Facebook displayed a half dozen consumer 360-degree cameras on the show floor. To spark more grassroots AR and VR content, it gave its 4,000+ attendees one of the Giroptic iO 360 devices (above) that plug into a smartphone via USB.
The anthropomorphic Nokia Ozo (below) sported one of the more interesting industrial designs on display. Nokia claims it can deliver 360-degree live video at 30 frames/second along with 3D audio.
Last year’s industrial-quality 360 cameras
Another display showed three commercial cameras that responded to Facebook’s release of specs last year for its initial Surround 360 camera. The 80W Endeavor 360 (above) uses a dozen 10-Mpixel AR1011 CMOS imagers from On Semiconductor. The 40W Z Cam V1 Pro (below) uses nine Sony imagers and a Hi3519 V101 image processor from HiSilicon.
A crazy salad of miscellaneous projects
Among a crazy salad of other talks and announcements, Geekpark founder Jack Peng Zheng (above) told attendees that since 2012 China has minted 75 unicorns, compared to only 12 in the U.S.
The billion-dollar startups are feeding on the country’s 940.75 million smartphone users and finding “growth in the gaps” of relatively immature traditional industries, he said, citing Meituan, China’s version of Groupon, as one example. Mobile payments are enabling a variety of opportunities including bike-sharing apps that charge on average just eight cents a ride.
Closer to home, Facebook announced it is testing a 60-GHz Web access network in San Jose as part of its Terragraph project. It demoed fast auto-routing protocols that shifted a video stream between base stations after an outage without dropping frames.
Going beyond the stratosphere, Regina Dugan described bio-tech projects in the Building 8 research group she helped form about a year ago. Dugan described a brain interface for typing 100 words per minute using non-invasive optical sensors measuring brain activity hundreds of times per second. Some sixty people are working on the project including researchers at Johns Hopkins and Berkeley.
A former DARPA director and Google researcher behind its Project Tango and Project Ara initiatives, Dugan also described a project that follows in the footsteps of Braille to let people “hear with their skin.”
— Rick Merritt, Silicon Valley Bureau Chief, EE Times