SAN JOSE, Calif. — Facebook will open-source a GPU server geared for machine learning. Big Sur packs eight Nvidia Tesla M40 graphics accelerators, each drawing up to 300 watts, and is the first system to use the high-end cards, which are targeted at training deep neural networks.
The work is one of many efforts to apply FPGAs and GPUs to accelerating big data center jobs, which increasingly rely on deep neural networks. More than a year ago, rivals Baidu and Microsoft said they were rolling out FPGAs for a variety of data center applications, including search, claiming GPUs deliver greater performance but at much higher power consumption and cost.
In February, rivals Microsoft and Google announced breakthroughs in image recognition using deep neural networks. Big Sur marks Facebook’s first foray beyond standard server, storage and switch designs. In November, Facebook announced a 100 Gbit/second switch.
Details of Big Sur won’t be available until an unspecified date when the design is released to the Open Compute Project, originally launched by Facebook. However, the Web giant did say the server uses the project’s Open Rack specification. In addition, it has “flexibility to configure between multiple PCI-e topologies.”
Facebook’s artificial intelligence research team is only working with Nvidia for now. Big Sur “was built with the Nvidia Tesla M40 in mind, but is qualified to support a wide range of PCI-e cards,” said a Facebook representative.
Big Sur packs eight Nvidia Tesla M40 accelerators using an OpenCL interface, but is qualified to handle other PCI Express cards. (Image: Facebook)
Nvidia was chosen for a variety of reasons, including the “fact that they have hardware-agnostic APIs like OpenCL,” the Facebook representative said.
Facebook is currently in the final phase of testing Big Sur and plans to use it in production networks next year.
The Web giant has “developed software that can read stories, answer questions about scenes, play games and even learn unspecified tasks through observing some examples. But we realized that truly tackling these problems at scale would require us to design our own systems,” engineers said in a blog posted today.
Facebook would not say how much it has invested in machine learning research or in building the GPU server. However, it did say the group “is more than tripling its investment in GPU hardware as we focus even more on research and enable other teams across the company to use neural networks in our products and services.”
Facebook is currently using off-the-shelf GPU servers to handle machine learning tasks, but they require special cooling, are relatively expensive and are difficult to maintain, the blog said. Big Sur can be air-cooled like other systems in Facebook’s data centers.
As with other servers optimized for big data center operators, Facebook streamlined existing designs to save cost and ease maintenance. “We've removed the components that don't get used very much, and components that fail relatively frequently — such as hard drives and DIMMs — can now be removed and replaced in a few seconds,” the blog said.
For example, the Big Sur motherboard can be removed in a minute, compared with an hour’s work on existing GPU servers. The CPU heat sinks are the only replaceable items in the design that require a screwdriver, the blog added.
— Rick Merritt, Silicon Valley Bureau Chief, EE Times