Worldwide, Facebook stores on the order of 240 billion images, a number that grows daily by roughly 350 million new photos. At January’s Open Compute Project Summit, Jay Parikh, Facebook’s VP of infrastructure, called on the memory industry to develop a hybrid technology that combines the economy of spinning disks with the access speed of flash memory. We sat down with him to find out more about what he has in mind.
Kristin Lewotsky, editor: Right now, what does your memory infrastructure consist of?
Jay Parikh: Most of the infrastructure that stores our photos today is spinning disks. We have a lot of very sophisticated software that manages the objects on the disks and across the distributed system, because we have to maintain copies of each object in case we lose a disk or a data center. The second part is that we also use memory to serve the hot photos quickly to end users. There are many layers of caching, which use either RAM or, in our case, both RAM and flash devices, to provide that fast serving experience for users.
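Parikh doesn’t describe how the caching layers are implemented, so the following is only a generic sketch of the idea: a read-through cache hierarchy in which a request checks RAM first, then flash, and finally falls back to disk. It assumes LRU eviction at every tier and promotion of an object into the faster tiers on a hit; the class names `LRUTier` and `TieredPhotoReader` are hypothetical and do not come from Facebook’s software.

```python
from collections import OrderedDict
from typing import Callable, Optional


class LRUTier:
    """Hypothetical fixed-capacity LRU cache standing in for one layer (RAM or flash)."""

    def __init__(self, name: str, capacity: int) -> None:
        self.name = name
        self.capacity = capacity
        self._items: "OrderedDict[str, bytes]" = OrderedDict()

    def get(self, key: str) -> Optional[bytes]:
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key: str, value: bytes) -> None:
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict the least recently used entry


class TieredPhotoReader:
    """Checks each cache tier in order (fastest first), then falls back to disk."""

    def __init__(self, tiers: list, read_from_disk: Callable[[str], bytes]) -> None:
        self.tiers = tiers
        self.read_from_disk = read_from_disk
        self.hits = {t.name: 0 for t in tiers}
        self.hits["disk"] = 0

    def get(self, photo_id: str) -> bytes:
        for i, tier in enumerate(self.tiers):
            data = tier.get(photo_id)
            if data is not None:
                self.hits[tier.name] += 1
                for faster in self.tiers[:i]:  # promote into the faster tiers
                    faster.put(photo_id, data)
                return data
        # Miss in every cache: read from disk and populate all tiers.
        data = self.read_from_disk(photo_id)
        self.hits["disk"] += 1
        for tier in self.tiers:
            tier.put(photo_id, data)
        return data
```

With a one-entry RAM tier and a two-entry flash tier, reading `p1`, `p1`, `p2`, `p1` produces one RAM hit, one flash hit, and two disk reads: `p2` pushes `p1` out of RAM, but `p1` is still found in the larger flash tier and promoted back.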
K.L.: Can you talk more about the software?
J.P.: There are basically three components. The first is a system called Haystack that runs on a given server and stores and manages all the data on it. The second component is a middle layer that manages Haystack across a particular cluster of machines, each of which runs this software. It deals with the applications that need to, for example, put photos and videos into the system. We do all the heavy lifting behind the scenes so that users can just shove an object into the system, get an object, or delete an object. The third and final component of our software focuses on how to actually serve the objects to different devices.
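To make the first two layers concrete, here is a simplified model, assuming each server keeps an append-only data file with an in-memory index of offsets (loosely modeled on the published Haystack design) and that the cluster layer writes every object to a fixed number of servers. The class names, the CRC32-based placement rule, and the replica count are hypothetical illustrations, not Facebook’s code; the third, device-serving layer is left out.

```python
import io
import zlib
from typing import Optional


class StoreServer:
    """Per-server layer: append-only data file plus an in-memory index of offsets."""

    def __init__(self) -> None:
        self._log = io.BytesIO()  # stands in for a large on-disk volume file
        self._index = {}  # object_id -> (offset, length)

    def put(self, object_id: str, data: bytes) -> None:
        offset = self._log.seek(0, io.SEEK_END)  # always append at the end
        self._log.write(data)
        self._index[object_id] = (offset, len(data))

    def get(self, object_id: str) -> Optional[bytes]:
        entry = self._index.get(object_id)
        if entry is None:
            return None
        offset, length = entry
        self._log.seek(offset)
        return self._log.read(length)

    def delete(self, object_id: str) -> None:
        # Only the index entry is dropped; the bytes stay until a compaction pass.
        self._index.pop(object_id, None)


class ClusterManager:
    """Middle layer: applications call put/get/delete; replication stays hidden."""

    def __init__(self, servers: list, replicas: int = 3) -> None:
        if replicas > len(servers):
            raise ValueError("not enough servers for the requested replica count")
        self.servers = servers
        self.replicas = replicas

    def _placement(self, object_id: str) -> list:
        # Hypothetical rule: consecutive servers starting at a hash-derived index.
        start = zlib.crc32(object_id.encode()) % len(self.servers)
        return [self.servers[(start + i) % len(self.servers)] for i in range(self.replicas)]

    def put(self, object_id: str, data: bytes) -> None:
        for server in self._placement(object_id):
            server.put(object_id, data)

    def get(self, object_id: str) -> Optional[bytes]:
        for server in self._placement(object_id):  # try the next replica on a miss
            data = server.get(object_id)
            if data is not None:
                return data
        return None

    def delete(self, object_id: str) -> None:
        for server in self._placement(object_id):
            server.delete(object_id)
```

Because callers only ever see `put`, `get`, and `delete` on the cluster object, losing one replica is invisible to them: `get` simply falls through to the next server that holds a copy.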
K.L.: Can you talk more about the particular challenges of your application?
Figure 1: Facebook data center at Forest City, NC.
J.P.: The expectation from our users is that we don’t lose the photos and memories they’ve stored in Facebook to share with friends and family. Over time, though, the access patterns of the photos change. When I upload a picture of my daughter over the weekend, everybody gets to see it, they like it, they comment on it, whatever. A year from now, most likely my friends are not going to go back and look at that photo. From a distribution perspective, it makes sense that there should be fewer views of it.
We could take some flash, or a different grade of flash, and use it to store these photos that we can’t lose, but we don’t want to pay as much as we do for storing the hot photos. At the same time, we want to consume less energy storing them without sacrificing retrieval latency. That quickly eliminates the option of putting these photos on tape, or printing them off and putting them in a file cabinet somewhere, because the latency to retrieve them doesn’t work for us.
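The placement decision Parikh describes, with photos cooling off as they age but still needing to stay online, can be sketched as a simple policy function. The tier names, the seven-day age cutoff, and the recent-view threshold below are hypothetical values chosen for illustration; the interview gives no actual thresholds. Note that every tier in the sketch serves reads directly, which reflects his point that offline options such as tape are ruled out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum


class Tier(Enum):
    HOT = "hot"    # RAM/flash caching in front of disk
    WARM = "warm"  # regular disk-backed storage
    COLD = "cold"  # cheaper, lower-power storage that still serves reads online


@dataclass
class PhotoStats:
    uploaded_at: datetime
    views_last_30_days: int


# Hypothetical thresholds for illustration only.
HOT_MAX_AGE = timedelta(days=7)
WARM_MIN_RECENT_VIEWS = 10


def choose_tier(stats: PhotoStats, now: datetime) -> Tier:
    """Recent uploads stay hot; older photos move down unless they are still popular."""
    age = now - stats.uploaded_at
    if age <= HOT_MAX_AGE:
        return Tier.HOT
    if stats.views_last_30_days >= WARM_MIN_RECENT_VIEWS:
        return Tier.WARM
    return Tier.COLD
```

A weekend upload lands in the hot tier, while a year-old photo that almost nobody revisits falls to the cold tier. Using recent views as well as age keeps an old photo that suddenly becomes popular again out of the slowest tier.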