Avoiding Cloud Lock-in (the Roach Motel Syndrome)
Fortunately, many of today's large public cloud providers offer the ability to export not only the data but also the metadata generated by their subscribers. Any enterprise should treat this as a vital feature before adopting a cloud service that could become critical to its business. It is unrealistic to assume that you will always maintain a service with a particular cloud provider, and if there is no mechanism to retrieve your data, the resulting situation can become a costly dilemma.
The presence of a mass export feature is not the only requirement; how accessible and usable the data is after export matters just as much. Data exported in a proprietary file format may be impossible to parse intelligibly. Even data exported in a plaintext format must still be imported into the new system (or provider) in some intelligible way. Anyone planning to leave a cloud needs a realistic understanding of these challenges.
Below are some examples of cloud providers leading the industry in addressing these lock-in concerns:
- Salesforce.com offers its subscribers the ability to generate a complete, on-demand export of all data within a subscriber's instance. Some subscription levels include this export feature as part of the package, while others offer it for an additional fee, but it is always available as an option. The exported data arrives as a ZIP file containing plaintext CSV files holding the raw data for each Salesforce object, and the export can also be scheduled as an automated task to continually archive the data (a minimal scripted example follows this list). If you are a subscriber, this feature is accessible in the Web interface under Setup | Data Management | Data Export | Schedule Export. It is also worth mentioning that, at the time of publishing, several alternatives to Salesforce can intelligibly and automatically parse this data, proving that the export is genuinely useful and not just a feature checkbox.
- Google has gone so far as to create what it calls the Data Liberation Front. An example of this effort in action can be seen in Google Docs. Google Docs can act as a repository for all of a user's (or organization's) word processing documents, spreadsheets, and more, and such content naturally should be very portable. Google responded by adding a feature that allows all Google-hosted documents to be exported in a few clicks, and even in multiple formats, including Microsoft Office and OpenOffice formats (see the second sketch after this list). Google's description of this group is worth quoting, for a mission statement behind such an organized effort is a rarity in the cloud space5:
The Data Liberation Front is an engineering team at Google whose singular goal is to make it easier for users to move their data in and out of Google products. We do this because we believe that you should be able to export any data that you create in (or import into) a product. We help and consult other engineering teams within Google on how to "liberate" their products. This is our mission statement:
Users should be able to control the data they store in any of Google's products. Our team's goal is to make it easier to move data in and out.
- Another public cloud provider helping to lead the way in addressing the lock-in problem is Amazon Web Services, specifically its Elastic Compute Cloud (EC2) service, along with the surrounding cloud services for data storage, databases, and more. Amazon's approach is an import/export feature that accommodates amounts of data too large to feasibly transfer via file download over the Internet. Subscribers prepare a portable hard drive and submit a job to Amazon to perform a data import or export; they then physically mail the portable hard drive to an Amazon-provided address, and the data migration occurs there.
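As a brief illustration of what automated, programmatic access to exported Salesforce data can look like, here is a minimal Python sketch that pulls one Salesforce object's records into a CSV file using the simple_salesforce library. The credentials, object, and field names are placeholders; the built-in Data Export described above remains Salesforce's supported bulk-export path.

```python
import csv

from simple_salesforce import Salesforce

# Placeholder credentials -- substitute a real username, password,
# and API security token.
sf = Salesforce(username="user@example.com",
                password="password",
                security_token="token")

# query_all() pages through every matching record of the object.
result = sf.query_all("SELECT Id, Name, CreatedDate FROM Account")
records = result["records"]

if records:
    # Each record carries an 'attributes' metadata entry; drop it so
    # only the raw field values land in the CSV.
    fieldnames = [key for key in records[0] if key != "attributes"]
    with open("Account.csv", "w", newline="") as out:
        writer = csv.DictWriter(out, fieldnames=fieldnames)
        writer.writeheader()
        for record in records:
            writer.writerow({key: record[key] for key in fieldnames})
```

Run on a schedule, a script like this gives you an ongoing plaintext archive that any new provider can import.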
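In the same spirit, a Google-hosted document can today be exported programmatically through the Drive API by varying the export MIME type. The sketch below is an assumption-laden outline, not the Data Liberation Front's own tooling: the document ID and the token.json credentials file are placeholders.

```python
import io

from google.oauth2.credentials import Credentials
from googleapiclient.discovery import build
from googleapiclient.http import MediaIoBaseDownload

FILE_ID = "your-document-id"  # placeholder document ID

# Assumes a previously authorized OAuth token stored in token.json.
creds = Credentials.from_authorized_user_file("token.json")
service = build("drive", "v3", credentials=creds)

# Export as OpenDocument text; swapping the MIME type yields other
# formats, e.g. Microsoft Word:
#   application/vnd.openxmlformats-officedocument.wordprocessingml.document
request = service.files().export_media(
    fileId=FILE_ID,
    mimeType="application/vnd.oasis.opendocument.text")

buffer = io.BytesIO()
downloader = MediaIoBaseDownload(buffer, request)
done = False
while not done:
    _, done = downloader.next_chunk()

with open("document.odt", "wb") as out:
    out.write(buffer.getvalue())
```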
It is also worth mentioning that companies are being formed solely to address the lock-in issues of other public cloud providers. Backupify (www.backupify.com) is a perfect example: its primary offering lets subscribers automatically back up and archive all of the data relating to their cloud services. Today its product has a consumer focus, supporting automated backup and archival of data from Facebook, Flickr, Twitter, and Google Docs, but it is only a matter of time before similar companies start addressing the lock-in issues of enterprise cloud services as well.
The security concerns around storing data in the cloud are not inherently unique compared to those for data stored on an organization's own premises. That is not to say the risks to data are the same in these very different environments. Ultimately, the concerns can be broken down and addressed in three key areas:
- Identify what data and applications you will store in a non-private and non-high-assurance cloud. Knowing what data will exist within a cloud is half the battle, and the answer will not always be obvious, since additional questions around data provenance arise in many environments. Data that is created or modified by using a cloud will be just as important as the original data itself, and metadata should likewise be identified and protected. When data falls under regulatory or legal coverage, it is also important to understand where it is physically stored and what laws govern it.
- Avoid cloud data lock-in. Make sure you are aware of the options available in case you need to move to another cloud provider. If your data is stored in a proprietary CSP format, or if it cannot easily be exported or adapted to a new environment, you may be subject to lock-in.
- Understand the data protection options you have available, and implement a sound strategy for protecting your sensitive or valuable data. Just as when protecting data in a traditional IT environment, encryption and authentication are key controls for data stored in the cloud. If encryption is used, understand what kind of encryption it is and what provisions are in place for key management. Also understand how data is deleted and how long it is retained in CSP backups. A brief sketch of client-side encryption follows this list.
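To make the encryption point concrete, here is a minimal sketch of client-side encryption applied before data ever reaches a CSP, using the Python cryptography library's Fernet construction. The data is a placeholder, and the hard part flagged above, key management, is deliberately reduced to a single comment.

```python
from cryptography.fernet import Fernet

# In practice the key comes from a key-management system and never
# lives alongside the data; generating it inline is for illustration only.
key = Fernet.generate_key()
fernet = Fernet(key)

plaintext = b"customer records destined for cloud storage"

# Fernet provides authenticated encryption: tampering with the
# ciphertext is detected at decryption time.
ciphertext = fernet.encrypt(plaintext)

# Only the ciphertext is handed to the cloud provider; recovering the
# plaintext requires the locally held key.
assert fernet.decrypt(ciphertext) == plaintext
```

Because the key never leaves your control, CSP personnel or backups that retain your data hold only ciphertext.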
Finally, be selective in choosing a CSP. The biggest risks to your data may well lie with CSP personnel accessing or mishandling it in its various forms. Chapter 6, Securing the Cloud: Key Strategies and Best Practices, goes into further depth on best practices around cloud security. Later, Chapter 8 (Security Criteria: Selecting an External Cloud Provider) and Chapter 9 (Evaluating Cloud Security: An Information Security Framework) present criteria and methods for making informed decisions about selecting an external CSP and evaluating the security of an external or internal cloud. Throughout those chapters, data security is a primary focus and concern.
4. Harrison G. Stuck Inside a Cloud. Brainwashed. Parlophone, Dark Horse, Capitol Records (published posthumously); 2002.
5. http://www.dataliberation.org/ [accessed 22.03.11].
Printed with permission from Syngress Publishing, a division of Elsevier. Copyright 2011. "Securing the Cloud: Cloud Computer Security Techniques and Tactics" by Vic (J.R.) Winkler. For more information about this title and other similar books, please visit www.elsevierdirect.com.