Monday, June 14, 2010

Copy cat web services

Recently I've been thinking along a tangent from a previous post, Emails are Forever. The scenario is one we are all too familiar with: as users of various Internet services, we continuously generate a lot of digital content (e.g., emails, pictures, conversations on our Facebook walls, our tweets and the tweets of people we follow, etc.). Sometimes we don't care whether some content will still be around a few days or weeks after it is "consumed", e.g., random comments you might have left on someone's picture on Facebook. But on other occasions you really hope that it will be, e.g., pictures you uploaded to Facebook of which you have no other copies in the "cloud". However, what's the guarantee that your data will be around a few weeks, months, or years from now?

It depends. Some web services are more likely than others to be around a few years from now. If you really care about something, you can maintain an offline backup at home or elsewhere in the cloud, e.g., using S3. But there are several issues with backups. Firstly, not all kinds of digital content you might have created using a web service can be easily tar'd up. Secondly, for highly dynamic content, such as emails, it's non-trivial to keep the backup in sync with the primary. Thirdly, although the backup is always "around", it's rarely called into service, so problems go unnoticed until you need it. The data may have been lost to bad disks (e.g., if using home storage), you may have misplaced your credentials for an online service, or someone may have compromised your online account. I know some people who maintain 4+ copies of all their pictures and actively manage the storage; I cannot imagine doing that. Wouldn't it be nice if all "replicas" of the primary data had the same or similar properties as the primary, so you could simply use one or the other?
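To make the third issue concrete, here is a minimal sketch of how a rarely used backup could be audited against the primary. It assumes both sides can be read as a simple mapping from item name to bytes, which real services seldom offer; the names build_manifest and audit_backup are made up for illustration.

```python
import hashlib


def build_manifest(items):
    """Map each item name to the SHA-256 digest of its content.

    `items` is assumed to be a dict of name -> bytes exported from the primary.
    """
    return {name: hashlib.sha256(data).hexdigest() for name, data in items.items()}


def audit_backup(manifest, backup):
    """Compare a backup (dict of name -> bytes) against a manifest.

    Returns the names that are absent from the backup and the names whose
    content no longer matches the digest recorded in the manifest.
    """
    missing = sorted(name for name in manifest if name not in backup)
    mismatched = sorted(
        name
        for name, expected in manifest.items()
        if name in backup and hashlib.sha256(backup[name]).hexdigest() != expected
    )
    return {"missing": missing, "mismatched": mismatched}
```

Running audit_backup on a schedule would at least reveal silent loss or corruption before the backup is needed, though it does nothing about the sync problem itself.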

I'll jump to the problem statement now. For each web service, build an alternate service that stores the same user-generated content as the primary service, has roughly the same functionality, and offers a high level of durability for the content duplicated from the primary. Reliability of the alternate service, measured in terms of downtime, is perhaps not that important. Durability of the alternate service is not necessarily tied to the durability of the primary; simply stated, it should be as durable as possible.
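As a rough illustration of the problem statement, the sketch below models an alternate service as a store that copies items from the primary and never discards a version it has seen. Everything here is hypothetical: ContentItem, MirrorStore, and mirror_from_primary are invented names, the in-memory dict merely stands in for durable storage, and keeping every version is just one possible reading of "as durable as possible".

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class ContentItem:
    """One piece of user-generated content (hypothetical shape)."""
    item_id: str    # identifier assigned by the primary service
    owner: str      # user who created the content
    payload: bytes  # raw content, e.g. an email body or image bytes


class MirrorStore:
    """Hypothetical alternate service that keeps every version it receives."""

    def __init__(self):
        # item_id -> list of (sha256 digest, payload), oldest first
        self._versions = {}

    def put(self, item):
        """Store the item; return True if a new version was recorded."""
        digest = hashlib.sha256(item.payload).hexdigest()
        history = self._versions.setdefault(item.item_id, [])
        if history and history[-1][0] == digest:
            return False  # unchanged since the last sync
        history.append((digest, item.payload))
        return True

    def get(self, item_id):
        """Return the latest payload for item_id, or None if unknown."""
        history = self._versions.get(item_id)
        return history[-1][1] if history else None

    def version_count(self, item_id):
        """Return how many distinct versions of item_id are stored."""
        return len(self._versions.get(item_id, []))


def mirror_from_primary(primary_items, store):
    """Copy items exported by the primary; return how many were new or changed."""
    return sum(1 for item in primary_items if store.put(item))
```

Because put only appends, a deletion or overwrite on the primary never removes data from the mirror. That favors durability over faithfully tracking the primary's current state, which may or may not be the right trade-off.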

How can this be done? Honestly, I don't know. However, what I do know is that it should be done in a way that does not stifle innovation in creating new web apps and services. Also, it will probably require an open source framework that makes it easy to develop such alternate services. Finally, it's not clear to what extent the functionality of an existing web service can be duplicated without key support from the provider itself. E.g., think of a funky new picture editing and sharing service that also allows you to store your pictures online. Does it make sense to duplicate the entire picture editing feature set in an alternate service simply so that we can store and access all the same data as on the primary?