Friday, March 06, 2009

Emails Are Forever

I don't know about you, but I almost always get to my emails through the Gmail search box, whether I'm looking for that email about someone's flashy new job (since I forgot where they are working) or my todo list from last week.  That's primarily because of the amount and nature of the data in my Gmail.  All communication of any consequence is reflected in my emails (including Facebook and Orkut scraps), and I've also moved a lot of my note taking to Gmail, implemented by sending emails to myself.

With so much of my life depending on my online existence, I'm getting a bit paranoid about several what-if scenarios: what if the online service maintaining my data goes bankrupt, and so on.  Of course, my case is not unique.  Millions of people worldwide have a lot of precious data sitting on computers spread out across the world, owned by a foreign corporation, and supported free of charge (e.g., free Gmail is partially supported by Google's online ad revenue).  But assuming some of this data will remain as valuable to me a decade or two from now, I'd like to ensure that I don't leave all of it to the vagaries of a ticker symbol.

Maybe you would argue that Google, Yahoo, and Microsoft are not going away anytime soon, and I believe you.  But fewer than 20% of the firms that existed two decades back are still around, so over the long run the odds are against your hypothesis.  Another possible argument is that a product getting phased out doesn't necessarily imply that user data will be lost (e.g., Yahoo Photos moving to Flickr); maybe user data will even become more precious over time.  Still, I'd argue that the expected lifetime of your present online data will only go down over time.  So the big question is whether you care enough about _all_ that data or not.  In my case, I care about most of it.

A poor man's solution to immortalize your data is to periodically "download" and archive it at home or, better still, at another online service.  Of course, switching services may affect the usability of the data, which affects its value to you.  E.g., if I zipped all my Gmail into one single archive, I couldn't search it freely like before.  It might still be useful for litigation purposes, but not for looking up my roommate's phone number.  This brings us to the first rule:

1.  When archiving your data, the functionality supported by the archived data should be comparable to the primary copy's functionality.  If not, it will probably be forgotten.
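
To make the rule concrete, here is a minimal sketch of an archive that stays searchable instead of becoming one opaque zip file.  The class name `SearchableArchive` and its methods are hypothetical, invented for illustration; it only indexes the subject and plain-text body of each message, and it is nowhere near a replacement for Gmail search.

```python
import re
from collections import defaultdict
from email import message_from_string


class SearchableArchive:
    """Toy archive (hypothetical) that keeps archived emails keyword-searchable."""

    def __init__(self):
        self.messages = []             # parsed messages, in insertion order
        self.index = defaultdict(set)  # lowercase word -> ids of messages containing it

    def add(self, raw_message: str) -> int:
        """Parse one raw email, store it, and index its subject and body."""
        msg = message_from_string(raw_message)
        msg_id = len(self.messages)
        self.messages.append(msg)
        body = msg.get_payload()
        # Multipart payloads are lists; this sketch only indexes plain string bodies.
        text = f"{msg.get('Subject', '')} {body if isinstance(body, str) else ''}"
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            self.index[word].add(msg_id)
        return msg_id

    def search(self, query: str):
        """Return subjects of messages that contain every word in the query."""
        words = re.findall(r"[a-z0-9]+", query.lower())
        if not words:
            return []
        ids = set.intersection(*(self.index.get(w, set()) for w in words))
        return [self.messages[i]["Subject"] for i in sorted(ids)]


archive = SearchableArchive()
archive.add("Subject: Roommate phone\n\nCall 555-0100 about rent")
archive.add("Subject: Todo list\n\nBuy milk, call landlord")
print(archive.search("roommate phone"))
```

The point is only that the archived copy answers the same kind of question ("where is my roommate's phone number?") as the primary copy does.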

A more difficult problem for me is figuring out _where_ to archive or move my data, making the decision, verifying that it worked, etc.  This is potentially very time consuming.  So what I'd want to see is a computer "program" that keeps track of the service level trends for your current data provider, monitors and evaluates upcoming alternative services, and replicates your data across a "diverse portfolio" of service providers, so that over time your data is likely to survive somewhere - and can almost always be found on the new and upcoming service.
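
As a rough sketch of the "diverse portfolio" idea, the selection step could look something like the following.  Everything here is an assumption made for illustration: the `Provider` record, the `score` field (standing in for whatever service level metric such a program would track), and the rule that "diverse" means at most one provider per owning company.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Provider:
    name: str
    owner: str    # company operating the service
    score: float  # hypothetical service-level score in [0, 1]; higher is better


def pick_portfolio(providers, copies):
    """Pick up to `copies` providers, best score first, at most one per owner."""
    if copies <= 0:
        return []
    chosen, owners = [], set()
    for p in sorted(providers, key=lambda p: p.score, reverse=True):
        if p.owner in owners:
            continue  # skip: we already hold a copy with this company
        chosen.append(p)
        owners.add(p.owner)
        if len(chosen) == copies:
            break
    return chosen


# Made-up providers and scores, purely for illustration.
providers = [
    Provider("MailA", "CorpA", 0.9),
    Provider("PhotosA", "CorpA", 0.8),
    Provider("MailB", "CorpB", 0.7),
    Provider("MailC", "CorpC", 0.4),
]
portfolio = pick_portfolio(providers, copies=2)
print([p.name for p in portfolio])
```

Rerunning the selection as scores change over time is what would move your data toward the new and upcoming services; the hard part, producing trustworthy scores, is not addressed by this sketch.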

It may seem that I just borrowed a line of dialogue from Star Trek, and I can understand the sentiment.  Some key components that will be needed to create such a "program" include:
  • A service for evaluating and recommending data service providers.
  • A data API supported by different data service providers that enables easy migration between them.  For example, if you must migrate a photo sharing service to an email provider, you'd probably store a single photo album as a single email.  Someone will have to write such interfaces for all new services.
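
The second component can be sketched in a few lines.  This is a minimal illustration under stated assumptions, not a real API: the `Item` record, the `DataService` interface, and both in-memory services are hypothetical names made up here.  It shows the photo-album-to-email mapping mentioned above, with each album becoming one email that carries the photos as attachments.

```python
from dataclasses import dataclass, field
from email.message import EmailMessage
from typing import Dict, Iterable, List, Protocol


@dataclass
class Item:
    """Provider-neutral unit of data (hypothetical): a title, text, and files."""
    title: str
    text: str = ""
    attachments: Dict[str, bytes] = field(default_factory=dict)  # filename -> content


class DataService(Protocol):
    """The common interface every provider would have to implement."""
    def export_items(self) -> Iterable[Item]: ...
    def import_item(self, item: Item) -> None: ...


class InMemoryPhotoService:
    def __init__(self, albums: Dict[str, Dict[str, bytes]]):
        self.albums = albums  # album name -> {filename: photo bytes}

    def export_items(self):
        for name, photos in self.albums.items():
            yield Item(title=name, text=f"{len(photos)} photos", attachments=dict(photos))

    def import_item(self, item):
        self.albums[item.title] = dict(item.attachments)


class InMemoryMailService:
    def __init__(self):
        self.mailbox: List[EmailMessage] = []

    def export_items(self):
        for msg in self.mailbox:
            body = msg.get_body(preferencelist=("plain",))
            files = {p.get_filename(): p.get_content() for p in msg.iter_attachments()}
            yield Item(msg["Subject"], body.get_content() if body else "", files)

    def import_item(self, item):
        # One item (e.g. one photo album) is stored as one email.
        msg = EmailMessage()
        msg["Subject"] = item.title
        msg.set_content(item.text)
        for filename, data in item.attachments.items():
            msg.add_attachment(data, maintype="application",
                               subtype="octet-stream", filename=filename)
        self.mailbox.append(msg)


def migrate(source: DataService, target: DataService) -> int:
    """Copy every item from source to target; return how many were copied."""
    count = 0
    for item in source.export_items():
        target.import_item(item)
        count += 1
    return count


photos = InMemoryPhotoService({"Album 1": {"beach.jpg": b"\xff\xd8fake-bytes"}})
mail = InMemoryMailService()
copied = migrate(photos, mail)
```

Because both services speak the same `Item` format, `migrate` does not need to know which kinds of service it is connecting.  The cost is the one noted above: someone has to write the two adapter methods for every new service.
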
Before I conclude, I'd like to observe that this problem is similar to money management.  When you give your money to a money management firm, you expect them to keep reinvesting it in whichever business is most appropriate at the time.  Maybe that is too complex to delegate fully to a program, but I think keeping your data up and running forever should be simpler.