Wow, that's quite a title! You're probably already queuing up all of your counterpoints and rebuttals. In fact it's not quite that serious, but a seriously worrying trend is emerging that I'd like to address.
The Problem
Today many sites are so totally dependent on 3rd-party services that when certain services go down, a chain of outages ends up knocking out many of the sites that we use on a daily basis--some for critical business applications. But I'm being too abstract, so let's take a concrete example of this: Google Analytics.
Back in late 2007, Google's analytics and monitoring service went down with no explanation. Most sites at that time had installed a line of JavaScript in their HTML head element which made a call to document.write. This causes the rest of the page to stop and wait for Google's servers. Normally this is not a bad thing, because Google has some pretty fast and reliable web servers. But in this 24-hour period, anyone who had this code in their head element had an absolutely broken site. Users could not see the site at all. And it was no fault of either the site developers or of the users--just Google's fault.
Another example: Earlier this year, Amazon had an outage in their popular S3 file storage service for several hours. At the time, you didn't need to be very tech savvy or know much about computers to know that something was seriously wrong with the internet. Sites from across the net were throwing 500 errors, looking completely awful without their media files, and the internet simply became a pretty awful place to get things done. From one company. Having a problem with one service.
And since then we have become even more dependent on 3rd-party services for even more widgets, "cloud computing", and more. Frankly this "cloud computing" craze scares the hell out of me. The more interconnected our various bits of HTML and HTTP are, the more chances there are for a massive catastrophe. Just look at the credit default swaps problem we're having in the USA for another concrete example of how this type of interdependence can fail in catastrophic ways.
The Solution
Services like S3, Google Analytics, and even Twitter are great services. They add lots of value for larger businesses and even more for a startup, so there's a large incentive to use them. I think that's absolutely fine and is actually a good idea. That being said, we need to manage our use of these services in a responsible way. Instead of storing data directly to S3, store it on your own server and asynchronously upload it to S3. That way, you can set up an S3-pinger, and if S3 goes down you can have the server automatically switch to serving the media itself.
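To make the store-locally-then-mirror idea a bit more concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the MediaStore class, the upload and probe callables (stand-ins for a real S3 client and a real health check), and the example base URLs are all hypothetical, and a dict stands in for the local disk.

```python
import queue


class MediaStore:
    """Keeps every file locally and mirrors it to a remote store later.

    `upload(key, data)` and `probe()` are hypothetical callables supplied by
    the caller; they stand in for a real S3 client and a real health check.
    """

    def __init__(self, upload, probe, local_base, remote_base):
        self.upload = upload
        self.probe = probe
        self.local_base = local_base
        self.remote_base = remote_base
        self.local = {}            # stand-in for files on the local disk
        self.mirrored = set()      # keys known to exist on the remote store
        self.pending = queue.Queue()
        self.remote_up = True

    def save(self, key, data):
        # The local copy is written first, so the site never depends on the
        # remote service being reachable at write time.
        self.local[key] = data
        self.pending.put(key)

    def drain(self):
        """Upload everything that is queued; failed uploads are re-queued."""
        failed = []
        while True:
            try:
                key = self.pending.get_nowait()
            except queue.Empty:
                break
            try:
                self.upload(key, self.local[key])
                self.mirrored.add(key)
            except Exception:
                failed.append(key)
        for key in failed:
            self.pending.put(key)

    def ping(self):
        """The 'pinger': record whether the remote store looks healthy."""
        try:
            self.remote_up = bool(self.probe())
        except Exception:
            self.remote_up = False
        return self.remote_up

    def url_for(self, key):
        # Serve from the remote store only when it is up and has the file.
        if self.remote_up and key in self.mirrored:
            return self.remote_base + key
        return self.local_base + key
```

In a real deployment drain() and ping() would run on a background worker or a timer; they are plain methods here so the behavior is easy to follow. The point of the design is that url_for always has a local fallback, so an outage at the remote service degrades performance rather than breaking the site.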
We need to build standardized tools that fetch data from web services and keep a local copy, from which it is served to the user. We need to build systems that asynchronously sync data bidirectionally between all of these different web servers and ensure that the integrity of our data on the web is sound. Right now this is a tedious and error-prone task, but we can do better. We can build cross-platform tools and libraries that will solve this problem, allow us to use 3rd-party services, and let us rest easy knowing that tomorrow, no matter what happens to Amazon, the internet will still be around.
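As a rough sketch of the "fetch it, keep a local copy, serve from that" half of this, something like the following could work. The CachedService name, the fetch callable, and the five-minute default are all assumptions for illustration, not part of any existing library, and this covers only read-side caching, not the bidirectional sync.

```python
import time


class CachedService:
    """Serves web service data from a local copy, refreshing when it is stale.

    `fetch(key)` is a hypothetical callable that talks to the remote service.
    If a refresh fails, the last known value is served instead of an error.
    """

    def __init__(self, fetch, ttl_seconds=300, clock=time.monotonic):
        self.fetch = fetch
        self.ttl = ttl_seconds
        self.clock = clock
        self.cache = {}  # key -> (time fetched, value)

    def get(self, key):
        now = self.clock()
        entry = self.cache.get(key)
        if entry is not None and now - entry[0] < self.ttl:
            return entry[1]          # fresh local copy, no remote call
        try:
            value = self.fetch(key)
        except Exception:
            if entry is not None:
                return entry[1]      # remote is down: serve the stale copy
            raise                    # nothing cached, nothing to fall back on
        self.cache[key] = (now, value)
        return value
```

The trade-off is that users may see slightly old data during an outage, which for most widgets and metadata is a much better failure mode than a broken page.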
DISCLAIMER: This is almost entirely a ripoff of a talk given by Timothy Fitz at Super Happy Dev House last month in San Francisco. While I think it's a really good point I can't take credit for having been the first to worry about it.
I don't know how Google manages to do it! They've linked music track/artist/album to online music vendors, a deceptively simple-sounding task which has been the bane of my existence for the last two weeks.
Let's start with two of the big players:
iTunes
iTunes has a terrible API: You go to iTunes Link Maker and type in the keywords of your choice. You can come up with a programmatic way to access this but it involves parsing invalid HTML.
Amazon
Amazon's API is a good programmatic interface to its own internal service, but unfortunately that service is TERRIBLE. Take this, for example: a CD from a wonderful band called Rodrigo y Gabriela. One of the songs on it is named "Ixtapa", so let's do a search for "Ixtapa Rodrigo y Gabriela." Makes sense, right? Apparently not to Amazon's search. In fact, Amazon's search really has no concept of the song. It's not something that they sell. Amazon knows all about album titles and artists, but not about the song itself. This renders it useless for linking a song to a vendor!
But what about open source?
MusicBrainz
So the next thing that I try is using MusicBrainz, an open source database of music metadata, which seems to have Amazon.com linking information built right in. However, reading through the wiki, for hours, and hours, and hours, is not fun. They have bits and pieces from all different times of the project's lifecycle, most of which is irrelevant. But after a while I find out that to get access to the web services, you either need to limit your requests to 1 per second (unacceptable in my case), or you have to set up your own MusicBrainz server. OK, let's do that!
Oh wait, they forgot to mention that it's the most ridiculous dependency-ridden piece of bloatware ever. It's so bad that they don't even really have a guide on how to set it up--they've given up and just created a virtual machine for people to download. OK, well fine, let's download that and go from there. What's this? There's more setup? Apparently so, because I had to leave my computer to import data and compute indexes for 3 days straight.
Finally, finally, I am ready to start accessing that music -> amazon.com data, when I notice something: Non-Commercial license. After all of this, the AMAZON SPECIFIC PORTION ONLY is licensed differently, and I cannot use it. I am disappointed with this service, to say the least. MusicBrainz needs a major overhaul in its software dependencies (Hint: Use Python, it's got batteries included.) It also needs to take a serious look at its licensing scheme. If it can address these two things, it will be much further along in its goal to make a great community database.
Information has a long way to go. Music metadata and the ability to link to different music vendors should be ubiquitous and available in a standard way. Nobody is benefiting by putting a lock and key on this sort of data. The people who really lose, in the end, are the music vendors, who ultimately get fewer sales. Hopefully someday soon they see the light, and fight to make this information accessible.

