Wow, that's quite a title! You're probably already queuing up all of your counterpoints and rebuttals. In fact, it's not quite that serious, but a seriously worrying trend is emerging that I'd like to address.
The Problem
Today many sites are so totally dependent on 3rd-party services that when certain services go down, a chain of outages ends up knocking out many of the sites we use on a daily basis, some of them for critical business applications. But I'm being too abstract, so let's take a concrete example: Google Analytics.
Back in late 2007, Google's analytics and monitoring service went down with no explanation. Most sites at that time had installed a line of JavaScript in their HTML head element that made a call to document.write. This caused the rest of the page to stop and wait for Google's servers. Normally this is not a bad thing, because Google has some pretty fast and reliable web servers. But during that 24-hour period, anyone who had this code in their head element had an absolutely broken site: users could not see it at all. And it was no fault of either the site developers or the users. It was entirely Google's fault.
Another example: earlier this year, Amazon had an outage in their popular S3 file storage service that lasted several hours. At the time, you didn't need to be very tech savvy or know much about computers to know that something was seriously wrong with the internet. Sites from across the net were throwing 500 errors or looking completely broken without their media files, and the internet simply became a pretty awful place to get things done. From one company. Having a problem with one service.
And since then we have become even more dependent on 3rd-party services for even more widgets, "cloud computing", and more. Frankly, this "cloud computing" craze scares the hell out of me. The more interconnected our various bits of HTML and HTTP are, the more chances there are for massive catastrophe. Just look at the credit default swaps problem we're having in the USA for another concrete example of how this type of interdependence can fail in catastrophic ways.
The Solution
Services like S3, Google Analytics, and even Twitter are great. They add lots of value for larger businesses and even more for startups, so there's a large incentive to use them. I think that's absolutely fine, and it's actually a good idea. That being said, we need to manage our use of these services responsibly. Instead of storing data directly on S3, store it on your own server and upload it to S3 asynchronously. That way you can set up an S3-pinger, and if S3 goes down, your server can automatically switch to serving the media itself.
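Here is a minimal sketch of that write-locally, upload-later setup with a pinger-driven switch. Everything in it is hypothetical: `MediaStore`, the `upload` callback (a stand-in for whatever S3 client you actually use) and the `probe` passed to `ping` (the S3-pinger, for example a request for a known object) are illustrative names, not a real library API, and an in-memory dict stands in for the server's disk.

```python
from typing import Callable, Dict, List, Set


class MediaStore:
    """Hypothetical sketch: keep media on our own server, copy it to S3 later,
    and serve it ourselves whenever the S3-pinger reports trouble."""

    def __init__(self, local_base: str, s3_base: str,
                 upload: Callable[[str, bytes], None]) -> None:
        self.local_base = local_base         # our own media server (assumed URL)
        self.s3_base = s3_base               # public URL prefix of the bucket (assumed)
        self._upload = upload                # placeholder for a real S3 upload call
        self._files: Dict[str, bytes] = {}   # stands in for the server's disk
        self._pending: List[str] = []        # names not yet copied to S3
        self._on_s3: Set[str] = set()        # names known to be on S3
        self.s3_healthy = True               # last result from the pinger

    def save(self, name: str, data: bytes) -> None:
        """The write succeeds as soon as the local copy exists."""
        self._files[name] = data
        self._on_s3.discard(name)            # any older S3 copy is now stale
        if name not in self._pending:
            self._pending.append(name)

    def drain_uploads(self) -> None:
        """Copy pending files to S3; meant to run periodically in the background."""
        still_pending: List[str] = []
        for name in self._pending:
            try:
                self._upload(name, self._files[name])
                self._on_s3.add(name)
            except Exception:
                still_pending.append(name)   # leave it for the next run
        self._pending = still_pending

    def ping(self, probe: Callable[[], bool]) -> None:
        """The S3-pinger: an exception or a False result means S3 is down."""
        try:
            self.s3_healthy = bool(probe())
        except Exception:
            self.s3_healthy = False

    def url_for(self, name: str) -> str:
        """Serve from S3 only when it is healthy and already has the file."""
        if self.s3_healthy and name in self._on_s3:
            return f"{self.s3_base}/{name}"
        return f"{self.local_base}/{name}"
```

In a real deployment `drain_uploads` and `ping` would run on a schedule, and both the files and the pending list would live on disk rather than in memory, so a restart does not lose track of what still needs uploading.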
We need to build standardized tools that fetch data from web services and keep a local copy, which is what actually gets served to the user. We need to build systems that asynchronously sync data in both directions between our servers and all of these different web services, and that keep the integrity of our data on the web sound. Right now this is a tedious and error-prone task, but we can do better. We can build cross-platform tools and libraries that solve this problem, let us use 3rd-party services, and let us rest easy knowing that tomorrow, no matter what happens to Amazon, the internet will still be around.
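The read side of such a tool might look like a local mirror that serves its cached copy whenever the remote service fails, roughly the behaviour HTTP calls stale-if-error (RFC 5861). This is a sketch under assumptions: `LocalMirror` and the `fetch` callback (standing in for a call to Twitter, S3 or any other web service) are hypothetical, the cache lives in memory, and the bidirectional sync described above is not covered.

```python
import time
from typing import Any, Callable, Dict, Tuple


class LocalMirror:
    """Hypothetical sketch: serve 3rd-party data from a local copy,
    refreshing it from the remote service when possible."""

    def __init__(self, fetch: Callable[[str], Any], max_age: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch          # placeholder for the real web service call
        self._max_age = max_age      # seconds before we try to refresh a copy
        self._clock = clock
        self._cache: Dict[str, Tuple[float, Any]] = {}  # key -> (fetched_at, value)

    def get(self, key: str) -> Any:
        now = self._clock()
        cached = self._cache.get(key)
        if cached is not None and now - cached[0] < self._max_age:
            return cached[1]         # fresh enough: no remote call at all
        try:
            value = self._fetch(key)
        except Exception:
            if cached is not None:
                return cached[1]     # service is down: serve the stale copy
            raise                    # nothing local to fall back on
        self._cache[key] = (now, value)
        return value
```

One design choice worth noting: while the service is down, every request for an expired key still tries the remote first, so a production version would probably back off instead of retrying on every call.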
DISCLAIMER: This is almost entirely a ripoff of a talk given by Timothy Fitz at Super Happy Dev House last month in San Francisco. While I think it's a really good point, I can't take credit for having been the first to worry about it.
Comments


Cloud buzz warning! The whole idea of putting media on S3 is that it's supposed to be available all the time thanks to its cloud... though you could use different clouds if you're worried about not being able to browse the web a few hours a year :P
Not using cloud services isn't exactly without headaches. Regular ISPs have downtime too.
That's certainly true, but then it's contained. However, if that ISP hosts some widget that someone else uses and that widget blocks rendering of said page, then we're back in the same ugly situation that I'm talking about.
It might be better for the whole internet to be more segmented and have fewer single points of failure, but if my ISP is down more often than such a single point of failure, using that ISP is bad for my business.
On top of that, there are a lot of problems that S3 solves that aren't solved by putting a local server in between: when you store something in S3, it's guaranteed to be redundantly stored. If you cache it on a local server, and the local server crashes before the data is uploaded to S3, you have data loss without the end user being able to retry. And there are a lot more things like that.
There is a reason services like S3 have such an impact when they are down: they are popular. And there's a reason they're popular too: they solve a lot of problems. Until someone comes up with a completely transparent local caching service, my money's on Amazon making S3 more reliable rather than on rolling my own workaround.
Wow, now that is a scary thought.
Jiff
www.anolite.echoz.com
This is one of the reasons I host everything that's really important to me myself: website, personal email, offsite backups. Yes, any of the places where I have machines can go down temporarily, but at least I can do something about it when that happens.
Standing by helplessly while something breaks that I need, or that someone I work for needs, is not a good feeling.
...but as more people use S3 and similar services, those services will get more reliable, to the point where they consistently outperform shit hosting.
so basically you are fucking retarded
die in a fire
Host your own data! And why not build your own electrical substations and groundwater pumps? It's the only way to be *sure*.
But what if your ISP catches fire and the walls come crashing down around your servers??
;)
Yes, but what happens when a data center gets blown up or catches fire? Is every single bit of data replicated multiple times? The more replication going on, the more expensive it is, so I doubt it's nearly as replicated as we'd all like to think. That's another misconception about the reliability of the cloud, I'm afraid.
Sorry, meant the data center that is part of a cloud in the above statement.
I would have to agree with you on this one. I was using Amazon S3 and it always seemed like some of my media files would hang while loading. Then it went down for almost an entire day, and I have not looked back since.
If the media is down on my site, I would rather not have my site accessible at all. Although it does sound like a good idea to use both S3 and another server.
I realize I am a little late to the party, but I have a couple of quick notes. First, most of my reservations about the longevity of S3 disappeared when Amazon announced their SLA.
Second, I've found S3 to be a fantastic backup solution. Even if Amazon were to go under or shut down S3, the only loss would be backups of data that I already have locally (thanks, ZFS snapshots!).
My S3 backup flavor of choice is tarsnap (www.tarsnap.com), as Colin Percival is probably the single most qualified guy to build such a system. It is not a backup system for the faint of heart, but if you feel comfortable in the Unix shell it is a great solution.
Cool! I might try it out for use on Quisition and possibly jtauber.com too.
yeah!!
I laughed:
http://www.dmclaughlin.com/images/collapse.png
I worry to no end about the broader implications of service consolidation, both because of the privacy concerns and because it's a betrayal of the distributed, uncoupled nature of the Internet, or at least of the idea it was founded on.
I use GMail because it's very convenient and I like searching, but I'm constantly worried about my independence and liberty. What if they sell me out, or go under, or get hacked, and so on and so on? What if 90% of the world used just GMail, Hotmail and Yahoo Mail? That's just 3 organizations that governments would need to control in order to monitor 90% of the population. Be it the US government or the Chinese government, it doesn't matter. When many small organizations all run their own servers, liberty and privacy as a whole win.