I was reading an article in Macworld (see Gmail outage caused by overloaded servers) about the recent outage of Google’s Gmail. The outage, which lasted 100 minutes, was caused by changes to the request routers: after workers took some Gmail servers offline to perform routine upgrades, the routers became overloaded and the system went down.
Outages such as this will cast doubt on the reliability of Cloud computing. As more and more companies move into the Cloud, there will be increased demand for 24×7 uptime. As any first-year sys admin knows, networks need to be maintained, and that maintenance can only be done during off-peak hours. If you are running a global 24×7 operation, there really is no off-peak window in which to perform maintenance.
Some critics say that a business should have multiple geographically redundant, load-balanced data centers spread all over the world. The cost of such a solution will be beyond the reach of most businesses, and managing and coordinating maintenance and upgrades across all those sites would be a nightmare. If Google, the big boy on the block, periodically struggles to maintain its level of service, what chance does the business down the street have of maintaining uptime in the Cloud?
This we know as a fact: hardware will always fail. It is the nature of the beast. Traffic will always grow beyond projected estimates. There will always be bottlenecks in every network, no matter how much you try to compensate for them. How you plan your Cloud solution will determine whether you can maintain a 24×7 operation or suffer sporadic outages like Google's. My advice to those of you out there is to plan how you will deal with outages when they occur, because they will occur, and more frequently as more and more businesses move into the Cloud.