Recently InfoWorld published a list of the 10 worst cloud outages of the last three years. The list included all the big names, such as Amazon, Google, Microsoft and Rackspace. The focus of the post is on the lessons to be learned from these failures. However, for an objective assessment, we need to ask the following questions:
- How reliable are these cloud services?
- Are they more or less reliable than your on-premises applications?
- Should reliability be measured in the same way for IaaS & SaaS?
(Here is the link to the article)
How reliable are these cloud services?
By my calculations, the reliability numbers for these services are as follows:
- Amazon Web Services = 99.920%
- Sidekick = 99.400%
- Gmail > 99.999%
- Hotmail > 99.999%
- Intuit = 99.750%
- Microsoft’s BPOS = 99.958%
- Salesforce = 99.996%
- Terremark = 99.971%
- PayPal = 99.983%
- Rackspace = 99.958%
How does this compare with what you have?
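One way to make that comparison is to turn each percentage into the number of hours of complete outage that would produce the same figure over the same 1,000-day window. The sketch below is a hedged illustration, not part of the original calculation: the names `RELIABILITY_PCT`, `WINDOW_DAYS` and `equivalent_outage_hours` are hypothetical, and Gmail and Hotmail are left out because only a lower bound (> 99.999%) is given for them.

```python
# Reliability figures from the list above, as percentages.
# Gmail and Hotmail are omitted: only a lower bound (> 99.999%) is given.
RELIABILITY_PCT = {
    "Amazon Web Services": 99.920,
    "Sidekick": 99.400,
    "Intuit": 99.750,
    "Microsoft's BPOS": 99.958,
    "Salesforce": 99.996,
    "Terremark": 99.971,
    "PayPal": 99.983,
    "Rackspace": 99.958,
}

WINDOW_DAYS = 1000  # observation window assumed in this post


def equivalent_outage_hours(reliability_pct: float, window_days: int = WINDOW_DAYS) -> float:
    """Hours of total outage over the window that would give the same reliability."""
    unavailable_fraction = 1 - reliability_pct / 100
    return unavailable_fraction * window_days * 24


if __name__ == "__main__":
    # Print services from least to most reliable.
    for service, pct in sorted(RELIABILITY_PCT.items(), key=lambda kv: kv[1]):
        print(f"{service:20s} {pct:8.3f}%  ~{equivalent_outage_hours(pct):6.1f} h")
```

On this scale the AWS figure corresponds to about 19.2 hours of total outage over 1,000 days, and the Sidekick figure to about 144 hours.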
How do you calculate the reliability?
The answer is not as straightforward as you may think. For example, it is commonly believed that air travel is much safer than road travel, but that depends on how you measure it. If you look at the Wikipedia article on air safety, you will notice that it gives three different statistics:
- Deaths per billion passenger-journeys: both Bus (4.3) and Car (40) come out much safer than Air (117)
- Deaths per billion passenger-hours: Though Air (30.8) is safer than Car (130), it is still worse than Bus (11.1)
- Deaths per billion passenger-kilometers: Here Air (0.05) is much safer than both Bus (0.4) and Car (3.1)
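The same data produces three different rankings depending on the denominator. A short Python sketch using only the figures quoted above makes this explicit; the dictionary `DEATH_RATES` and the helper `safest_first` are hypothetical names chosen for illustration.

```python
# Deaths per billion units of travel, as quoted above from Wikipedia's air safety article.
DEATH_RATES = {
    "passenger-journeys": {"Air": 117, "Bus": 4.3, "Car": 40},
    "passenger-hours": {"Air": 30.8, "Bus": 11.1, "Car": 130},
    "passenger-kilometers": {"Air": 0.05, "Bus": 0.4, "Car": 3.1},
}


def safest_first(rates: dict[str, float]) -> list[str]:
    """Order transport modes from lowest to highest death rate."""
    return sorted(rates, key=lambda mode: rates[mode])


# Same three modes, three different orderings.
for metric, rates in DEATH_RATES.items():
    print(f"{metric:22s} {' < '.join(safest_first(rates))}")
```

Air comes last per journey, in the middle per hour, and first per kilometer, which is exactly why the choice of denominator matters for cloud reliability too.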
Here is how I calculated the reliability numbers.
I have assumed that these are the only failures these services suffered in the last 1,000 days (roughly three years). The reliability is calculated with the following formula:
% Reliability = 1 – (down days / 1000) * fraction of services or users affected
For Amazon Web Services:
Down days = 4
Fraction of services or users affected = 1 of the 5 availability zones = 0.2
AWS reliability % = 1 – (4 / 1000) * 0.2 = 0.9992 = 99.92%
For Gmail:
Down days = 4
Fraction of services or users affected = 150,000 out of 200 million users = 0.00075 or 0.075%
Gmail reliability % = 1 – (4 / 1000) * 0.00075 = 0.999997 = 99.9997%
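The formula and the two worked examples can be checked with a few lines of Python. This is a minimal sketch that encodes the same assumption as the text (the listed outage is the only failure in the window); the function name `reliability` and its parameters are hypothetical.

```python
def reliability(down_days: float, fraction_affected: float, window_days: float = 1000) -> float:
    """Reliability = 1 - (down_days / window_days) * fraction_affected.

    Assumes the listed outage is the only failure in the window.
    """
    if not 0 <= fraction_affected <= 1:
        raise ValueError("fraction_affected must be between 0 and 1")
    return 1 - (down_days / window_days) * fraction_affected


# AWS: 4 down days, 1 of 5 availability zones affected.
aws = reliability(down_days=4, fraction_affected=1 / 5)
# Gmail: 4 down days, 150,000 of 200 million users affected.
gmail = reliability(down_days=4, fraction_affected=150_000 / 200_000_000)

print(f"AWS:   {aws:.4%}")
print(f"Gmail: {gmail:.4%}")
```

This reproduces the 99.92% and 99.9997% figures. Weighting by the fraction affected is what separates a partial outage from a total one; with `fraction_affected=1`, four down days would give 99.6% for either service.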
Should reliability be measured in the same way for IaaS & SaaS?
As far as reliability is concerned, there is a fundamental difference between IaaS and SaaS.
If you are using IaaS you can take the following measures:
- Have an alternate disaster recovery (DR) site
- Keep your own backup of the data
However, when you are using SaaS, neither of these approaches is feasible. Imagine setting up a backup mail system in case Gmail goes down! For that matter, can you imagine backing up all your data stored in Salesforce.com?
So the reliability standard for SaaS has to be much higher than for IaaS.
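To see how much a DR site can buy an IaaS user, consider a primary site backed by a standby: the service is down only when both are down at once. The sketch below is a simplification that assumes independent failures and instant failover, neither of which holds exactly in practice; `combined_availability` is a hypothetical helper, and pairing two sites at the AWS figure is purely illustrative.

```python
def combined_availability(primary: float, standby: float) -> float:
    """Availability of a primary site with a DR site.

    Down only when both sites are down at the same time; assumes
    independent failures and instantaneous failover.
    """
    return 1 - (1 - primary) * (1 - standby)


# Hypothetical example: two sites, each at the AWS figure from this post.
print(f"{combined_availability(0.9992, 0.9992):.6%}")
```

Under these assumptions two 99.92% sites combine to about 99.999936%. A SaaS customer has no second site to multiply in, so the provider's own number is all they get.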