How reliable is the Cloud?

Recently InfoWorld published a list of the 10 worst cloud outages of the last 3 years. The list includes all the big names: Amazon, Google, Microsoft, Rackspace and others. The focus of that post is to draw lessons from these failures. However, for an objective assessment, we need to ask these questions:

  • How reliable are these cloud services?
  • Are they more or less reliable than your on-premises applications?
  • Should reliability be measured in the same way for IaaS & SaaS?

(Here is the link to the article)

How reliable are these cloud services?
According to my calculations, the reliability numbers for these services come out as follows.

  1. Amazon Web Services = 99.920%
  2. Sidekick = 99.400%
  3. Gmail > 99.999%
  4. Hotmail > 99.999%
  5. Intuit = 99.750%
  6. Microsoft’s BPOS = 99.958%
  7. Salesforce  = 99.996%
  8. Terremark = 99.971%
  9. PayPal = 99.983%
  10. Rackspace = 99.958%

How does this compare with what you have?

How do you calculate the reliability?
The answer is not as straightforward as you may think. For example, it is commonly believed that air travel is much safer than road travel. But that depends on how you measure reliability. If you look at the Air safety article on Wikipedia, you will notice that three different statistics are provided.

  1. Deaths per billion passenger-journeys: Both Bus (4.3) and Car (40) come out much safer than Air (117)
  2. Deaths per billion passenger-hours: Though Air (30.8) is safer than Car (130), it is still worse than Bus (11.1)
  3. Deaths per billion passenger-kilometers: Here Air (0.05) is much safer than both Bus (0.4) and Car (3.1)
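The way the ranking flips with the unit of measurement can be checked directly from the figures quoted above. Here is a minimal sketch in Python; the dictionary and function names are my own and purely illustrative.

```python
# Deaths per billion units, exactly as quoted in the list above.
DEATHS_PER_BILLION = {
    "passenger-journeys": {"Bus": 4.3, "Car": 40, "Air": 117},
    "passenger-hours": {"Bus": 11.1, "Car": 130, "Air": 30.8},
    "passenger-kilometers": {"Bus": 0.4, "Car": 3.1, "Air": 0.05},
}


def safest_first(metric):
    """Return the modes of travel ordered from fewest to most deaths."""
    rates = DEATHS_PER_BILLION[metric]
    return sorted(rates, key=rates.get)


for metric in DEATHS_PER_BILLION:
    print(metric, "->", safest_first(metric))
```

Air comes last, second and first depending on which denominator you pick, which is the whole point: the same data supports three different rankings.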

This is how I have calculated the reliability number.

I have assumed that these are the only failures these services have suffered in the last 1000 days. The reliability % is calculated by the following formula:

% Reliability = 1 – (down days / 1000) * fraction of services or users affected

For AWS:

Down days = 4

Fraction of services or users affected = 1 of the 5 availability zones = 0.2

AWS reliability % = 1 – (4 / 1000) * 0.2 = 0.9992 = 99.92%

For Gmail:

Down days = 4

Fraction of services or users affected = 150,000 out of 200 million users = 0.00075 or 0.075%

Gmail reliability % = 1 – (4 / 1000) * 0.00075 = 0.999997 = 99.9997%
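The same arithmetic can be written as a small Python function, so you can plug in your own numbers. This is only a sketch of the formula above; the function and parameter names are my own, and the 1000-day window is the assumption already stated.

```python
def reliability(down_days, fraction_affected, period_days=1000):
    """Reliability as a fraction: 1 - (down days / period) * fraction affected."""
    return 1 - (down_days / period_days) * fraction_affected


# AWS: 4 down days, 1 of 5 availability zones affected.
aws = reliability(down_days=4, fraction_affected=1 / 5)

# Gmail: 4 down days, 150,000 of 200 million users affected.
gmail = reliability(down_days=4, fraction_affected=150_000 / 200_000_000)

print(f"AWS:   {aws:.2%}")
print(f"Gmail: {gmail:.4%}")
```

To compare with an in-house application, call the same function with its own down days and the fraction of users affected.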

Should reliability be measured in the same way for IaaS & SaaS?
As far as reliability is concerned, there is a fundamental difference between IaaS and SaaS.

If you are using IaaS you can take the following measures:

–          Have an alternate DR site

–          Have your own backup of the data

However, when you are using SaaS, neither of these approaches is feasible. Imagine setting up a backup mailing system to cater for Gmail going down! For that matter, can you imagine backing up your data which stored in

So, the reliability standard for SaaS has to be much higher than that for IaaS.


Research finding – Complete migration to cloud is NOT appealing for large business

This is what I had suspected for a long time: that Cloud is losing its Value Proposition and that Cloud is for Flexibility and NOT for cost saving. However, I did not have research data to support my claim.

Now I have supporting evidence from Byung Chul Tak, Bhuvan Urgaonkar, and Anand Sivasubramaniam of the Pennsylvania State University, who have published a paper titled “To Move or Not to Move: The Economics of Cloud Computing”. The paper attempts to calculate the economics of moving to the cloud over a horizon of ten years. The tests assumed that hardware and software would be refreshed every four years. The conclusions are interesting:

  1. Complete migration to today’s cloud is appealing only for small/stagnant businesses/organizations,
  2. Vertical partitioning options are expensive due to high costs of data transfer, and
  3. Horizontal partitioning options can offer the best of in-house and cloud deployment for certain applications

“Vertical partitioning” = systems in which some of the software (such as application servers) is run in-house, while other programs (such as databases) are run in the cloud.

“Horizontal partitioning” = systems in which all the software is run in-house, though additional copies could be run in the cloud to meet peak demand.

What is interesting to note is that the study does not take all costs of a cloud migration into account. Many costs can’t be quantified, such as the cost of rewriting applications for the cloud, or the cost of retraining IT help to manage the cloud. As a result, the researchers did not factor these costs into their analysis.

So, if those costs are added, migration to the cloud becomes even less economical.

Is it Blasphemous to Criticize Agile?

This is how I was planning to start my post …

“…believe me … I do think agile works … in most … well if not in most cases then in many cases. That by implication means it is not a silver bullet for all situations. In fact, no technique, no methodology, no solution can be applied to all situations and agile is no exception.

However, I have noticed a tendency among agilists to proclaim that if agile has not worked in a specific situation then the fault lies squarely with improper usage rather than any limitation of agile. All you have to do is apply agile properly and it will work. Who is to say what proper agile is? The Agile Manifesto does not help in determining what makes agile agile. Conversely, how do we judge what agile is not…”

…but then I saw this – Agile’s Teenage Crisis?. It provides a 20-point criticism of Agile and how it is practiced. It is nice to know that the thought leaders are thinking about the same issues.

I don’t want to list the points – you had better read the post.

However, there is one point that has been bothering me for some time and it is not fully addressed in the 20 points mentioned.

The point is about “Continuous delivery of valuable software”. Can we compare this to “CEO working to create value for shareholder”?

Do you see any similarity?

Both focus on the short term. A CEO focuses on the next quarter’s results and their impact on the stock price, assuming that in the long term everybody is dead anyway. This post nicely summarizes the dilemma of the CEO.

However, visionary CEOs will always balance the short term and the long term.

Agile developers concentrate only on the current sprint and assume that if there is any architectural problem later, code refactoring can take care of it. Big upfront design is frowned upon. If the agile team has experienced developers who can visualize the impact of every design decision, things generally work out.

However, are all developers so experienced that they can intuitively figure out what the right design is? What about the need to debate, introspect and visualize the impact of a design decision? Can that be done if your focus is always on delivering the next user story and how much business value it delivers?

Quoting Philippe Kruchten from Agile’s Teenage Crisis?

“…While agile practices could help control technical debt, they are also often at the root, the cause of massive technical debt…”

Which of the 12 principles behind the Agile Manifesto do you think is most important?

Cloud is for Flexibility and NOT for cost saving

Sure – you might save cost by moving to cloud. There are these five instances where you can achieve cost saving:

  1. You have a compute-intensive application that needs to run once in a while.
  2. Size of your organization is small to medium where renting works out better for you.
  3. You are planning to revamp your CRM or Email solution.
  4. You expect a large number of tablets (iPad and others) to be used in your organization and you will need an alternative to Microsoft Office.
  5. Your data center requires an overhaul where you can leverage virtualization / private cloud technologies.

However, if you approach the problem of moving to the cloud by looking at your application portfolio and attempting an analysis of “ROI vs. Risk vs. Effort”, you are not likely to get anywhere. You will conclude that most applications cannot be moved.

Wearing the Flexibility Cap

Or you might want to say – looking at the application portfolio through the flexibility lens – you will see a different picture. Since, in the cloud, the cycle time to add or remove computing power is close to zero, you will be able to evolve a different decision-making style.

  1. You can defer all decisions on how much computing power is required.
  2. As a startup you don’t have to worry about under-investing or over-investing in hardware.
  3. You don’t have to worry about a sudden increase in transaction volume.
  4. Trial and error and gradual rollout become much easier to handle.
  5. Course correction becomes much simpler.
  6. Upgrades and release management become less of a risk because of no downtime and an easy fall-back option.
  7. DR planning becomes straightforward.

Is an Application Cloud Ready? Answer 4 simple questions

  1. Will I gain from the flexibility?
  2. Are there any technical hurdles for hosting the application in the cloud?
  3. Are there any unacceptable security risks?
  4. Are there any compliance issues?

If you answer “Yes” to the first question and “No” to the others, then just go ahead and move the application to the cloud.
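The decision rule can be written down as a tiny Python check. This is just a sketch; the function and parameter names are my own. Note that the third question is read here as “are there unacceptable security risks?”, so that “No” is the favorable answer to each of the last three questions.

```python
def is_cloud_ready(gains_flexibility, has_technical_hurdles,
                   has_unacceptable_security_risks, has_compliance_issues):
    """Return True only for 'Yes' to question 1 and 'No' to questions 2 to 4."""
    return (gains_flexibility
            and not has_technical_hurdles
            and not has_unacceptable_security_risks
            and not has_compliance_issues)


# Hypothetical application: benefits from flexibility, no blockers.
print(is_cloud_ready(True, False, False, False))

# Same application, but with a compliance issue.
print(is_cloud_ready(True, False, False, True))
```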

Amazon EC2 – How much has it changed in last 18 months?

Is there a significant change in the pricing? The answer is almost none – there is a slight decrease in the charge for higher-end EC2 instances. You can see the summary status as of January 2010.

So, what has actually changed?

  1. Free instance – 750 hours of micro instance with 10GB of storage free and 15GB bandwidth
  2. Micro instance – at quarter the rate of small instance (USD 0.02 per hour for Linux)
  3. More regions – Singapore & Tokyo added in addition to US (N. Virginia, N. California) and EU (Ireland)
  4. RDS for Oracle – charges are 50% more than MySQL for smaller instances and 30% more for larger instances
  5. Cluster computing instances – available for USD 1.6 to 2.1 per hour, only at US – Virginia
  6. Elastic MapReduce – uses Hadoop and the EC2 instance type of your choice. In addition to the EC2 instance price and the storage price, you also need to pay for Elastic MapReduce for each instance, which can be between USD 0.015 and USD 0.42 per hour.
  7. Virtual Private Cloud – lets you provision a private, isolated section where you can launch AWS resources in a virtual network that you define
  8. PaaS offering – using Java & Tomcat stack
  9. More manageability options – like CloudWatch, Auto Scaling etc.
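To make the per-instance surcharge in item 6 concrete, here is a rough sketch of the hourly arithmetic in Python. The EC2 rate and cluster size in the example are placeholders that I have made up, not quoted AWS prices, and storage charges are left out.

```python
def emr_hourly_cost(instances, ec2_rate, emr_rate):
    """Hourly cluster cost in USD, excluding storage.

    Each instance pays its EC2 rate plus the per-instance surcharge.
    """
    return instances * (ec2_rate + emr_rate)


# Hypothetical cluster: 10 instances at an assumed EC2 rate of USD 0.10 per hour,
# with the lowest surcharge mentioned above (USD 0.015 per hour).
cost = emr_hourly_cost(instances=10, ec2_rate=0.10, emr_rate=0.015)
print(f"USD {cost:.2f} per hour")
```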

Free instance

Micro Instance

Micro Instance: 613 MB of memory, up to 2 ECUs (for short periodic bursts), EBS storage only, 32-bit or 64-bit platform.

More Manageability Options

Auto Scaling – It allows you to scale your capacity up or down automatically according to conditions you define. For example, you can ensure that the number of Amazon EC2 instances you’re using increases during demand spikes and decreases during demand lulls. It is enabled by Amazon CloudWatch and available at no additional charge beyond Amazon CloudWatch fees.
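The idea of scaling “according to conditions you define” can be illustrated with a toy rule in Python. This is not the Auto Scaling API. The CPU thresholds, the one-instance step and the function name are all assumptions made up for illustration.

```python
def desired_instances(current, avg_cpu_percent, minimum=1, maximum=10,
                      scale_up_at=70, scale_down_at=30):
    """Add one instance on a demand spike, remove one during a lull.

    The result is always kept within [minimum, maximum].
    """
    if avg_cpu_percent > scale_up_at:
        current += 1
    elif avg_cpu_percent < scale_down_at:
        current -= 1
    return max(minimum, min(maximum, current))


# Hypothetical readings: a spike, a normal load, and a lull.
for cpu in (85, 50, 10):
    print(cpu, "->", desired_instances(current=2, avg_cpu_percent=cpu))
```

The minimum and maximum bounds matter as much as the thresholds: they cap your bill during a spike and keep at least one instance alive during a lull.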

CloudWatch – It provides monitoring for AWS cloud resources and the applications customers run on AWS. Developers and system administrators can use it to collect metrics and monitor resources. Basic Monitoring metrics (at five-minute frequency) for Amazon EC2 instances are free of charge, as are all metrics for Amazon EBS volumes, Elastic Load Balancers, and Amazon RDS DB instances. New and existing customers also receive 10 metrics (applicable to Detailed Monitoring for Amazon EC2 instances or Custom Metrics), 10 alarms, and 1 million API requests each month at no additional charge. Detailed monitoring is charged.

How has IBM Watson influenced Google?

Try searching for “The capital of Oman”. What do you expect to see? A couple of sponsored links followed by ten result pages – right?

Wrong. You will see the following.

When did Google introduce this feature? It was done quietly. There is a news item on 23rd March, 2011 in Realwebseo talking about this feature.

When did IBM Watson win Jeopardy! – 16th February, 2011.

Do you see the connection?

In the post How intelligent are the Computers of 2011 I had asked the question “Why has Google not attempted something like this?” Now we know that option (2) is the right answer.

This feature is still in its infancy. For example (as on 11th June, 2011) if you type in “what is the capital of India” you get the right answer. However, if you search for “The Capital of India is” you don’t get the best guess.

On the other hand if you search for “The Capital of Canada is”, you still get the correct best guess.

Here are some more results of my fiddling with this feature:

Search Term | Best Guess
------------|-----------
What is the release date of harry potter and the deathly hallows part 2 | Harry Potter and the Deathly Hallows: Part 2 Release Date is July 15, 2011
what is the release date of harry potter and the deathly hallows | Best guess for Harry Potter and the Deathly Hallows Release Date is November 19, 2010
who is the winner of 2011 French open tennis | (no best guess)
where will the 2012 Olympic games be held | (no best guess)
who is the president of USA | Barack Obama
who is the president of India | (no best guess)
who is the prime minister of India | Manmohan Singh
who is the prime minister of Pakistan | Shaukat Aziz
who is the prime minister of UK | (no best guess)
who is the prime minister of united kingdom | David Cameron