AWS outage, CSP costing, performance improvement, and analysis and reduction of bandwidth cost (and overall CSP cost)
Not sure how many of you heard or read about the Amazon Web Services (AWS) outage that happened in June because of a Friday night storm. It knocked out Netflix, Pinterest and Instagram for many users in the eastern United States and lasted more than three hours. For Netflix in particular, Friday evening is a peak demand time, so customers could not have been too happy about the outage.
When you run applications on a public cloud such as AWS, you need to architect them so that you can ride out such failures (multi-zone, multi-region, multi-provider and so on). On top of that, there are several small things you can do that not only increase performance and cut your cost drastically but also help de-risk and mitigate such outages. Improving performance may also improve related aspects of the application, including availability, resistance to disaster and, most importantly, the cost of using the public cloud.
AWS and other public cloud service providers (CSPs) charge not only for compute and storage but also for the network bandwidth you use, and they charge every time you access your storage for a read or a write. As a result, you may want to gather up reads and writes in your application and bunch them into single operations wherever possible. On your own servers, once you have spent the money on the hardware, you don't incur additional costs every time you do a read or write operation; in the public cloud you do, so fewer, larger operations mean a smaller bill. (We at i7 are more interested in analyzing and optimizing your network traffic, which can reduce your network bandwidth cost to a great extent, by as much as 40% to 60%.)
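To make the batching idea concrete, here is a minimal Python sketch. It assumes per-request billing; the class name WriteBatcher, the helper functions and the price per 1,000 requests are all hypothetical, chosen only for illustration, and are not any provider's actual API or rate.

```python
from typing import Callable, List


class WriteBatcher:
    """Buffers small records and sends them as one storage request.

    `put` stands in for whatever call uploads a blob to storage;
    it is an assumption, not a real SDK function.
    """

    def __init__(self, put: Callable[[bytes], None], max_records: int = 100):
        self._put = put
        self._max = max_records
        self._buf: List[bytes] = []

    def write(self, record: bytes) -> None:
        self._buf.append(record)
        if len(self._buf) >= self._max:
            self.flush()

    def flush(self) -> None:
        # One billed request carries every buffered record.
        if self._buf:
            self._put(b"\n".join(self._buf))
            self._buf = []


def count_requests(n_records: int, batch_size: int) -> int:
    """Counts how many storage requests n_records would generate."""
    calls: List[bytes] = []
    batcher = WriteBatcher(calls.append, max_records=batch_size)
    for i in range(n_records):
        batcher.write(f"record-{i}".encode())
    batcher.flush()  # send whatever is left over
    return len(calls)


def request_cost(requests: int, price_per_1000: float) -> float:
    """Cost of a number of requests at a per-1,000-requests price."""
    return requests / 1000 * price_per_1000


# Hypothetical price, for illustration only.
PRICE_PER_1000_WRITES = 0.01

if __name__ == "__main__":
    for size in (1, 100):
        n = count_requests(1000, size)
        print(size, n, request_cost(n, PRICE_PER_1000_WRITES))
```

The trade-off is that buffered records are lost if the process dies before a flush, so the batch size should reflect how much unflushed data the application can afford to lose.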
The overall effect of this cloud optimization technique depends on the pricing model of the public cloud service provider (CSP) you sign up with. Whichever CSP you choose, however, re-factoring is an opportunity to improve application performance, reduce cost and, in a way, de-risk such outages too.
Beyond optimization, cost reduction and de-risking, you can use tools to define your own service level profiles. Every application can have a different service level profile. Your customer-facing e-commerce site will have a different service level from your internal employee portal. Evaluating the costs of public cloud instances against the service levels your various applications need may help you optimize your public cloud costs. (We are especially interested in the network bandwidth part: our i7 EagleEye provides complete analytics of the bandwidth usage of any given application, which in turn can be used to optimize and cut costs.)
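One way to picture matching costs against service levels is the Python sketch below. Everything in it is an assumption made for illustration: the ServiceLevel and Deployment types, the availability figures and the monthly prices are hypothetical and do not come from any provider.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ServiceLevel:
    name: str
    min_availability: float  # fraction of time the app must be up


@dataclass(frozen=True)
class Deployment:
    name: str
    availability: float  # assumed figure, not a provider guarantee
    monthly_cost: float  # assumed figure, in dollars


def cheapest_meeting(level: ServiceLevel,
                     options: List[Deployment]) -> Optional[Deployment]:
    """Returns the cheapest option that satisfies the service level."""
    ok = [d for d in options if d.availability >= level.min_availability]
    return min(ok, key=lambda d: d.monthly_cost) if ok else None


# Hypothetical deployment options and numbers.
OPTIONS = [
    Deployment("single-zone", 0.995, 300.0),
    Deployment("multi-zone", 0.9995, 550.0),
    Deployment("multi-region", 0.9999, 1100.0),
]

storefront = ServiceLevel("e-commerce site", 0.9995)
portal = ServiceLevel("employee portal", 0.99)

if __name__ == "__main__":
    for level in (storefront, portal):
        choice = cheapest_meeting(level, OPTIONS)
        print(level.name, "->", choice.name if choice else "no option fits")
```

With these made-up numbers the customer-facing site lands on the multi-zone option while the internal portal can stay on the cheaper single-zone one, which is the kind of per-application difference the service level profile is meant to expose.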
Coming back to the June 29 Netflix outage: given the storage- and bandwidth-intensive nature of video streaming, bringing Amazon's other data centers elsewhere in the country into action may not have been feasible. But it always helps for such bandwidth-intensive cloud applications and service providers to get a bandwidth health check done, cut out all unnecessary inbound and outbound traffic, and fine-tune to the optimum usage.
Basically, applications hosted on a CSP's cloud will have varying compute, storage and bandwidth needs. Your rules need to be based on a combination of these three factors, which can get complex. You have to experiment with combinations of the three that look logical for your public cloud applications and the service levels they need. There are tools available to optimize these percentages (i7 Networks' EagleEye can help you optimize the bandwidth usage).
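To experiment with the three factors, a small cost model helps. The Python sketch below is only an illustration: the Rates and Usage types, the flat per-unit pricing and every number are assumptions, and real CSP price lists are tiered and more detailed.

```python
from dataclasses import dataclass, replace
from typing import Dict


@dataclass(frozen=True)
class Rates:
    per_instance_hour: float
    per_gb_month_stored: float
    per_gb_transferred_out: float


@dataclass(frozen=True)
class Usage:
    instance_hours: float
    gb_stored: float
    gb_out: float


def monthly_cost(usage: Usage, rates: Rates) -> Dict[str, float]:
    """Splits the monthly bill into compute, storage and bandwidth."""
    return {
        "compute": usage.instance_hours * rates.per_instance_hour,
        "storage": usage.gb_stored * rates.per_gb_month_stored,
        "bandwidth": usage.gb_out * rates.per_gb_transferred_out,
    }


def cost_shares(costs: Dict[str, float]) -> Dict[str, float]:
    """Each component as a fraction of the total bill."""
    total = sum(costs.values())
    return {k: (v / total if total else 0.0) for k, v in costs.items()}


# Hypothetical flat rates and usage, for illustration only.
RATES = Rates(per_instance_hour=0.10,
              per_gb_month_stored=0.10,
              per_gb_transferred_out=0.10)
BASELINE = Usage(instance_hours=720, gb_stored=500, gb_out=2000)

# Same application after trimming 40% of outbound traffic.
TRIMMED = replace(BASELINE, gb_out=BASELINE.gb_out * 0.6)

if __name__ == "__main__":
    for label, usage in (("baseline", BASELINE), ("trimmed", TRIMMED)):
        costs = monthly_cost(usage, RATES)
        print(label, round(sum(costs.values()), 2), cost_shares(costs))
```

Looking at the shares rather than the total shows which of the three factors dominates for a given application, and therefore where tuning is likely to pay off first.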
To summarize, when you move your application to the public cloud, it may work very well as it is, without any changes. However, if you pay attention to how your CSP charges and to your application's pattern of compute, memory, storage and network bandwidth usage, you can reduce your public cloud charges. Optimizing the application itself with some re-factoring may improve its performance, make it more resilient to outages and extend its life, while experimenting and fine-tuning should help you lower CSP costs further.