Cloudburst or spring shower: Musings on Amazon’s outage
Some of you may know about Amazon’s recent troubles with an outage in its eastern region that brought down its cloud services for hundreds (thousands?) of cloud-based web sites, some of them for days. From The Wall Street Journal to the New York Times to legions of bloggers, everybody had a comment on the first major stumble in “cloudom.” Events like this feed the FUD factor (fear, uncertainty and doubt) that follows cloud. However, they can also help mature everyone’s understanding and provide needed perspective.
We at Discovery Health Partners live in the cloud. We run our business infrastructure on Google and other service providers, and our client applications on Amazon Web Services (AWS). We lived through this first major test of its durability and reliability. Our take: Amazon’s cloudburst was a mere spring shower, expected and needed to help cloud flourish.
We lost some productivity in the initial hours after the outage began, and the Discovery Dashboard, an application in our healthcare cost containment platform, encountered an IP address hitch that we were able to fix with an adjustment. Our disaster recovery plan performed as planned, however, and we didn’t lose any data or suffer a breach. In the end, the outage was an inconvenience, not a crisis…and a good learning experience.
Our perspective is this: no IT environment has 100% uptime, and cloud is no exception. Cloud users need to accept this reality and architect disaster recovery plans accordingly. Architecting disaster recovery plans for the cloud means leveraging the capabilities of the cloud provider and considering the limitations and possibilities inherent in the platform. Once that’s understood, we can 1) tame our overblown expectations of cloud as IT nirvana; 2) stop playing “Chicken Little” and recognize that our fears that the sky is falling are exaggerated; and 3) realize that we are pioneering a new way to do business and must solve challenges as they arise.
Despite the noisy aftermath, Amazon’s cloudburst was in fact more like a spring shower, prompting us all to pause and take stock of what we learned. We learned how important it is to understand our host and the capabilities it has built into the platform to support high availability. We watch with interest as Amazon learns from this experience and figures out ways to help us, its customers, mitigate risk. Most importantly, we gained an appreciation for the potential cloud has to reduce risk further than any other platform, reinforcing our decision to use the cloud to run our business. Where else can you extend your disaster recovery across geographies with a mix of strategies at a cost point that beats all alternatives? Our commitment is reaffirmed, and we are more confident than ever that cloud-based solutions will one day dominate the way businesses manage their critical business functions.
–Diann Bilderback, Chief Marketing Officer