I was chatting recently with a coworker who had just returned from a DevOps-focused conference. She mentioned she had met a woman whose entire role is finding “lost” cloud instances (that is, virtual servers running in a public or private cloud network) and getting them shut down. Which implies, of course, that this is a significant enough problem that it warrants a dedicated full-time role.
Based on data from RightScale, a cloud management solution vendor, and Cloud Cruiser, a cloud-based financial management company, that’s a fair assumption. The latest data from RightScale1 notes that respondents estimate 30% of cloud spend is wasted, while RightScale’s own measurements ring in a bit higher, between 30% and 45%. When respondents to a Cloud Cruiser survey2 were asked how they “proactively manage public cloud usage and spend today,” a staggering 31% shrugged and replied, “we don’t.”
Cloud sprawl is a thing, y’all, and it’s a bigger problem than just the strain it places on budgets. It’s a risk to the health and well-being of your organization for two huge reasons.
First, if you don’t know the instances are out there, you have no idea of the state of the software running on them. What applications? What platforms? How many potential attack vectors are those instances introducing by hanging out with no one proactively managing their state?
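One practical way that hunt often starts is by sweeping inventory for instances nobody has claimed. Here’s a minimal sketch of the idea in Python: flag anything missing an ownership tag. The instance records and the "owner" tag convention are hypothetical sample data; in practice you’d build the list from your cloud provider’s inventory API (for example, the output of an EC2 describe-instances call has this general shape).

```python
# Sketch: flag cloud instances with no ownership tag as candidates for review.
# Sample data below is hypothetical; real data would come from your cloud
# provider's inventory API.

def find_unowned(instances, required_tag="owner"):
    """Return IDs of instances missing the tag tying them to a person or team."""
    unowned = []
    for inst in instances:
        # Normalize tag keys so "Owner" and "owner" both count.
        tags = {t["Key"].lower(): t["Value"] for t in inst.get("Tags", [])}
        if required_tag not in tags:
            unowned.append(inst["InstanceId"])
    return unowned

instances = [
    {"InstanceId": "i-aaa111", "Tags": [{"Key": "Owner", "Value": "web-team"}]},
    {"InstanceId": "i-bbb222", "Tags": [{"Key": "Name", "Value": "test-box"}]},
    {"InstanceId": "i-ccc333", "Tags": []},
]

print(find_unowned(instances))  # → ['i-bbb222', 'i-ccc333']
```

Anything that comes back unowned isn’t necessarily rogue, but it’s a starting list of instances where nobody is obviously on the hook for patching and monitoring.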
Given that WhiteHat Security’s seminal annual report on web security3 found “about one third of Insurance applications, about 40% of Banking & Financial Services applications, about half of Healthcare and Retail applications, and more than half of Manufacturing, Food & Beverage, and IT applications are always vulnerable,” the chances are pretty good that an unmanaged instance in the cloud is ripe for the picking. And, if someone were mucking around trying to gain access, would you notice? If you aren’t managing that instance, chances are you aren’t monitoring, either. And getting in is the first step to gaining access to other instances.
The second reason these rogue instances are a risk is the potential for data leakage.
I know, that sounds ridiculous, but bear with me.
At some point, developers need to test on real data. Back in the day, when I was a wee lass and still coding, production (real) data was replicated for us on a regular basis. That still happens today, based on data from Delphix.4 In fact, 80% of organizations admit to using and storing production data (real, live customer data) in their dev/test environments. Even scarier was the finding that “a staggering 72% of DevOps Leaders noted that development and QA has access to production and that access is not audited.”
Now, we know that a significant number of organizations encourage dev/test in public cloud environments. Which means that, based on the numbers above, it’s likely you have production (real, live customer) data in the cloud. If a percentage of those instances are “lost,” you now bear not only the cost of rogue instances running over time, but also the risk of system exploitation (because they aren’t patched or managed), in addition to the possibility that real customer data might be lost. Which is a Very Bad Thing™.
The point of all the maths is that rogue instances are a risk to the organization. They incur needless costs that chew up budget, which is bad enough on its own. But they also present additional, unnecessary risk. Cloud sprawl isn’t just annoying or a byproduct of doing business in the cloud; it’s an issue that needs to be reined in and better managed to eliminate risks that shouldn’t exist, anyway.
Security isn’t generally a driver of efforts to rein in cloud spend by eliminating cloud sprawl, but perhaps it should be.