Learn how these “infrastructure vampires” may be hurting your business – and what you can do to avoid them in the future
The term “VM sprawl” refers to the uncontrolled growth of virtual machines within a virtualized or cloud environment. This problem can occur anywhere, from the largest companies down to small mom and pop operations, and if ignored, can end up costing valuable time, money and resources. Since VM sprawl happens gradually over time, it can be difficult to realize how pervasive the problem has become until it is out of control.
Virtual machines are now so easy to instantiate, whether manually or through automated APIs, that many organizations end up with more VMs than they can effectively manage by hand.
Unmanaged Virtual Machines: The Burden on Business
Lingering and unused virtual machines cause two major issues within IT infrastructures:
- Drain on resources – VMs consume disk storage, RAM and CPU. Idle VMs waste those resources and drive unnecessary expansion of very expensive IT infrastructure, involving not only networking, servers and storage, but also datacenter power, space and cooling.
- Threat to cyber security – Left unmonitored and unmaintained, VMs are extremely vulnerable to attack.
Some of these VMs become orphaned as part of development and testing activities, and they may have incorrect or inappropriate amounts of resources apportioned. Virtual machines can be inadvertently created with enormous amounts of storage capacity, and when they’re left sitting idle, they leave very few resources for other VMs.
For example, an Ansible YAML file hosted on GitHub that describes a build for a DB server role with an ext4 partition might inadvertently stipulate the wrong partition size, turning an 8GB partition into 8TB. If left unnoticed, such a mistake could consume most, if not all, of the available shared storage. In addition, orphaned VMs are often allocated tremendous amounts of RAM, which leads to serious over-subscription of memory resources.
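One simple guard against this kind of typo is to compare requested sizes against a ceiling before a build runs. The sketch below is a minimal illustration, not part of any Ansible tooling: the `parse_size` and `oversized_volumes` helpers, the 500GB ceiling, and the dictionary layout of the build definition are all assumptions made for the example.

```python
import re

# Multipliers for size suffixes (assumption: sizes are written like "8GB").
_UNITS = {"MB": 1024**2, "GB": 1024**3, "TB": 1024**4}


def parse_size(text):
    """Convert a size string such as '8GB' into a number of bytes."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*(MB|GB|TB)\s*", text.upper())
    if match is None:
        raise ValueError(f"unrecognized size: {text!r}")
    value, unit = match.groups()
    return int(float(value) * _UNITS[unit])


def oversized_volumes(build, limit="500GB"):
    """Return the names of volumes whose requested size exceeds the limit."""
    ceiling = parse_size(limit)
    return [
        vol["name"]
        for vol in build.get("volumes", [])
        if parse_size(vol["size"]) > ceiling
    ]


# Hypothetical build definition, as it might look after loading a YAML file.
db_server = {
    "role": "db",
    "volumes": [
        {"name": "root", "fstype": "ext4", "size": "40GB"},
        {"name": "data", "fstype": "ext4", "size": "8TB"},  # meant to be 8GB
    ],
}

print(oversized_volumes(db_server))
```

With the hypothetical definition above, only the `data` volume is flagged. A check like this could run in a CI step so that an oversized request is caught before any storage is actually provisioned.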
VMs that are not properly monitored or maintained with timely security patches can become catastrophic security concerns. Many packaged ISOs for virtual machines contain default credentials, and care must be taken to change these default credentials before the VM is ever exposed to malicious traffic. Once a VM is compromised, attackers can gain tremendous insight into your internal network, making even properly patched systems far more vulnerable to attack from inside your infrastructure.
Proper monitoring of the VM and hypervisor resources is critical, but in a sprawling environment, this will almost certainly require a framework of automation and orchestration for management. Complicating matters further, IT teams are often terrified of deleting virtual instances in complex environments, because they can never be certain which dependencies they might break.
3 Automation Must-Haves for Managing VM Sprawl
In a world where everything is software-defined, some virtual machines can be described as “pets” and some as “cattle.” The pets get loved, nurtured and used daily, while the cattle should only be alive long enough to serve their purpose before being eliminated. Unfortunately, keeping track of, assessing and removing these excess virtual machines by hand takes considerable time and budget.
A key counter-measure to the burdensome management of orphaned and potentially insecure VMs is to develop an automated system of checks and balances that goes beyond the resource monitoring provided by tools such as Nagios, Cacti or SNMP. There are three key considerations in creating and implementing an automation framework:
- The transport network infrastructure must handle new management traffic, from basic routine tasks to more complex projects.
- The management traffic must be segmented and separated virtually or physically.
- There must be a repository to hold the software that defines your network, and it must be reachable over the SDN network.
This automated system should include scripted methods that perform logical checks on the VMs such as:
- Power status – Is the VM off or on?
- Tenant identification – Is the VM part of a Dev, Test, CI/CD, Staging or Production activity?
- Spawn date/time – How long ago was the VM created?
- Uptime – How long has the VM been powered on?
- Current users – Who is logged in and for how long?
- Current connection count – How busy is the VM?
- Last login time – Does anyone ever log in to the VM?
- Rudimentary dictionary password cracking for weak passwords – Were any default passwords left unchanged?
- VM Role – What ports are open? Is there a service running?
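Several of the checks above can be expressed as simple rules over an inventory record. The following is a minimal sketch, assuming the inventory data has already been collected from the hypervisor or cloud API. The `VMRecord` fields, the `sprawl_flags` function and the 30-day and 14-day thresholds are hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class VMRecord:
    """Inventory snapshot for one VM (hypothetical fields)."""
    name: str
    powered_on: bool
    tenant: str                      # e.g. "dev", "test", "staging", "production"
    created: datetime
    last_login: Optional[datetime]   # None if nobody has ever logged in
    connection_count: int


def sprawl_flags(vm, now, max_age=timedelta(days=30), max_idle=timedelta(days=14)):
    """Return a list of reasons this VM deserves a closer look."""
    flags = []
    if vm.tenant != "production" and now - vm.created > max_age:
        flags.append("non-production VM older than max_age")
    if vm.last_login is None:
        flags.append("no recorded login")
    elif now - vm.last_login > max_idle:
        flags.append("no login within max_idle")
    if vm.powered_on and vm.connection_count == 0:
        flags.append("powered on with no connections")
    if not vm.powered_on:
        flags.append("powered off but still holding storage")
    return flags


now = datetime(2024, 1, 31)
inventory = [
    VMRecord("web-prod-01", True, "production",
             datetime(2023, 1, 1), datetime(2024, 1, 30), 120),
    VMRecord("ci-runner-17", True, "test",
             datetime(2023, 11, 1), None, 0),
]
for vm in inventory:
    print(vm.name, sprawl_flags(vm, now))
```

In this example the production web server comes back with no flags, while the test runner is flagged for its age, its missing login history and its lack of connections. The function only reports reasons and leaves the decision to a person or a later workflow step, which fits environments where teams are wary of deleting anything automatically.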
The manner in which you manage your network can be as rudimentary or as complex as you want, but with proper automated monitoring you can mitigate the resource consumption and cyber security issues far more efficiently, even in a large cloud or virtual environment.
An Ounce of Prevention: Stopping VM Sprawl Before It Starts
Stopping VM sprawl before it starts requires putting smart processes in place from the outset. Because the management of virtual machines is largely automated, any wrong information can have a huge impact on your infrastructure. Consequently, restricting access is a key practice, albeit an unpopular one. Withholding access from the IT staff tasked with fixing a problem makes their job more complicated, but it is crucial that unauthorized users aren’t able to make changes. That way, once a process has been tested, it should keep working.
Having automation overwrite local changes, rather than the other way around, ensures that the process wipes out any errors introduced by unauthorized users. As for dependencies and accidentally deleting something important, the most important rule is quite simple: always remember to back up (this should be part of your SLA). Ultimately, it shouldn’t matter if your virtual machine gets deleted, because backups should be a standard practice.
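The backup rule can also be enforced in the cleanup automation itself, so that a VM is only ever removed when a recent backup exists. The sketch below illustrates the idea. The `safe_to_delete` name and the 24-hour freshness window are assumptions, and how the last backup time is obtained depends on your backup tooling.

```python
from datetime import datetime, timedelta


def safe_to_delete(last_backup, now, max_backup_age=timedelta(hours=24)):
    """Allow deletion only when a sufficiently recent backup exists."""
    if last_backup is None:
        # Never backed up: refuse to delete.
        return False
    return now - last_backup <= max_backup_age


now = datetime(2024, 1, 31, 12, 0)
print(safe_to_delete(datetime(2024, 1, 31, 2, 0), now))   # backed up 10 hours ago
print(safe_to_delete(datetime(2024, 1, 20, 2, 0), now))   # backed up 11 days ago
print(safe_to_delete(None, now))                          # never backed up
```

Only the first call returns `True`. A stale or missing backup blocks the deletion rather than merely producing a warning.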
In its simplest form, managing and reducing VM sprawl is as easy as keeping a close eye on your virtual environment and always having a handle on what’s going on behind the scenes. By implementing a system of rules for VM creation, limiting access to your team, and monitoring VM lifecycles and usage, you can take strides toward avoiding this expensive and unnecessary problem.