With VMware thin provisioning, what's the best process for handling oversubscription?
Storage oversubscription, in a VMware environment, is the practice of allocating more logical space than is physically available on a vSphere data store. Oversubscription can be a good thing; it allows virtual machines to be built with capacity to grow without downtime and to be built to a standard design without wasting resources. However, when an oversubscribed environment runs out of physical space, the resulting problems can be unpredictable, including virtual machine startup failures, snapshot creation failures, poor performance and ultimately virtual machine crashes and data loss.
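To make the definition concrete, the level of oversubscription can be expressed as the ratio of provisioned (logical) space to physical capacity. The following is a minimal sketch in Python; the function name and the figures are hypothetical and only illustrate the arithmetic.

```python
def oversubscription_ratio(provisioned_gb: float, capacity_gb: float) -> float:
    """Return logical (provisioned) space divided by physical capacity.

    A result above 1.0 means the data store is oversubscribed.
    """
    if capacity_gb <= 0:
        raise ValueError("capacity_gb must be positive")
    return provisioned_gb / capacity_gb


# Hypothetical data store: 1,000 GB physical capacity hosting
# ten thin-provisioned virtual machines of 150 GB each.
ratio = oversubscription_ratio(provisioned_gb=10 * 150, capacity_gb=1000)
print(f"Oversubscription ratio: {ratio:.2f}")  # prints "Oversubscription ratio: 1.50"
```

In this example the data store has promised 50% more space than it physically holds, which is safe only as long as the virtual machines do not all fill their disks.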
The first step in dealing with storage oversubscription is to ensure suitable monitoring is in place. vCenter enables alarms to be defined against individual data stores to track when usage exceeds a predefined value. These alarms cover both the physical disk utilization and the oversubscription level. Using them, the administrator can determine when action needs to be taken to increase disk capacity or review the level of oversubscription being performed.
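The logic behind the two kinds of alarm can be sketched as follows. This is not vCenter's own implementation; the `DatastoreStats` class, the `check_alarms` function and the threshold values are assumptions chosen for illustration, and the input figures would come from whatever monitoring source is in use.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DatastoreStats:
    """Hypothetical snapshot of one data store's space figures."""
    name: str
    capacity_gb: float     # physical size of the data store
    used_gb: float         # space physically consumed
    provisioned_gb: float  # logical space allocated to virtual machines


def check_alarms(ds: DatastoreStats,
                 used_pct_limit: float = 75.0,
                 provisioned_pct_limit: float = 150.0) -> List[str]:
    """Return a message for each threshold the data store has reached.

    Both limits are illustrative assumptions, not recommended values.
    """
    alarms = []
    used_pct = 100.0 * ds.used_gb / ds.capacity_gb
    provisioned_pct = 100.0 * ds.provisioned_gb / ds.capacity_gb
    if used_pct >= used_pct_limit:
        alarms.append(f"{ds.name}: physical usage at {used_pct:.0f}%")
    if provisioned_pct >= provisioned_pct_limit:
        alarms.append(f"{ds.name}: provisioned space at {provisioned_pct:.0f}%")
    return alarms


# Hypothetical data store that is 80% full but only 120% provisioned.
for message in check_alarms(DatastoreStats("ds01", 1000, 800, 1200)):
    print(message)
```

Tracking the two figures separately matters because a data store can be heavily oversubscribed while still nearly empty, or nearly full with little oversubscription; each case calls for a different response.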
The administrator needs to have a plan of action for when an alarm is generated. Data stores that can be extended can have additional extents added, up to the limit set by the architectural design or the system maximum. If a data store reaches its full capacity, virtual machines will need to be moved to new data stores, which requires a service intervention.
When establishing the thresholds for alarms, take into consideration the impact of moving data. If alarms are generated too early, with low thresholds, space is wasted. If alarms are generated too late, data migration becomes a reactive task rather than a planned one, which can result in a business impact. Wherever possible, actions taken to manage storage oversubscription should be planned and implemented outside of core business hours. Storage Distributed Resource Scheduler (SDRS) enables migration actions to be taken automatically or to be raised as recommendations for the administrator to implement at a suitable time.
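One way to reason about where to set a threshold is to work backward from the time needed to plan a migration. The sketch below assumes roughly linear growth in physical usage, which real workloads may not follow; the function names and figures are hypothetical.

```python
def days_until_full(capacity_gb: float, used_gb: float,
                    growth_gb_per_day: float) -> float:
    """Estimate days of headroom left, assuming linear growth."""
    if growth_gb_per_day <= 0:
        return float("inf")  # no growth, so the data store never fills
    return max(capacity_gb - used_gb, 0.0) / growth_gb_per_day


def threshold_for_lead_time(capacity_gb: float, growth_gb_per_day: float,
                            lead_days: float) -> float:
    """Return the usage percentage at which an alarm should fire so that
    lead_days of growth still fit in the remaining space."""
    reserve_gb = growth_gb_per_day * lead_days
    return max(100.0 * (capacity_gb - reserve_gb) / capacity_gb, 0.0)


# Hypothetical 2,000 GB data store growing by 10 GB per day, where a
# planned, out-of-hours migration needs 30 days of notice.
print(threshold_for_lead_time(2000, 10, 30))  # prints 85.0
print(days_until_full(2000, 1700, 10))        # prints 30.0
```

A threshold much lower than the calculated figure leaves usable space idle, while a higher one leaves less than the required notice period and pushes the migration toward a reactive task.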
This was first published in December 2012