Many IT shops move to the latest version of VMware Inc.'s vSphere virtualization software as part of their natural upgrade cycles -- not for any particular features or functions. But some are taking special note of the new or enhanced storage capabilities in vSphere 5 that were a central focus of the 2011 release.
Three of the most prominent new vSphere 5 storage twists are the Storage Distributed Resource Scheduler (DRS), the updated Virtual Machine File System (VMFS) and Profile-Driven Storage. Several customers using these vSphere 5 storage features said they find them valuable, but they also note reasons to approach them with caution.
Storage DRS lets users combine the storage resources of several storage volumes into a single pool and balance the load between the volumes to ensure they don't run out of space or suffer I/O bottlenecks. The new feature takes its cue from the server-side DRS that VMware already offered, which automatically moves virtual machines (VMs) to load-balance CPU and memory.
"I find the capacity balancing to work really well," said Ed Czerwin, a senior systems engineer and virtualization lead at a large medical device manufacturer based in Switzerland. "Before, I had to tell the operations staff, 'Hey, put this VM on this LUN.' Now I can just say, 'Put it to this cluster.' It will balance everything for them, and I don't have to chase after space so much."
Czerwin said he can assign a disk spindle type to a Storage DRS cluster and specify which clusters store the operating system, log and database virtual machine disk (VMDK) files. The company virtualizes more than 95% of its servers, including those hosting the SAP ERP system and Exchange Server, and uses EMC Corp.'s Symmetrix and VNX for its main storage tiers, he said.
Storage DRS also caught the eye of Bob Plankers, a virtualization architect at a major Midwestern university. Plankers wanted to thin provision storage for the school's VMs so the IT department could bill for actual as opposed to allocated storage. He said Storage DRS helps avoid out-of-space scenarios by kicking off Storage vMotion, which can migrate VMDK files across storage arrays without downtime.
Plankers said he creates standard-sized 2 TB VMFS data stores, puts them in a Storage DRS cluster and configures Storage DRS so that available capacity never drops below 200 GB.
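Plankers' setup amounts to a free-space rule over a pool of data stores. The sketch below models that rule in plain Python; the data store names, sizes and selection logic are illustrative assumptions, not vSphere's actual placement algorithm or API.

```python
# Conceptual model of Storage DRS out-of-space avoidance: keep every
# data store in the cluster above a minimum free-space threshold by
# placing new VMDKs on the data store with the most free capacity.
# Sizes are in GB; names and logic are illustrative, not VMware's API.

MIN_FREE_GB = 200             # don't let free space drop below 200 GB

def pick_datastore(cluster, vmdk_size_gb):
    """Return the data store with the most free space that can take the
    VMDK without breaching the free-space threshold, or None."""
    candidates = [
        (name, free) for name, free in cluster.items()
        if free - vmdk_size_gb >= MIN_FREE_GB
    ]
    if not candidates:
        return None   # in practice this would trigger a rebalance or alert
    return max(candidates, key=lambda nf: nf[1])[0]

def place(cluster, vmdk_size_gb):
    """Place a VMDK and debit the chosen data store's free space."""
    ds = pick_datastore(cluster, vmdk_size_gb)
    if ds is not None:
        cluster[ds] -= vmdk_size_gb
    return ds

cluster = {"ds01": 1800, "ds02": 900, "ds03": 350}   # free GB per 2 TB data store
print(place(cluster, 500))   # -> ds01 (most free space that clears the threshold)
```

The same check explains the out-of-space scenarios described later: a placement that would leave less than the threshold is refused rather than allowed to fill the data store.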
"That's solving immediate problems," he said.
But Plankers views Storage DRS as "essentially a 1.0 product" with user interface problems and bugs. He cited the example of his inability to switch a VM's disk format to thin provision while using Storage vMotion to move the VM to a specific data store cluster.
"There's a workaround. You basically shut Storage DRS off temporarily for that VM, migrate it into one of the cluster data stores and then go back later, after it's done, and turn the Storage DRS back on for that VM," Plankers said. "Not a big deal, but kind of a pain if you've got 600 VMs to do it with."
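At 600 VMs, the workaround Plankers describes is a candidate for scripting. The sketch below models the per-VM sequence (disable Storage DRS, migrate as thin, re-enable); the three helper functions are hypothetical stand-ins for whatever API or CLI your environment exposes (e.g. PowerCLI or pyVmomi), and here they simply record the actions so the flow is runnable.

```python
# Sketch of automating the workaround at scale. The helpers are
# hypothetical stand-ins, not real vSphere API calls; they log each
# step so the disable -> migrate -> re-enable ordering is visible.

actions = []

def set_storage_drs(vm, enabled):
    actions.append(("drs", vm, enabled))

def storage_vmotion(vm, datastore, disk_format):
    actions.append(("migrate", vm, datastore, disk_format))

def thin_migrate(vms, target_datastore):
    """Apply the disable -> migrate -> re-enable workaround per VM."""
    for vm in vms:
        set_storage_drs(vm, False)                     # opt the VM out of Storage DRS
        storage_vmotion(vm, target_datastore, "thin")  # move it, converting to thin
        set_storage_drs(vm, True)                      # opt it back in afterwards

thin_migrate(["vm-001", "vm-002"], "ds-cluster-01")
print(len(actions))   # -> 6 (three recorded steps per VM)
```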
Cormac Hogan, a senior technical marketing architect at VMware, acknowledged that the workaround involves additional manual steps, but he said the vStorage APIs for Array Integration (VAAI) SCSI Unmap primitive alleviates the issue. However, not all storage arrays can process SCSI Unmap commands.
Chris Hansen, a systems manager at Gordon College in Wenham, Mass., is hesitant to use Storage DRS for another reason. Hansen said he is concerned that Storage DRS will interfere with the automated tiering function of the school's Dell Compellent arrays.
"I think they would just end up fighting each other," Hansen said.
In a blog post, Frank Denneman, a senior architect on VMware's technical marketing team, wrote that Storage DRS can be used with array-based auto tiering for the initial placement and out-of-space avoidance features, but he does not recommend enabling the I/O metric feature, popularly known as I/O load balancing.
For Hansen, the main drivers to upgrade to vSphere 5 were two VMFS-5 enhancements: the ability to create a VMFS data store/volume of up to 64 TB on a single extent (the partition on a storage device or LUN where the VMFS volume resides), and the chance to use a unified 1 MB file block size for VMDK files.
Mike Adams, group manager of vSphere product marketing at VMware, said customers had been asking for the data store change for quite some time. Under VMFS-3, they could create a data store of up to 64 TB, but they had to concatenate a bunch of extents to do so. The maximum size of a VMFS-3 extent was 2 TB.
Hansen said he found the data store limitation to be a "bit of a maintenance nightmare" with especially large VMDK data files and servers that required a large amount of capacity. For instance, Gordon College allocated eight data stores to one end-user backup server.
Gordon's IT staff sometimes coped by using a raw device mapping (RDM) to connect the virtual machine directly to the SAN, a practice they continue to follow with servers whose VMDK files exceed 2 TB, which remains the VMDK size limit, Hansen said. The main disadvantage of the RDM approach is that moving a server from one data center to another requires SAN replication and downtime, rather than an uninterrupted migration with VMware's Storage vMotion.
The college had an unfortunate experience with thin-provisioned VMDK files. Two files that started out at about 100 GB eventually grew to about 1 TB each in size, filling a data store and crashing the application server. Hansen said he used Storage vMotion to move one of the large VMDK files to another data store, but the college suffered one-and-a-half days of server downtime because the server couldn't power back on until there was adequate space in the data store.
"Luckily for us in that case, it was a redundant backup server. It was a copy of a copy, so users weren't affected," said Hansen. "If we had that same problem today, it would just be a matter of expanding that data store, and you could be online again in minutes."
The other feature that Hansen finds beneficial is the chance to get to a uniform 1 MB file block size. Prior to VMFS-5, users had to choose a file block size of 1 MB, 2 MB, 4 MB or 8 MB based on the size of the virtual machine disk file. A VMDK file with a maximum size of 256 GB required a file block size of 1 MB; a 512 GB file needed a block size of 2 MB; a 1 TB file called for a 4 MB block size; and a 2 TB VMDK file meant an 8 MB file block size.
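The VMFS-3 pairings Hansen had to juggle can be captured in a small lookup. This sketch encodes exactly the size-to-block-size mapping given above; under VMFS-5 the answer is always 1 MB, which is the simplification Hansen values.

```python
# VMFS-3 required file block size for a given maximum VMDK size,
# per the pairings in the text: 1 MB blocks cap a VMDK at 256 GB,
# 2 MB at 512 GB, 4 MB at 1 TB, and 8 MB at 2 TB.

VMFS3_LIMITS = [          # (block size in MB, max VMDK size in GB)
    (1, 256),
    (2, 512),
    (4, 1024),
    (8, 2048),
]

def required_block_size_mb(vmdk_size_gb):
    """Smallest VMFS-3 file block size that can hold a VMDK of this size."""
    for block_mb, max_gb in VMFS3_LIMITS:
        if vmdk_size_gb <= max_gb:
            return block_mb
    raise ValueError("VMDK exceeds the 2 TB VMFS-3 VMDK limit")

print(required_block_size_mb(300))   # -> 2 (a 300 GB VMDK needs 2 MB blocks)
```

The awkward part was that the choice had to be made up front, when the data store was formatted, which is why a VMDK that later outgrew its data store's block size forced a migration.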
"You had to make a decision: How large should my VMDK file be?" Hansen said.
Czerwin said his company's system took a performance hit when using Storage vMotion to move a VMDK file between data stores with different block sizes. He also noticed that snapshots failed across two data stores whose VMDK files had different block sizes.
"The only way to fix it was to Storage vMotion the LUNs that were running in the different block size data store to one that matched," Czerwin said.
IT shops have a choice of upgrading from VMFS-3 to VMFS-5 without changing the file block size, or they can create brand new VMFS-5 volumes and use Storage vMotion to migrate VMDK files from the old VMFS-3 volumes to the new ones.
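The second option, new VMFS-5 volumes plus Storage vMotion, is essentially a packing exercise: each VMDK on an old volume needs a new volume with enough free space. A minimal first-fit sketch, with illustrative volume names and sizes (a real migration would drive Storage vMotion rather than just build a plan):

```python
# First-fit plan for moving VMDKs off old VMFS-3 volumes onto new
# VMFS-5 volumes, tracking remaining capacity as assignments are made.
# Names and sizes are illustrative assumptions.

def plan_migration(vmdks, new_volumes):
    """vmdks: {name: size_gb}; new_volumes: {name: free_gb}.
    Returns {vmdk: target volume}; raises if a VMDK doesn't fit."""
    plan = {}
    # Place the largest files first so they get first pick of free space.
    for vmdk, size in sorted(vmdks.items(), key=lambda kv: -kv[1]):
        for vol, free in new_volumes.items():
            if size <= free:
                plan[vmdk] = vol
                new_volumes[vol] = free - size
                break
        else:
            raise RuntimeError(f"no VMFS-5 volume has {size} GB free for {vmdk}")
    return plan

plan = plan_migration({"db.vmdk": 900, "os.vmdk": 60}, {"vmfs5-a": 1000})
print(plan)   # both VMDKs land on vmfs5-a, since 900 + 60 <= 1000
```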
"The recommendation that we're making to customers is that if they're in a position to do so, build a brand new VMFS-5 that has the 1 meg file block size and then migrate over VMs from the previous VMFS-3 to the new VMFS-5," said VMware's Hogan. "Then they get all the benefits of the unified block size itself."
But shifting VMDK files from a 2 MB, 4 MB or 8 MB block size to 1 MB can be a time-consuming endeavor, depending on the environment and the size of the files. For instance, Gordon College upgraded from VMFS-3 to VMFS-5 in a day, but the staff has been migrating files to the 1 MB block size for months. The school had only a handful of servers that already used the 1 MB file block size.
"It is a long-term project to be able to get everything migrated over," Hansen said. "You set a Storage vMotion once, you let it run for a couple of days and it finishes. Then you run another one. It's not like you have to sit there and watch it the whole time, but it's something to keep track of."
Gordon also has a requirement to keep all SAN block-level replicated data for 30 days, so when the IT staffers migrate from the old data store to the new data store, they can't delete the old data store for 30 days.
"Ignoring the fact that we have to wait 30 days, even just moving a 2 TB volume alone takes a couple days, not to mention the performance is degraded on those servers when it's happening," said Hansen.
But Hansen said the end result is worth it because the school consumes less storage per data store by using the lower block size. Plus, VMFS-5 "certainly simplifies things, not having to worry about which data stores are at which block size," he said.
Among the other new vSphere 5 features that some customers find helpful is Profile-Driven Storage. The storage or virtualization administrator can map a high, medium or low performance service level to the back-end storage resources, according to VMware's Adams.
Adams said VMware's vStorage APIs for Storage Awareness (VASA) take the feature one step further by providing a mechanism for "being smarter about how we create [storage] profiles." The array can feed information to vSphere to assist with decision making on which storage to use.
"It was definitely the case in the past when somebody created a virtual machine, they might use the wrong storage," said Adams. "Or, they might not get the resource that they need in order to meet a certain service level or service-level agreement."
Plankers views the Profile-Driven Storage/VASA tandem as a 1.0 release at this point. He said few vendors have implemented VASA on their arrays, so most storage systems can't yet report the information administrators need to make decisions.
"The idea is that you can run a report and see what isn't where it should be, for example, but you can't automatically do anything about it right now," said Plankers. He hopes the next version will facilitate a problem fix.
This was first published in August 2012