Assessing and monitoring the storage systems behind your server virtualization environment through storage performance metrics means considering many layers.
Storage for virtual servers can be served by the hypervisor or by block- or file-based storage devices over a network connection. Hypervisor-served storage is the more interesting of the two: the hypervisor controls access, so whatever affects the hypervisor affects everything running on it. The metrics matter, and so does understanding how they affect the overall workloads on the hypervisor.
Storage served up by the hypervisor looks just like a SCSI device to a virtual machine, while network-served storage may require specialized drivers, such as iSCSI.
Hypervisor-served storage can be in the form of Fibre Channel, iSCSI, NFS (and, in the case of Hyper-V, CIFS) or local storage, but by the time the virtual machine attaches to the storage device, it acts just like a normal SCSI device and therefore uses a normal SCSI driver from within the guest operating system. The hypervisor uses binary translation of the standard virtual machine SCSI driver commands into those that can be handled by the other technologies, whether that is Fibre Channel, iSCSI, NFS or a local SCSI device. Binary translation happens either within the hypervisor or within the CPU using Intel VT-x or AMD RVI command structures. In either case, the virtual machine sees the storage as SCSI, while the hypervisor sees the storage as something else entirely.
Metrics and how to interpret them
Given these dynamics, there are multiple sets of metrics to consider related to virtualization storage:
- Storage performance metrics seen by the guest operating system
- Storage performance metrics seen by the hypervisor
- Storage performance metrics seen by the storage hardware
Each set of metrics is important for specific reasons, but you can’t count on them to always guide you to the right decision. Some of these metrics can be misleading and therefore untrustworthy.
The least valuable metrics are those seen by the guest operating system. A virtual machine does not necessarily receive full CPU cycles, so any metric inside the virtual machine that depends on CPU cycles is suspect. Some virtual machine metrics have nothing to do with CPU cycles, however, and those non-CPU VM metrics are trustworthy.
Storage performance metrics seen by the hypervisor are valuable, and the most frequently viewed, but these can also be misleading as they’re based on data that could be cached or queued by the hypervisor.
That leaves metrics as seen by the storage hardware, and they’re the best ones to use, since the hardware layer provides fine-grained data, down to the spindle being used. In many cases, this data is the same as seen by the hypervisor, but in highly latent subsystems, this may not be the case. Unfortunately, not all hardware subsystems allow this data to be seen, in which case you need to concentrate on the metrics available to the hypervisor since metrics from within the VM are not completely reliable.
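The ranking above (hardware first, hypervisor next, guest last) can be sketched as a simple fallback rule. This is only an illustration: the layer names, the `LAYER_PREFERENCE` tuple and the `pick_latency_source` function are hypothetical and do not come from any monitoring product.

```python
from typing import Dict, Optional, Tuple

# Layers ordered from most to least trustworthy, following the ranking in the text.
# The names are assumptions made for this sketch.
LAYER_PREFERENCE = ("hardware", "hypervisor", "guest")


def pick_latency_source(
    samples: Dict[str, Optional[float]]
) -> Optional[Tuple[str, float]]:
    """Return (layer, latency_ms) from the most trustworthy layer that reported a value.

    A value of None (or a missing key) means that layer exposes no data.
    """
    for layer in LAYER_PREFERENCE:
        value = samples.get(layer)
        if value is not None:
            return layer, value
    return None


# Example: the storage array exposes no metrics, so the hypervisor reading is used.
chosen = pick_latency_source({"hardware": None, "hypervisor": 12.0, "guest": 30.0})
```

In this sketch the guest reading is used only when neither of the other layers reports anything, which mirrors the advice to treat in-VM numbers as the last resort.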
From the virtualization storage hardware layer, the most important metrics are the read and write latency values: how long the data took to be read from the disk or written to the disk once received by the specific layer. Next in importance is the number of IOPS, but you cannot interpret IOPS without also understanding the read and write kilobits per second, or Kbps. IOPS counts the operations; Kbps measures the actual data read or written by the system. IOPS is the most-looked-at metric, whether from the hypervisor or the storage device. However, latency is a better metric since it indicates whether there are issues with the storage. The IOPS number varies with the number of blocks to be written and, with NFS (and CIFS), can’t be easily tracked since latency metrics are not inherent within the protocol.
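To see why IOPS alone says little, it may help to work through the arithmetic linking operations to data moved. The sketch below assumes every operation has the same size, which real workloads rarely do; the function name `throughput_kbps` and the sample figures are made up for illustration.

```python
def throughput_kbps(iops: float, io_size_bytes: int) -> float:
    """Return throughput in kilobits per second for a given IOPS rate.

    Assumes every operation transfers exactly io_size_bytes (a simplification).
    """
    bits_per_second = iops * io_size_bytes * 8
    return bits_per_second / 1000


# The same 5,000 IOPS moves very different amounts of data
# depending on the size of each operation.
small_io = throughput_kbps(iops=5000, io_size_bytes=4096)    # 4 KiB operations
large_io = throughput_kbps(iops=5000, io_size_bytes=65536)   # 64 KiB operations
print(small_io, large_io)  # 163840.0 2621440.0
```

Under these assumptions the two workloads report an identical IOPS figure while one moves 16 times as much data as the other, which is why the throughput number has to be read alongside it.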
Which tools can do the job?
To gather all these storage performance metrics, look to tools from companies such as NetApp (Balance), SolarWinds (Storage Manager), Quest (vFoglight Storage) and others that talk directly to the hardware. Such products can examine the storage hardware layer using a Storage Management Initiative Specification (SMI-S) software layer or directly through storage manufacturer protocols.
For the hypervisor level, tools such as VMware vCenter Operations, vKernel, VMturbo and Quest vFoglight, to name a few, inspect the hypervisor layer by querying the hypervisor directly or indirectly using the hypervisor’s central management console (such as VMware vCenter).
Finally, guest operating systems provide their own tools for gathering storage performance metrics.
A combination of SMI-S and hypervisor-based tools provides the best mix of functionality for determining the latency, IOPS and bytes written or read. That’s mainly because the numbers produced by these tools tend to be in sync unless the hypervisor is extremely busy. In that case, the hardware numbers are best for pure storage performance, but since all resources touch one another within the hypervisor, ensuring that the hypervisor does not reach that extremely busy state is always a good thing.
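One way to act on this is to compare the latency reported by the two layers and treat a large gap as a hint that the hypervisor itself is overloaded. The sketch below is an assumption-laden illustration: the function `layers_in_sync` is hypothetical, and the 25 percent tolerance is an arbitrary value chosen for the example, not a recommended threshold.

```python
def layers_in_sync(
    hypervisor_ms: float, hardware_ms: float, tolerance: float = 0.25
) -> bool:
    """Return True when the hypervisor latency is within `tolerance`
    (a fraction of the hardware latency) of the hardware latency.

    The default tolerance of 0.25 is an arbitrary value for illustration.
    """
    if hardware_ms <= 0:
        raise ValueError("hardware_ms must be positive")
    relative_gap = abs(hypervisor_ms - hardware_ms) / hardware_ms
    return relative_gap <= tolerance


# Readings that roughly agree: either layer's number can be used.
quiet = layers_in_sync(hypervisor_ms=5.5, hardware_ms=5.0)

# The hypervisor reports far higher latency than the array: prefer the
# hardware number for pure storage performance and check hypervisor load.
busy = layers_in_sync(hypervisor_ms=20.0, hardware_ms=5.0)
```

A gap flagged this way does not say where the extra time went, only that the two views have stopped agreeing.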
If the storage-level tools do not exist for your environment, such as with some iSCSI servers, the hypervisor-based tools become the most important.
This was first published in September 2011