Ensuring that storage volumes are properly aligned has always been an important part of optimizing server performance. But alignment has become even more important in recent years because of the widespread use of server virtualization.
Sector alignment and deduplication
Before I discuss virtualization, I want to take a moment to talk about the impact that sector alignment has on the deduplication process in a physical data center.
Sector alignment is based on the idea of matching storage blocks to physical disk sectors. Most newer physical hard disks use a sector size of 4 KB. Likewise, file systems such as NTFS use a default storage block (cluster) size of 4 KB. The problem is that not all operating systems align the storage blocks with the underlying sectors. Unless the storage blocks are properly aligned, each one could span two disk sectors. This can be problematic for the deduplication process.
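To make the idea concrete, here is a minimal sketch of an alignment check. It assumes a 4 KB physical sector and treats a volume as aligned when its starting byte offset is an exact multiple of the sector size. The function name and the two sample offsets are illustrative: 63 logical sectors of 512 bytes is the starting offset that older partitioning tools commonly used, and 2,048 logical sectors (1 MB) is the one that newer tools commonly use.

```python
SECTOR_SIZE = 4096  # assumed physical sector size (4 KB)

def is_aligned(partition_offset_bytes, sector_size=SECTOR_SIZE):
    """Return True if the volume starts on a physical sector boundary."""
    return partition_offset_bytes % sector_size == 0

# Illustrative starting offsets, expressed in 512-byte logical sectors.
legacy_offset = 63 * 512     # 32,256 bytes, not a multiple of 4,096
modern_offset = 2048 * 512   # 1,048,576 bytes, exactly 256 physical sectors

print(is_aligned(legacy_offset))   # False
print(is_aligned(modern_offset))   # True
```

If the starting offset is misaligned, every storage block that follows it is shifted by the same amount, which is why a single bad offset affects the whole volume.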
There are countless deduplication products on the market, and they use a variety of techniques for storage deduplication. One of the most common deduplication methods, however, involves the removal of redundant storage blocks.
It is important to understand that misalignment does not change the contents of the individual storage blocks. A storage block will contain the same data regardless of whether the file system is aligned to the physical storage or not. As such, block-level deduplication functions the same way on a misaligned volume as it would on a volume that is properly aligned, at least from the perspective of eliminating redundant blocks.
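The following is a simplified sketch of block-level deduplication, not the method used by any particular product. It assumes fixed 4 KB blocks and uses a SHA-256 digest to recognize identical blocks; the function names are hypothetical. Notice that the code only ever looks at block contents, which is why alignment to the physical disk does not change which blocks are treated as duplicates.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed file system block size (4 KB)

def dedupe_blocks(data, block_size=BLOCK_SIZE):
    """Split data into fixed-size blocks and keep one copy of each unique block."""
    store = {}    # digest -> the single stored copy of that block
    recipe = []   # ordered digests needed to rebuild the original data
    for i in range(0, len(data), block_size):
        block = data[i:i + block_size]
        digest = hashlib.sha256(block).hexdigest()
        store.setdefault(digest, block)  # store the block only the first time it is seen
        recipe.append(digest)
    return store, recipe

def rebuild(store, recipe):
    """Reassemble the original data from the unique blocks."""
    return b"".join(store[d] for d in recipe)

# Three identical blocks followed by one different block.
data = b"A" * BLOCK_SIZE * 3 + b"B" * BLOCK_SIZE
store, recipe = dedupe_blocks(data)
print(len(recipe), "blocks referenced,", len(store), "blocks stored")
```

In this example, four blocks are referenced but only two are stored, and rebuild(store, recipe) returns the original data.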
The main problem with deduplicating a misaligned volume is that the deduplication process will be much more I/O intensive than it needs to be. When a volume is misaligned, each storage block spans two physical sectors, so each time a block is read (or written), the underlying hardware must access twice as many sectors as it would if the volume were properly aligned. The result is a big performance hit, and that hit can be compounded if the disk is fragmented and storage blocks span sectors that are not located next to one another. Deduplication is already an I/O-intensive process, and misaligned partitions make it worse.
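The arithmetic behind the doubling can be sketched in a few lines. This example assumes 4 KB blocks and 4 KB physical sectors, and it counts how many sectors a single block touches for a given volume starting offset. The function name and the sample offsets are illustrative.

```python
def sectors_touched(block_index, volume_offset, block_size=4096, sector_size=4096):
    """Count the physical sectors that one storage block overlaps."""
    start = volume_offset + block_index * block_size  # first byte of the block
    end = start + block_size - 1                      # last byte of the block
    return end // sector_size - start // sector_size + 1

aligned_offset = 2048 * 512    # 1 MB, a multiple of the sector size
misaligned_offset = 63 * 512   # 32,256 bytes, not a multiple of the sector size

print(sectors_touched(0, aligned_offset))     # 1
print(sectors_touched(0, misaligned_offset))  # 2
```

Because every block on the misaligned volume is shifted by the same amount, each one straddles a sector boundary, not just the first.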
Virtualization and deduplication
All of the concepts I have just discussed hold true in virtual data centers, but there is an extra layer of abstraction to consider because of the use of virtual hard disks. If not properly planned, the way virtual hard disks are used can adversely affect your deduplication efforts.
Cluster Shared Volumes
Although they are no longer an absolute requirement, Hyper-V and VMware clusters have traditionally been based around the use of shared storage. Typically this means placing virtual machine components (including virtual hard disks) onto a storage area network (SAN), where they can be accessed by all of the nodes within the virtualization cluster.
If you use SAN-based shared storage, then it is important to verify that the shared volume is aligned to the underlying storage hardware. You should also verify that your deduplication method is compatible with shared storage.
To give you a more concrete example of why it is so important to verify compatibility, consider the way shared storage works in a Hyper-V cluster. Cluster nodes access shared storage through a logical mapping (C:\clusterstorage\volume). The problem is that some deduplication software will not deduplicate a server's system volume. Even though Hyper-V shared storage doesn't actually reside on the system volume, the logical mapping makes it appear as though it does. As a result, some deduplication software (including Microsoft's own native file system deduplication) is not compatible with shared storage.
If the deduplication of shared storage is a problem in your environment, then you may be able to get around the problem by using hardware deduplication (assuming your SAN offers this feature). Doing so will bypass operating system limitations.
Virtual hard disk structure
Another issue that can affect the deduplication process is that virtual hard disks have a structure that mimics that of physical hard disks. In other words, virtual hard disks are divided into sectors and tracks, and the virtual machine's file system is based on the use of storage blocks.
This is important because virtual hard disks are really files that reside on a physical server. With that in mind, imagine that you have two otherwise identical virtual hard disks, but one is aligned and one is not. The differences in alignment would mean that the two virtual hard disk files would be structurally different from one another, even though the two virtual hard disks contain the same data.
If your goal is to perform host-level deduplication, this structural difference could potentially result in less data being deduplicated. Whether misalignment within a virtual hard disk proves to be problematic depends on whether your deduplication software blindly deduplicates host-level storage blocks or is smart enough to see into the virtual hard disk.
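A toy model shows why a deduplication engine that blindly chunks host-level blocks can miss duplicates. The sketch below treats a virtual hard disk as nothing more than zero padding followed by the guest's data, which is an assumption made purely for illustration; real virtual hard disk formats have their own headers and metadata. All of the function names are hypothetical.

```python
import hashlib

BLOCK = 4096  # assumed host-level deduplication block size (4 KB)

def guest_data(n_blocks):
    """Build deterministic guest data in which every 4 KB block is unique."""
    blocks = []
    for i in range(n_blocks):
        pieces = [hashlib.sha256(f"{i}:{j}".encode()).digest() for j in range(BLOCK // 32)]
        blocks.append(b"".join(pieces))
    return b"".join(blocks)

def make_image(data, guest_offset):
    """Toy virtual disk file: zero padding, then the guest's data."""
    return b"\x00" * guest_offset + data

def chunk_digests(image):
    """Digest every fixed-size host-level chunk of the image file."""
    return {hashlib.sha256(image[i:i + BLOCK]).digest()
            for i in range(0, len(image), BLOCK)}

def shared_chunks(image_a, image_b):
    """Count the chunks that a blind block-level deduplicator could share."""
    return len(chunk_digests(image_a) & chunk_digests(image_b))

data = guest_data(8)
aligned_a = make_image(data, 4096)   # guest data starts on a chunk boundary
aligned_b = make_image(data, 4096)
misaligned = make_image(data, 512)   # same guest data, shifted by 512 bytes

print(shared_chunks(aligned_a, aligned_b))   # every chunk matches
print(shared_chunks(aligned_a, misaligned))  # no chunk matches
```

The two aligned images share all of their chunks, while the aligned and misaligned images share none, even though the guest data is byte-for-byte identical. Software that understands the virtual hard disk's internal layout can sidestep this by deduplicating the guest's blocks rather than the host's.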
There are plenty of issues to consider when preparing to deduplicate a virtualization host. As a best practice, make sure that your physical and virtual disks are properly aligned. Also, avoid the use of thinly provisioned virtual hard disks whenever possible since the provisioning process can affect performance and has been known to affect deduplication in some cases.
This was first published in January 2013