The adoption of storage virtualization has been increasing as many of the early barriers to its implementation have fallen away. A wide selection of mature storage virtualization solutions is available should you decide to deploy the technology either at the array or in the network.
While there may be some dispute over an exact definition, storage virtualization is generally considered technology that provides a flexible, logical arrangement of data storage capacity to users while abstracting the physical location from them. It’s a software layer that intercepts I/O requests to the logical capacity and maps them to the correct physical locations.
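The mapping described above can be pictured as a lookup table. The following is a minimal sketch, not any vendor's implementation: the extent size, the device names and the `VirtualVolume` class are all assumptions made for illustration.

```python
from dataclasses import dataclass

EXTENT_SIZE = 4  # blocks per extent; a hypothetical value chosen for illustration


@dataclass(frozen=True)
class PhysicalExtent:
    device: str        # hypothetical device name, e.g. "array-A"
    start_block: int   # first physical block of the extent on that device


class VirtualVolume:
    """Maps logical block addresses to physical locations via an extent table."""

    def __init__(self, extents):
        # extents[i] backs logical blocks [i * EXTENT_SIZE, (i + 1) * EXTENT_SIZE)
        self.extents = list(extents)

    def resolve(self, logical_block):
        """Translate a logical block number into (device, physical block)."""
        index, offset = divmod(logical_block, EXTENT_SIZE)
        if logical_block < 0 or index >= len(self.extents):
            raise ValueError(f"logical block {logical_block} is outside the volume")
        extent = self.extents[index]
        return extent.device, extent.start_block + offset


# One logical volume whose two extents live on different (hypothetical) arrays.
volume = VirtualVolume([
    PhysicalExtent("array-A", 100),
    PhysicalExtent("array-B", 0),
])
print(volume.resolve(1))   # ('array-A', 101)
print(volume.resolve(5))   # ('array-B', 1)
```

The host sees only logical block numbers. Moving an extent to another device means updating one table entry, which is what makes the physical location invisible to users.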
In storage virtualization’s most basic implementation, which is at the host level, a logical volume manager permits storage capacity provisioning to applications and users. Block storage virtualization is more commonly implemented due to the complexity of LUN management and the conditions necessary for flexibility in storage provisioning, particularly in multi-user environments, although it is also implemented with file storage systems. This article discusses storage virtualization technologies at the network and storage device levels, not at the host level.
Groups, LUNs and partitioning: a legacy process
The process of creating array groups, allocating LUNs and partitioning volumes is a complex and inefficient way to provision storage, especially when it comes to balancing performance and reliability of physical disks across drive shelves. Additionally, enlarging a host’s volume can be a lengthy procedure of copying data and concatenating LUNs. Storage virtualization offers a better way to keep pace with the demands of provisioning storage to applications and servers, and it cuts the time and resources consumed by letting the “brains” of the storage system make most of the decisions. This technology can also enhance utilization by taking away the guesswork of physical allocation while leveraging technologies like thin provisioning.
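To illustrate how thin provisioning can raise utilization, here is a minimal sketch in which volumes advertise a logical size but draw physical extents from a shared pool only on first write. The `StoragePool` and `ThinVolume` classes and the sizes used are hypothetical and not taken from any product.

```python
class StoragePool:
    """Shared pool of physical extents, handed out on demand."""

    def __init__(self, total_extents):
        self.free = list(range(total_extents))

    def allocate(self):
        if not self.free:
            raise RuntimeError("pool exhausted")
        return self.free.pop(0)


class ThinVolume:
    """Advertises a logical size but consumes pool extents only when written."""

    def __init__(self, pool, logical_extents):
        self.pool = pool
        self.logical_extents = logical_extents
        self.mapping = {}  # logical extent index -> physical extent index

    def write(self, logical_extent):
        if not 0 <= logical_extent < self.logical_extents:
            raise ValueError("write outside the volume")
        if logical_extent not in self.mapping:
            # First write to this extent: take physical space from the pool now.
            self.mapping[logical_extent] = self.pool.allocate()
        return self.mapping[logical_extent]

    def allocated(self):
        return len(self.mapping)


pool = StoragePool(total_extents=10)
vol_a = ThinVolume(pool, logical_extents=8)
vol_b = ThinVolume(pool, logical_extents=8)  # 16 logical extents promised, 10 physical
vol_a.write(0)
vol_a.write(0)   # rewrite of the same extent: no new allocation
vol_b.write(7)
print(vol_a.allocated(), vol_b.allocated(), len(pool.free))  # 1 1 8
```

The two volumes together promise more capacity than the pool holds, which is the point of the technique. It also means the pool has to be monitored so it can be expanded before it runs out.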
At first, storage virtualization was a tool used solely to provision and manage storage efficiently. However, by separating the host from physical storage, the technology also allowed storage capacity in different physical chassis (which could be from different manufacturers) to be merged logically into shared pools that are more easily managed. While some of these heterogeneous systems were used to create larger volumes than were available on any single disk array, the majority of use cases employed storage virtualization as a common management platform. This permitted existing storage systems to be repurposed and decreased the overhead linked to managing several silos of storage, although the physical disk systems still required maintenance.
Storage virtualization can enhance performance because host volumes are easily distributed across greater numbers of disk drives, although doing so could negatively impact capacity utilization. Virtualization also enables storage tiering and data migrations between devices, including moving older data to an archiving appliance or hot database indexes to a solid-state drive (SSD) cache. These actions are usually performed based on policies established at the host, application or file level, and the same data movement mechanism can be used to move data offsite for disaster recovery (DR) reasons.
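A policy-driven tiering decision of the kind described above might look like the following sketch. The tier names, the thresholds and the catalog entries are hypothetical; real products expose their own policy settings.

```python
def choose_tier(days_since_access, reads_per_day,
                hot_reads=1000, archive_after_days=365):
    """Pick a tier from simple activity metrics (thresholds are illustrative)."""
    if reads_per_day >= hot_reads:
        return "ssd"
    if days_since_access >= archive_after_days:
        return "archive"
    return "disk"


def plan_migrations(items):
    """Return (name, from_tier, to_tier) for every item whose tier should change."""
    moves = []
    for item in items:
        target = choose_tier(item["days_since_access"], item["reads_per_day"])
        if target != item["tier"]:
            moves.append((item["name"], item["tier"], target))
    return moves


# Hypothetical catalog of data sets, all currently on the disk tier.
catalog = [
    {"name": "db-index", "tier": "disk", "days_since_access": 0, "reads_per_day": 5000},
    {"name": "2009-logs", "tier": "disk", "days_since_access": 700, "reads_per_day": 0},
    {"name": "home-dirs", "tier": "disk", "days_since_access": 2, "reads_per_day": 40},
]
print(plan_migrations(catalog))
# [('db-index', 'disk', 'ssd'), ('2009-logs', 'disk', 'archive')]
```

Because the virtualization layer owns the logical-to-physical mapping, each planned move can be carried out behind the scenes without changing what the host sees.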
With traditional scale-up architecture, where the controllers are separate from the disk shelves, virtualization at the level of the storage device is usually built into the controller operating system. Chiefly, this standard feature allows for a workable solution for provisioning the tens or hundreds of terabytes that current storage arrays are able to hold. Most systems include the ability to create tiers of storage within one virtualized system or among discrete systems, using different storage types (performance drives, capacity drives or SSDs) and different RAID levels. Some also include a policy engine and the ability to move file or sub-file data blocks among the tiers based on activity, application and so on. Many systems enable data to be copied to a second chassis for high availability or moved to a second system at a remote site for DR. While most storage systems include virtualization, many don’t support storage from other vendors. For a heterogeneous virtualization solution that can consolidate different vendors’ storage systems, most choices are network-based.
Some years ago, conventional storage wisdom was that storage services, like virtualization, and in some ways storage control, would sooner or later reside in “smart switches” on the storage-area network (SAN). Although at least one storage virtualization solution is going down that path, the network implementation of storage virtualization technology has typically been with appliances. These appliances are basically storage controllers that attach to disk arrays or storage systems from certified vendors, or they’re software that’s installed on user-supplied servers or virtual machines (VMs). Storage virtualization appliances attach to heterogeneous storage arrays directly, or via Fibre Channel (FC) or iSCSI SANs, but most also allow for the option of using their own disk capacity. Most solutions include some storage services, such as file sharing, snapshots, data deduplication, thin provisioning, replication and continuous data protection (CDP).
In-band and out-of-band virtualization
Early in the lifecycle of storage virtualization technology, two primary architectures emerged: in-band and out-of-band virtualization. In-band implementations placed a controller between users and physical storage or the SAN and passed all storage requests and data through that controller. Out-of-band products placed a metadata controller on the network that remapped storage requests to physical locations, but didn’t handle the actual data. That added complexity to the process but cut down on the CPU load compared with in-band virtualization. Out-of-band storage virtualization also avoided the disruption associated with decommissioning an in-band device, when users are disconnected from their data while storage is remapped. Currently, many network-based virtualization solutions use the in-band architecture, most likely because CPU power is relatively plentiful compared with when storage virtualization first emerged. Another reason for the popularity of in-band solutions is that they’re easier to implement, which means faster time to market and fewer problems.
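The difference between the two architectures can be sketched in a few lines. This is a simplified model, not a description of any product: the `Array`, `InBandController` and `MetadataController` classes and the counters are assumptions made for illustration.

```python
class Array:
    """Stand-in for a physical storage array holding blocks of bytes."""

    def __init__(self):
        self.blocks = {}

    def read(self, block):
        return self.blocks.get(block, b"")

    def write(self, block, data):
        self.blocks[block] = data


class InBandController:
    """Sits in the data path: every request and its data pass through it."""

    def __init__(self, mapping, arrays):
        self.mapping = mapping      # logical block -> (array name, physical block)
        self.arrays = arrays
        self.bytes_forwarded = 0    # data the controller itself had to move

    def read(self, logical_block):
        array_name, physical_block = self.mapping[logical_block]
        data = self.arrays[array_name].read(physical_block)
        self.bytes_forwarded += len(data)
        return data


class MetadataController:
    """Out-of-band: answers 'where is it?' but never touches the data."""

    def __init__(self, mapping):
        self.mapping = mapping
        self.lookups = 0

    def locate(self, logical_block):
        self.lookups += 1
        return self.mapping[logical_block]


def out_of_band_read(metadata, arrays, logical_block):
    """Host-side read: look up the location, then go to the array directly."""
    array_name, physical_block = metadata.locate(logical_block)
    return arrays[array_name].read(physical_block)


arrays = {"array-A": Array()}
arrays["array-A"].write(100, b"payload")
mapping = {0: ("array-A", 100)}

in_band = InBandController(mapping, arrays)
metadata = MetadataController(mapping)
print(in_band.read(0), in_band.bytes_forwarded)             # b'payload' 7
print(out_of_band_read(metadata, arrays, 0), metadata.lookups)  # b'payload' 1
```

In the in-band case the controller carries every byte, so its throughput bounds the system. In the out-of-band case the controller only answers lookups, but the host needs extra logic to act on them, which is the added complexity noted above.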
Storage virtualization solutions
Virtualization has become a necessary function for storage provisioning and is incorporated in some fashion with many midsized and larger storage systems. While there are many variances among arrays and their virtualization technologies, most of these device-based implementations don’t support disk capacity from other vendors. Instead of listing the large number of these storage systems, we’ll home in on the smaller category of heterogeneous storage systems. Below are instances of heterogeneous storage virtualization as deployed in hardware and software products available from a variety of manufacturers.
DataCore Software Corp.’s SANsymphony is a network-based, in-band software solution that operates on commodity x86 servers. It supports heterogeneous storage devices using FC, Fibre Channel over Ethernet (FCoE) or iSCSI and attaches to hosts as FC or iSCSI storage. Multiple-node clusters can be created to scale capacity and provide high availability. The system offers remote replication and storage services such as synchronous mirroring, CDP, thin provisioning and tiered storage.
EMC Corp.’s Invista is an out-of-band software product that runs on two servers (called a Control Path Cluster, or CPC) and works with “intelligent switches” from Brocade or Cisco. It is able to virtualize storage from many major manufacturers, connecting to storage and host servers through Fibre Channel. Invista offers mirroring, replication and point-in-time clones between storage arrays.
FalconStor Software Inc.’s Network Storage Server (NSS), a network-based, in-band appliance, connects to heterogeneous storage systems through iSCSI, FC or InfiniBand and supports host connectivity via Fibre Channel or iSCSI. Expansion and high availability are offered by connecting multiple controller modules. Aside from WAN-optimized replication, NSS also offers synchronous mirroring, thin provisioning, snapshots and clones.
Hitachi Data Systems’ Universal Storage Platform V (USP V) is a tier 1 storage array system that also provides in-band heterogeneous connectivity to a majority of major storage vendors’ arrays. It incorporates the types of features and services typically characteristic of a tier 1 solution, which include thin provisioning of internal and externally attached storage.
IBM’s SAN Volume Controller (SVC) is a network-based, in-band virtualization controller that sits on the SAN and connects to heterogeneous storage systems through iSCSI or FC. Pairs of SVC units allow for high availability, and up to eight nodes can be clustered to scale bandwidth and capacity. Every SVC module includes replication between storage systems and a mirroring capability between local or remote SVC units.
NetApp Inc.’s V-Series Open Storage Controller is an in-band virtualization solution that’s not unlike a NetApp filer controller, but configured to support heterogeneous storage arrays. It attaches to an FC SAN on the back end, takes in as much storage as desired from existing LUNs and consolidates it into NetApp LUNs for block or file provisioning, as a conventional NetApp filer would.
NetApp recently acquired the Engenio Storage Virtualization Manager (SVM), a network-based, in-band virtualization controller that supports heterogeneous storage systems. Information about how NetApp will promote this solution has yet to be made public.
Handle with care
Since a majority of storage virtualization products are in-band, it’s important to take the time to understand the effective performance of the virtualization appliance or cluster as this will be the gating factor to capacity expansion. In addition, storage services or features will also consume CPU cycles, further reducing effective capacity.
Storage virtualization is a very effective tool to decrease Capex by enhancing capacity utilization or performance, but its most useful advantage may be on the Opex side. It can make storage management easier, even across platforms, and cut down administrative overhead. Virtualization can also make storage expansion a relatively straightforward process, usually performed without taking storage systems down or disrupting users.
Eric Slack is a senior analyst at Storage Switzerland.
This article was previously published in Storage magazine.
This was first published in December 2011