Unlike server virtualization, whose basic concept is defined as running multiple OS instances on a single physical server, storage virtualization has many implementations. Some storage virtualization products virtualize across different storage manufacturers' products. Some vendors virtualize within their own storage hardware. Some vendors are adding virtualized storage features, such as thin provisioning, to their non-virtualized offerings.
No wonder even storage pros are confused.
So what is storage virtualization? A virtualized storage system is one that will write its data to all (or a high number) of the disks of a certain class. If you have a virtualized array that has several shelves of 147 GB, 10,000 rpm drives and you create a LUN with RAID 5 protection, the virtualized system will store that data, including parity, across all the drives. What can be disconcerting to storage veterans is that when their storage is virtualized, they don't know exactly which disks their data is on. They give up the simplicity of being able to point to five drives and say, "There's my Oracle database."
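The wide-striping idea described above can be sketched in a few lines. This is a minimal illustration, not any vendor's actual layout: the function names `place_stripe` and `xor_parity` are hypothetical, and the sketch assumes a simple scheme in which every stripe spans all drives and the parity chunk rotates from drive to drive.

```python
def place_stripe(stripe_index, num_drives):
    """Return (parity_drive, data_drives) for one RAID 5 stripe.

    Assumption: each stripe touches every drive in the class, and the
    parity chunk rotates so no single drive carries all parity writes.
    """
    parity_drive = stripe_index % num_drives
    data_drives = [d for d in range(num_drives) if d != parity_drive]
    return parity_drive, data_drives


def xor_parity(chunks):
    """Compute the RAID 5 parity chunk as the byte-wise XOR of equal-sized chunks."""
    parity = bytes(len(chunks[0]))  # starts as all zero bytes
    for chunk in chunks:
        parity = bytes(a ^ b for a, b in zip(parity, chunk))
    return parity


# Three data chunks on a four-drive stripe, plus their parity.
data = [b"\x01", b"\x02", b"\x04"]
parity = xor_parity(data)

# If the drive holding data[0] fails, XOR of the survivors rebuilds it.
rebuilt = xor_parity([parity, data[1], data[2]])
```

Because the placement is computed by the array rather than chosen by the administrator, nobody can point at five specific drives, which is exactly the trade-off the veterans notice.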
Virtualizing storage this way -- within the storage array itself -- offers several advantages. Most importantly, provisioning is streamlined: The time required to create, assign and start writing data from an application to a new volume is greatly reduced. You simply tell the virtualized storage system the size of the volume and, in most cases, what type of application will be accessing the volume. The system then decides what the appropriate drive class and write stripe will be.
While these settings can be overridden, in most cases virtualized storage systems are highly efficient. Compare this to traditional storage, where the storage administrator must decide which drives in the array are the best candidates based on the current I/O load and remaining capacity. A virtualized storage system brings rapid deployment of volumes without pre-planning and load-balancing concerns.
With the virtualized storage foundation laid down and provisioning times addressed, other capabilities become available, such as thin provisioning. Thin provisioning allows a storage administrator to allocate large logical capacity to application servers attached to the virtualized storage but consume physical capacity only as data is actually written. This means you could allocate 500 GB to your Microsoft Exchange environment but purchase and use only 80 GB of actual disk space.
Projected across an entire storage environment, 15 TB of allocated capacity might require the purchase, and more importantly the powering, of only 5 TB. A standard array would require the purchase of the entire 15 TB, so this is a net savings of 10 TB. This ability to overprovision storage saves not only upfront costs but also the costs associated with powering and cooling that storage capacity.
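To make the thin provisioning arithmetic concrete, here is a minimal sketch of allocate-on-write. The `ThinVolume` class and its 1 GB extent size are assumptions made for illustration; real arrays use their own extent sizes and bookkeeping.

```python
class ThinVolume:
    """Hypothetical thin volume: physical extents are claimed only when written."""

    def __init__(self, logical_gb, extent_gb=1):
        self.logical_gb = logical_gb  # capacity the application server sees
        self.extent_gb = extent_gb    # assumed allocation unit
        self.allocated = set()        # indexes of extents backed by real disk

    def write(self, offset_gb, length_gb):
        """Back every extent touched by the write with physical capacity."""
        if offset_gb < 0 or length_gb <= 0 or offset_gb + length_gb > self.logical_gb:
            raise ValueError("write falls outside the logical volume")
        first = offset_gb // self.extent_gb
        last = (offset_gb + length_gb - 1) // self.extent_gb
        self.allocated.update(range(first, last + 1))

    @property
    def physical_gb(self):
        return len(self.allocated) * self.extent_gb


# The Exchange example: 500 GB allocated, 80 GB actually written.
exchange = ThinVolume(logical_gb=500)
exchange.write(0, 80)

# The environment-wide example: 15 TB allocated, 5 TB physically needed.
allocated_tb, physical_tb = 15, 5
savings_tb = allocated_tb - physical_tb
```

Rewriting an extent that is already backed consumes no additional capacity, so physical usage tracks the data written rather than the size promised to the application.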
How storage virtualization solutions differ
Storage virtualization solutions differ primarily in where the storage virtualization software resides. It can live on a separate appliance, be embedded in the storage fabric switch or be part of the storage controllers provided by the storage manufacturer.
The earliest storage virtualization products essentially offered differing software intelligence to the disk arrays installed at a site. The idea was that if you had hardware from two vendors, there'd be value in using virtualization software to manage the arrays and perform common software functions such as snapshots, replication and provisioning. Most of these storage virtualization solutions were appliances based on x86 processors. Some were pure in-band systems, in which all the storage traffic was routed through the appliances. Others were out-of-band solutions, which required a server agent and passed only control information through the units. While the debate between in-band and out-of-band cannot be settled in this article, the argument boils down to an agentless in-band solution with possible performance bottlenecks vs. an out-of-band solution that needs agents but is less constrained on performance.
The adoption rate of these early storage virtualization systems was relatively low, because of concerns over mixing and matching products from storage manufacturers. Although the technology worked, the comfort in doing so was not very high. Despite the low adoption rate, some early entrants in the storage virtualization market, including DataCore Software Corp., FalconStor Software and MonoSphere, have survived. In some cases their solutions have evolved into SAN management, replication or migration utilities.
Early on, storage virtualization was pitched as a way to save money. But for many potential customers, the perceived risk was not worth the cost savings. These early solutions also required double provisioning. You had to provision the storage array and then provision at the virtualization appliance. The certification matrix is further complicated by a device that must support an ever-growing number of storage arrays and controllers.
Certainly there is more customer comfort when the storage virtualization vendor is a storage giant, as with IBM and its SAN Volume Controller (SVC) or Hitachi Data Systems and its TagmaStore array. The suppliers are also more conservative in adding support for other manufacturers' hardware. These systems tend to run on much more robust hardware platforms than did the early entrants in the market, and they typically leverage a code base that has already been running in their traditional storage offerings.
These systems are now being implemented in the storage fabric, basically running in the SAN switch itself. Cisco and Brocade can, through the use of intelligent switches, run storage applications on the switch itself. Other companies investing in this approach include EMC with its InVista product and Incipient Inc. with its storage virtualization solution, Incipient Network Storage Platform (iNSP). If you are going to virtualize across multiple vendors' products, the fabric seems to be a logical place to perform the task. There is still some debate about whether there should be external control servers or whether everything should be embedded in the switch. There is also still some question about protection of the metadata (the database that stores which physical LUNs map to which virtual LUNs), because if this metadata is lost, so are all the LUN mappings.
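The metadata concern is easier to see with a toy model of the mapping database. The sketch below is purely illustrative: the `LunMap` class, its method names, and the array and LUN identifiers are all hypothetical, and it assumes the mapping can be serialized as JSON for safekeeping.

```python
import json


class LunMap:
    """Hypothetical mapping database: virtual LUN -> backing physical LUNs."""

    def __init__(self):
        self._map = {}  # virtual LUN id -> list of (array name, physical LUN id)

    def bind(self, virtual_lun, array, physical_lun):
        self._map.setdefault(virtual_lun, []).append((array, physical_lun))

    def resolve(self, virtual_lun):
        # Raises KeyError when the mapping is gone: the data still sits on
        # the arrays, but nothing records which pieces make up the volume.
        return self._map[virtual_lun]

    def dump(self):
        """Serialize the mappings so a copy can be kept off the device."""
        return json.dumps(self._map, sort_keys=True)

    @classmethod
    def load(cls, text):
        """Rebuild a map from a serialized copy."""
        restored = cls()
        for virtual_lun, extents in json.loads(text).items():
            for array, physical_lun in extents:
                restored.bind(virtual_lun, array, physical_lun)
        return restored


# One virtual LUN spread across two vendors' arrays (made-up names).
lun_map = LunMap()
lun_map.bind("vlun0", "array-a", 12)
lun_map.bind("vlun0", "array-b", 3)

backup = lun_map.dump()            # protected copy of the metadata
restored = LunMap.load(backup)     # recovery after losing the live map
```

Whether the protected copy lives on an external control server or inside the switch is the same design question the debate above turns on.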
Switch-embedded virtualization technology is particularly appealing to data centers with several SAN storage arrays. However, there must be standardization on one SAN infrastructure, which could be challenging for some data center managers. Also, not all switch-level virtualization solutions are available across all storage switch infrastructures, so the data center manager must decide between accepting a storage virtualization solution that is compatible with the current infrastructure or replacing the infrastructure with one that is compatible with the virtualization solution.
These solutions fill a need in the data center for data migration and data replication, as evidenced by Incipient's introduction in June 2008 of its Automated Data Migration software. The software is designed to allow enterprise customers with legacy SANs to migrate their data to new virtualized data centers served by Incipient's complementary iNSP storage virtualization solution. However, broad acceptance in the data center continues to be a challenge.
The most popular implementation of storage virtualization to date is that provided by storage subsystem vendors such as HP (with its EVA family of arrays), Compellent Technologies and 3PAR. These manufacturers brought virtualization to market as one capability of their hardware offering. While there may be a cost savings compared to the traditional storage systems they compete with, the focus was on implementing a rock-solid virtualization software foundation and leveraging its benefits, like rapid provisioning and thin provisioning. Compellent and 3PAR leveraged that technology to extend their capabilities. 3PAR added a multi-node clustering feature, and both companies offer advanced data placement based on age and/or type.
These strategies have seemed to pay off as both companies went public last year. The concept is easier for customers to understand: Buy a single solution from a single supplier and consolidate to it. While the concept of consolidating to a single storage interface from a single manufacturer is not new, storage virtualization makes it more achievable because of its ability to simplify the interaction with a larger single system.
What makes the most sense for your data center? It depends. While abstracting the storage software from the storage hardware may sound like nirvana, this approach involves many different moving pieces and could lead to confusion. If, given the right storage platform, you can see the possibility of consolidating to a single platform or even to a few storage platforms, then the storage hardware-based virtualization solutions may be the way to go. If a mixed storage environment of more than three or four vendors is forever in your future, then virtualization at the switch level may be a solution to consider for the data center. To net it out, virtualization at the storage hardware level seems to make the most sense in all but the very largest data centers. In those large data centers, virtualization on the switch should be considered but still compared to storage hardware virtualization.
About the author: George Crump has had 20 years of experience designing storage solutions for IT decision makers across the U.S. He has held executive positions at Palindrome, Legato Systems Inc. and SANZ. He now heads up his own independent consultancy known as Storage Switzerland, which provides unbiased advice and strategy services to help storage professionals solve their storage management challenges.
This was first published in June 2008