Editor's Note: The following is a vendor-neutral white paper about virtualization from StorageTek. This white paper has been in high demand among our readers recently within our own searchStorage Sound Off forums. We hope you find it useful.
Virtualization: One of the major trends in the storage industry - What are you getting for your money?
By Robert F. Nieboer
The word "virtualization" has crept into the IT lexicon. What does it mean, and how much of what is claimed is hype rather than real value delivered to the business?
As storage growth continues to exceed 100% per year, and as heterogeneity proliferates, the complexity of managing IT infrastructures increases exponentially. The promise of virtualization is that it will significantly improve storage manageability. But unless it also delivers on cost containment for IT, virtualization solves only part of the problem.
This paper will define virtualization, contrast the kinds of implementations that are being announced on an almost daily basis, and provide a basis to compare and evaluate various offerings.
Storage growth, people, the economy, business and IT budgets
Industry analysts appear to be consistently predicting a 100% compound annual growth rate for storage. To put this into perspective, an organization with 1 terabyte of disk storage today will have 32 terabytes 5 years from now. Many companies in the Global Fortune 1000 we speak to have much more than 1 terabyte of storage today, and are frankly very concerned about the prospect of having to deal with a storage infrastructure 32 times bigger than it is today. There is a clear trend towards a more pragmatic approach to storage acquisitions, a pragmatism born out of the fact that "storage's share of the IT budget will quadruple over the next four years and this will cause profound changes in infrastructure, storage management strategies and operations staff" (quote from Eurostorage web site, 2001). The anecdotal evidence we are receiving on a daily basis suggests there is clearly less of an inclination to "throw more disk at the storage problem."
The first reaction of most CIOs to the prospect of a storage infrastructure 32 times bigger than it is today is: How will I be able to manage it with the people I have today? Given the lack of significant improvements in storage management productivity, the shortfall of IT professionals worldwide, static or shrinking budgets, massive growth, the strategic nature of information in both competitive differentiation and in implementing e-business applications, and the need to simultaneously improve availability and scalability of storage infrastructure, it must seem impossible to CIOs to find a solution to all this chaos.
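The 32-terabyte figure follows directly from compounding 100% growth over five years. A minimal sketch of that arithmetic is shown here; the function name and parameters are illustrative and not taken from any analyst model.

```python
def projected_capacity_tb(current_tb: float, annual_growth: float, years: int) -> float:
    """Capacity after compounding `annual_growth` (1.0 means 100%) for `years` years."""
    return current_tb * (1.0 + annual_growth) ** years


# Starting from 1 TB with 100% compound annual growth, as in the text.
for year in range(6):
    print(year, projected_capacity_tb(1.0, 1.0, year))
```

At year 5 the projection reaches 32 TB, which is the multiplier the paper uses for the rest of its argument.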
There is almost a nightmare element to the many variables that are coming together at one point in time. We could almost call this the Perfect (Storage) Storm. Ten years ago in 1991, in one of the rarest meteorological events of the century, three separate weather systems were on a "perfectly" aligned collision course. A Great Lakes storm system moving east, a Canadian cold front moving south, and Hurricane Grace moving northeast were all headed for the North Atlantic. Along the way, the storm would create monster seas, batter ships, and cause coastal flooding along the eastern U.S. seaboard. To IT professionals in 2001, this is the perfect storage storm.
It is not as though storage growth can be slowed down to match budgets, or the economy, or even to match the ability of human beings to deal with it more comfortably. Storage is not a faucet that can be turned off. Storage growth is driven by information flow, which in turn is driven by applications created to run the business and to maintain or improve competitive positioning. E-business applications - from supply-chain management to customer relationship management and everything in between - are vital new non-discretionary elements of the post-internet business world. Storage is a non-discretionary budget item that is now consuming more than 50% of server deployment costs.
So, what are the solutions to these problems? How do we survive the "perfect storage storm"?
New architectures and technologies
The two most discussed storage innovations of the last couple of years are storage networking and virtualization. While it is the goal of both to address challenges of manageability, affordability, availability and scalability, it is virtualization that we will discuss here in more detail.
Many vendors, large and small, are talking about storage virtualization. It is clearly positioned by the majority as a means to simplify management of large, complex, heterogeneous storage environments, with the clear implication that virtualization will exist within a storage networking (typically SAN) environment. Right now, most of these announcements appear to generate more questions than they answer. What do they mean? What is being virtualized? Where is it being implemented? Is this virtualization or abstraction? Is this simple pooling of devices with a fancy name? How much of what is being announced is available today? Which server operating systems are supported today? It is reasonable to challenge many of the claims being made, but meanwhile, there is a need to clarify the "virtual" landscape.
A perfect illustration of the confusion surrounding the meaning of virtualization is this story. At a recent technology conference, a lengthy panel discussion had representatives from many vendors stand up one after the other and describe their virtualization strategy and product (one vendor actually has two conflicting products). At the end of the last presentation, the discussion chairperson, himself the chief technologist at one of the vendors, asked the large audience of primarily end-users, "Is there anyone who now has a better understanding of virtualization and what it is?" Not one hand went up. After the laughter died down, it was clear that there is tremendous diversity in defining and implementing virtualization.
The Robert Frances Group defines virtual as "those architectures and products designed to emulate a physical device where the characteristics of the emulated device are mapped over another physical device." Another way to express this is to say that virtualization separates the presentation of storage to the server operating system from the actual physical devices. Neither of these statements implies an underlying architecture, yet, as stated earlier, most claims to storage virtualization today are made in the context of storage networks. This is no accident, since storage networking and storage virtualization are trying to address the same fundamental problem: storage manageability. In fact, it would be fair to say that, even as storage networking is only just beginning to be widely implemented, it is already recognized that we need to do something more to ease the storage management burden.
We believe that the scale of the problem is so large that the goal of all the efforts surrounding storage and storage manageability should be to plan for the elimination of human intervention in storage management. Virtualization is a step in that direction and we believe that automated policy-based management algorithms and decision-making intelligence will join virtualization in the near future.
Virtualization implemented within the context of a SAN contributes several things to the goal of easing storage management workloads. It hides complexity by simplifying the server's view of what devices exist. It masks change by enabling physical storage devices to be removed, upgraded or changed without the need to tell the operating system via device drivers or otherwise that the storage world is different now. It can magnify an administrator's productivity by pooling large amounts of storage and allowing that storage to be allocated across many servers via a GUI or similar interface. It can aggregate small amounts of storage across multiple devices and make it appear as a single large disk. And it can reduce cost in at least a couple of ways: by allowing aggregations of commodity storage components to be presented as something else entirely and by eliminating the under-utilization of capacity.
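To make the aggregation and change-masking ideas concrete, the following is a minimal sketch of a virtual volume that presents several physical devices as one contiguous range of logical blocks. The class names, device names and block counts are hypothetical and do not describe any vendor's product; real implementations also handle striping, redundancy and failure, which are omitted here.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class PhysicalDevice:
    name: str    # hypothetical device label
    blocks: int  # capacity in blocks


class VirtualVolume:
    """Presents several physical devices as one contiguous logical disk."""

    def __init__(self, devices: List[PhysicalDevice]) -> None:
        self.devices = list(devices)

    @property
    def blocks(self) -> int:
        # The server sees only the total, not the individual devices.
        return sum(d.blocks for d in self.devices)

    def resolve(self, logical_block: int) -> Tuple[str, int]:
        """Map a logical block number to (device name, block offset on that device)."""
        if not 0 <= logical_block < self.blocks:
            raise IndexError("logical block out of range")
        offset = logical_block
        for device in self.devices:
            if offset < device.blocks:
                return device.name, offset
            offset -= device.blocks
        raise AssertionError("unreachable: range was checked above")


volume = VirtualVolume([PhysicalDevice("array_a", 100), PhysicalDevice("array_b", 50)])
print(volume.blocks)        # 150
print(volume.resolve(120))  # ('array_b', 20)
```

Because the server addresses only logical blocks, the list of devices behind the volume could in principle be changed without the operating system being told, which is the change-masking property described above.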
It could be argued that some of these things aren't even virtualization, but abstraction or emulation, or aggregation. However, the point is not to argue semantics but to stimulate a critical view of virtualization offerings so that intelligent choices can be made.
This paper proposes some fundamental positions:
- The purpose of storage virtualization is to enable better management and consolidation of storage resources.
- Virtualization may be implemented at multiple points on the continuum between the application and the data.
- For simplicity's sake, those points are at the host, the network and the storage device.
- Each implementation point can deliver advantages that are unique to that point, and some things are done better at certain points.
The what and the where (and the span-of-control) - pros and cons
In attempting to understand and differentiate multiple implementations of storage virtualization, we can begin by defining the things that are being virtualized - the "what" - and the place this virtualization is being implemented - the "where" or the instantiation. Since the primary purpose of storage virtualization is to enable better management of storage resources, the "what" is typically tape and/or disk.
The vast majority of recent storage virtualization architectures announced by many vendors are designed to be implemented within the context of a storage network; therefore, the "where" is either the server, the network or the storage device.
There is another element of virtualization in addition to the "what" and the "where". This is called "span-of-control". For example, if virtualization software is implemented in the server, then logical or virtual storage presentation is implemented there, but it is mapped to storage that exists beyond the server. Therefore, span-of-control extends beyond the platform where the virtualization is implemented.
There is a degree of predictability in virtualization implementation depending upon the core competency of the vendor. For example, it is likely that a server vendor will implement storage virtualization at the server level. It is equally likely that a software vendor will implement virtualization on a server platform. Typically in these implementations, virtualization - that is, storage presentation services - is done in the server, and is mapped to external storage. There is no control over external storage devices other than allocation.
There is an opportunity within a server-centric virtualization approach to transparently exploit the multiple performance and cost characteristics of a multi-level storage hierarchy. In fact, the industry has flirted with this concept for years but it has often been rejected as too difficult and too people-intensive to implement. What if storage hierarchy virtualization was combined with policy services to mask the existence of a storage hierarchy from storage-intensive applications? This capability could also be implemented under a network-centric virtualization scheme.
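As a rough illustration of what combining a storage hierarchy with policy services could look like, the sketch below picks a tier from the age of the data. The tier names, the age-based rule and the thresholds are assumptions made purely for the example; the paper does not prescribe any particular policy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    # Hypothetical thresholds, in days since the data was last accessed.
    fast_disk_days: int      # at or below this age, keep on fast disk
    capacity_disk_days: int  # at or below this age, keep on cheaper disk


def choose_tier(days_since_access: int, policy: Policy) -> str:
    """Return the tier a piece of data should occupy under the given policy."""
    if days_since_access <= policy.fast_disk_days:
        return "fast_disk"
    if days_since_access <= policy.capacity_disk_days:
        return "capacity_disk"
    return "tape"


policy = Policy(fast_disk_days=7, capacity_disk_days=90)
for age in (1, 30, 365):
    print(age, choose_tier(age, policy))
```

The application never sees the tier name; it would continue to address the same logical device while the policy decides where the data physically lives.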
Some questions to ask of vendors implementing virtualization in the server:
- Is software required on every server participating in the storage network?
- Does server I/O bandwidth impact virtualization effectiveness and I/O performance?
- Is there a maximum amount of storage supported in this storage network? If so, what is it?
- What kinds of storage devices are supported?
- Can they be any vendor's storage devices?
- What backup applications are supported?
- Is there any kind of policy-based management capability available or planned?
- Will this solution support serverless backup and/or migration?
- Is this compatible or interoperable with other virtualization methods?
Network vendors are not necessarily only going to implement virtualization in a network device, but it is likely. The definition of a network device for the purposes of this paper is a kind of hybrid storage domain manager or an intelligent router or an intelligent switch, and a platform that is capable of executing the storage virtualization. Presentation services are done at the network, and the logical devices are mapped to external storage devices. There is no control over external storage devices other than allocation.
In a number of ways, the network is the most logical place to implement storage virtualization. It is neither a server, nor a storage device, so in existing between these two environments, it may be the most "open" implementation of virtualization. It is the implementation of storage virtualization most likely to support any server, any operating system, any application, any storage device type and any storage vendor.
Maybe the most compelling reason to locate storage virtualization in the network is that then it would exist within the natural data path for all I/O activity.
Also, in "seeing" all the storage devices and device types, it is a practical foundation for policy-based management intelligence.
Questions to ask of vendors implementing virtualization in the network:
- What servers, operating systems and applications are supported at the server level?
- What kinds of storage devices are supported?
- Can they be any vendor's storage devices?
- What I/O bandwidth limitations are there?
- Is there a maximum amount of storage supported in this storage network? If so, what is it?
- Will this implementation support serverless backup and/or migration?
The third alternative for the "where" of storage virtualization is in the storage itself. This is an interesting implementation. If virtualization is done here, and the vendor is a storage vendor, then there are some challenges to avoid limiting the storage devices to just those supplied by the vendor. The storage vendor implementing storage virtualization might form a strategic alliance with a server vendor, a software vendor, or a network vendor to avoid creating a proprietary lock-in. But what makes this an interesting implementation is not the "what" necessarily, or the "where" at all, but the "span-of-control."
When storage virtualization is implemented at the device level, there is an opportunity to have both the logical (virtual) environment and the physical devices within a common "span-of-control". Exploitation of this span-of-control - meaning management control of both the logical presentation services and the physical resources needed to satisfy the storage demand - could lead to capacity and operational efficiencies unavailable to virtualization implementations where the physical storage devices are external to the virtualization engine's span-of-control.
In fact, there are today two types of implementations of device-level virtualization where logical devices and physical devices exist within the span-of-control of the virtualization engine: virtual disk and virtual tape. For the purposes of this discussion, the benefits of a span-of-control that encompasses both the logical (virtual) devices and the physical devices are very large efficiencies in capacity utilization in the case of virtual disk, and very large efficiencies in tape media utilization in the case of virtual tape.
The continuing poor utilization of storage resources in enterprise-class disk environments today is matched by inefficient overhead caused by historical and new practices. Typically, only 80% of capacity is actually allocated to files and databases. That leaves 20% of capacity never allocated and reserved for growth factors. An additional 20% to 30% is wasted by being allocated for files that never grow to fill that capacity. That means that between 40% and 50% of available disk capacity may never be utilized.
Point-in-time copies, used by many hardware and software vendors to minimize recovery times in the event of data loss, can double the amount of capacity needed to satisfy application requirements. Application development challenges IT organizations to provide whole files and databases to test against, again consuming capacity. Time-to-market for new applications is also impacted by how often test files and databases can be reset after test failures.
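The utilization figures above can be checked with simple arithmetic. The sketch below treats both percentages as fractions of total capacity, which is how the paragraph adds them; the function names and the 10 TB example are illustrative assumptions.

```python
def unused_fraction(never_allocated: float, allocated_but_empty: float) -> float:
    """Fraction of total capacity that holds no data."""
    return never_allocated + allocated_but_empty


def raw_capacity_needed(data_tb: float, unused: float) -> float:
    """Raw capacity that must be installed to hold `data_tb` of actual data."""
    return data_tb / (1.0 - unused)


low = unused_fraction(0.20, 0.20)   # 20% unallocated plus 20% allocated but empty
high = unused_fraction(0.20, 0.30)  # 20% unallocated plus 30% allocated but empty
print(f"{low:.0%} to {high:.0%} of capacity unused")

# Hypothetical example: 10 TB of real data at the high end of the range.
print(raw_capacity_needed(10.0, high))
```

At the high end of the range, an organization would have to install roughly twice as much disk as the data it actually stores.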
This poor utilization of capacity and increasing amount of overhead are exactly the kind of infrastructure cost issues that can be addressed by virtualization implemented at the device level and leveraging a span-of-control that includes both logical devices and physical resources.
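One way an engine that controls both the logical and the physical side might recover that capacity is to consume physical blocks only when a logical block is first written. The paper does not describe the mechanism, so the sketch below is an assumption offered for illustration only, with hypothetical names and sizes.

```python
from typing import Dict, Tuple


class AllocateOnWritePool:
    """Physical blocks are consumed only when a logical block is first written."""

    def __init__(self, physical_blocks: int) -> None:
        self.physical_blocks = physical_blocks
        # (volume name, logical block) -> physical block index
        self.mapping: Dict[Tuple[str, int], int] = {}

    def write(self, volume: str, logical_block: int) -> int:
        """Return the physical block backing this logical block, allocating if new."""
        key = (volume, logical_block)
        if key not in self.mapping:
            if len(self.mapping) >= self.physical_blocks:
                raise RuntimeError("physical pool exhausted")
            self.mapping[key] = len(self.mapping)
        return self.mapping[key]

    def used(self) -> int:
        return len(self.mapping)


pool = AllocateOnWritePool(physical_blocks=1000)
# Two volumes could each be presented as far larger than what they use;
# only the blocks actually written take up physical space.
for block in range(100):
    pool.write("vol_a", block)
for block in range(50):
    pool.write("vol_b", block)
print(pool.used())  # 150
```

Under this assumption, space reserved for growth or allocated to files that never fill it costs nothing physically until data arrives, which is the kind of efficiency an external virtualization engine without control of the physical devices could not offer.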
In tape today, virtualization is being introduced primarily to improve cartridge capacity utilization now that cartridges can cost $100. The unanticipated bigger benefits of tape virtualization have been application performance, and the ability to achieve 100% tape automation by leveraging existing libraries and drives to automate those cartridges that were still within a manual environment. This latter factor alone justifies the use of virtual tape in UNIX and NT.
Questions to ask vendors of virtualization solutions implemented within the storage device:
- Will this solution support other vendor storage devices?
- Are there I/O bandwidth limitations?
- Are there processor bandwidth limitations?
- What server platforms and operating systems are supported?
- What storage capacity is supported?
- Can multiple subsystems communicate with each other or share resources?
- Is capacity utilization maximized? Is capacity overhead eliminated?
- How is availability ensured?
- Are there any storage management functions carried out within the engine? If so, what functions?
In this paper we have discussed three places where virtualization can be implemented - the server, the network or the device. We have also implied that most virtualization implementations are concerned with simplifying the management of large storage infrastructures shared across multiple heterogeneous servers and applications. Presenting a logical view of storage devices to servers that is disassociated from the physical reality can have the effect of hiding complexity, masking change and improving people productivity. What is missing from most virtualization implementations is the means to reduce the amount of storage infrastructure needed to satisfy application storage requirements. Yes, manageability is a serious issue that requires resolution. But so is infrastructure cost.
Users should carefully examine the claims of vendors about their storage virtualization products. What is available today? How open is it? Are policy-based management and intelligence present or planned? Is infrastructure cost reduction an objective?
Storage virtualization is necessary to overcome some of the interoperability limitations of storage networking, as well as in its own right for providing two huge and timely benefits to IT organizations:
1. Significantly improved storage manageability
2. Significantly reduced storage infrastructure cost
The first benefit is the objective of all of the virtualization architectures and implementations. However, the second is only obtained in two cases:
- if virtualization is implemented at the device level, and if the virtualization engine is designed to exploit the fact that both the logical devices and the physical devices exist within the same span-of-control and so optimize the use of physical capacity.
- and/or if virtualization is implemented in such a way as to transparently exploit the cost-benefits of a policy-driven multi-level storage hierarchy.
Storage virtualization is an important development. Right now, treat vendor claims with a degree of skepticism until functionality and roadmaps are understood.
About the author: Rob Nieboer is a corporate evangelist for StorageTek, and is currently responsible for global industry analyst relations for the company. Rob's background includes some 34 years as an IT practitioner, with the last 17 years focused on storage. His career with StorageTek has included responsibilities for systems engineering, systems engineering management, worldwide tape and library marketing, regional marketing, and strategy.
Rob is a frequent speaker at StorageTek and industry events around the world. He has a particular interest in management issues surrounding storage, and in storage virtualization.