Users don't need to know the IP address, such as 188.8.131.52, of their favorite website. They simply go to Google, type in the name and get the right page, thanks to the Domain Name System (DNS). File virtualization works a lot like a DNS server. Users don't need to concern themselves with the physical location of the files they need; they simply request them from their home directories. A logical presentation layer, or global namespace, shields them from the arcane details of the paths to the storage devices or file servers.
That global namespace eliminates the need for discrete drive mappings and mount points to connect the clients with the file servers and storage devices. Instead, requests go to the virtual representation of those file systems -- the global namespace -- which routes the calls to the proper server.
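To make the routing idea concrete, here is a minimal sketch of a namespace lookup. It is not how any particular product works: the `GlobalNamespace` class, the logical paths and the server names are all hypothetical, and the sketch assumes the namespace can be modeled as a table of logical prefixes resolved by longest match.

```python
from typing import Dict


class GlobalNamespace:
    """Toy global namespace: maps logical path prefixes to physical shares.

    All paths and server names used with this class are made up for
    illustration; real products keep far richer metadata.
    """

    def __init__(self) -> None:
        self._mounts: Dict[str, str] = {}

    def map(self, logical_prefix: str, physical_root: str) -> None:
        """Publish a physical share under a logical prefix."""
        self._mounts[logical_prefix.rstrip("/")] = physical_root.rstrip("/")

    def resolve(self, logical_path: str) -> str:
        """Translate a logical path to a physical one (longest prefix wins)."""
        candidates = [
            prefix for prefix in self._mounts
            if logical_path == prefix or logical_path.startswith(prefix + "/")
        ]
        if not candidates:
            raise KeyError(f"no mapping for {logical_path}")
        best = max(candidates, key=len)
        return self._mounts[best] + logical_path[len(best):]


ns = GlobalNamespace()
ns.map("/corp/home", "//filer01/home")
ns.map("/corp/home/alice", "//nas07/users/alice")
print(ns.resolve("/corp/home/bob/report.doc"))
print(ns.resolve("/corp/home/alice/notes.txt"))
```

In this sketch, moving a user's files to another server only requires changing one table entry; the logical path the user sees stays the same.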
If namespaces were the only truly beneficial features in file virtualization products, storage managers might simply opt for the built-in Distributed File System (DFS) technologies that ship with Microsoft Windows Server. DFS allows administrators to group shared folders on different servers into one or more logically structured namespaces and synchronize folders between the servers. Administrators of Unix- or Linux-based environments can use automounters to access files through the NFS protocol.
But products that focus on file virtualization offer more granular capabilities and user-friendly tools to help IT managers dealing with NAS sprawl.
Moving files in the background
File virtualization can also help a storage administrator who simply wants an easier way to replicate data, do load balancing or migrate files from one server or NAS box to another without having to take the system down. The admin "can bring in a new file server and start moving files from server A to server B in the background with nobody knowing about it and without having to update all the different user mappings," said George Crump, founder of analyst firm Storage Switzerland. "All that is done automatically by the file virtualization appliance."
Another scenario where file virtualization products are helpful is managing policies to utilize storage more efficiently and cost-effectively. For example, an administrator might want to shift files that haven't been used for 60 days to cheaper storage devices or different servers.
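A policy like the 60-day example could be expressed roughly as follows. This is a sketch under stated assumptions: the `FileRecord` type and `select_for_cheaper_tier` function are hypothetical, last-access times are assumed to be available and reliable, and actually moving the files is left out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, List


@dataclass
class FileRecord:
    """Hypothetical metadata record for one file."""
    path: str
    last_access: datetime


def select_for_cheaper_tier(
    files: Iterable[FileRecord],
    now: datetime,
    max_idle_days: int = 60,
) -> List[str]:
    """Return paths of files not accessed within the last max_idle_days."""
    cutoff = now - timedelta(days=max_idle_days)
    return [f.path for f in files if f.last_access < cutoff]


now = datetime(2008, 6, 1)
inventory = [
    FileRecord("/projects/q1-plan.doc", datetime(2008, 1, 15)),   # idle
    FileRecord("/projects/budget.xls", datetime(2008, 5, 20)),    # recent
]
print(select_for_cheaper_tier(inventory, now))
```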
"File virtualization is about creating that logical abstraction layer between the client and the file and where it's stored, but that's just step one," said Scott Shimomura, a product marketing manager at Brocade Communications. "The real reason that IT wants to virtualize the files is to simplify management of files and file resources."
Products that can help users reap the meatier benefits of file virtualization include the Windows-focused StorageX software add-on from Brocade and appliances such as Rainfinity from EMC, the Acopia ARX Series from F5 Networks, Maestro File Manager from Attune Systems and Brocade's File Management Engine. But before selecting a file virtualization product, storage managers need to weigh the built-in operating system services versus the software-only and appliance-based options, as well as whether the product employs an in-band or out-of-band architectural approach.
Microsoft noted that its DFS namespaces and replication technologies come at no additional cost with Windows Server 2008. Third-party appliances can range from $15,000 to $180,000 apiece. But Kevin Yam, technical product manager in Microsoft's Windows storage solutions division, acknowledged that DFS may require additional hardware to meet performance and scalability requirements in large deployments and does not offer the file-level granularity that some appliances do.
Acopia's appliances operate in-band, or in the data stream, sitting between the users/applications and the unified storage pool of file servers and NAS boxes. The in-band approach allows administrators to manage data at the file level as well as manage data in real time. For instance, if one file server has 10% capacity left and another has 90%, load balancing can take place in real time rather than waiting for the server to run out of space.
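The capacity example can be sketched as a simple planning step. The function name, the 15% low-watermark threshold and the server names are assumptions made for illustration; the article does not describe the algorithm Acopia's appliances actually use.

```python
from typing import Dict, List, Tuple


def plan_rebalance(
    free_fraction: Dict[str, float],
    low_watermark: float = 0.15,  # assumed threshold, not from any product
) -> List[Tuple[str, str]]:
    """Pair each server that is low on free space with the emptiest server.

    free_fraction maps a server name to its free capacity (0.0 to 1.0).
    Returns (source, target) pairs; an empty list means nothing to do.
    """
    if not free_fraction:
        return []
    target = max(free_fraction, key=lambda name: free_fraction[name])
    return [
        (name, target)
        for name, free in sorted(free_fraction.items())
        if free < low_watermark and name != target
    ]


# One server with 10% free, another with 90% free.
print(plan_rebalance({"serverA": 0.10, "serverB": 0.90}))
```

An in-band device sees every request, so in principle it could run a check like this continuously instead of waiting for a server to fill up.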
"We decided upfront that we were going to be in-band because we felt that's where a lot of the business value was," said Nigel Burmeister, Acopia's director of product marketing. "The challenge is it's much more difficult to build. The actual implementation of this approach is a purpose-built switch. It's not a PC off the shelf. That would have a tough time scaling to multiple custom-built NAS heads."
One criticism of the in-band appliance approach is its potential to create a single point of failure or bottleneck directly between the clients who need to access the files, and the servers and devices that store the files on the back end. Crump, however, dismissed that criticism as "a bit weak," saying, "Most of these systems are sold in highly redundant scenarios where there's at least two boxes at a minimum and, many times, those boxes are also replicated."
"Can you put so much data into the box that the box actually ends up being the hindrance to performance?" Crump asked. "What we've seen in actuality in our testing is that it's virtually impossible to plug these boxes. If you right-size the box for your environment, it's very unlikely that a group of Windows users could saturate one of these boxes because the receiving client is relatively slow in and of itself."
But Eric Kaplan, director of marketing for EMC's Rainfinity appliance, said that many of the company's customers have more than 50,000 users and hundreds of terabytes of data. He maintained that Rainfinity's hybrid approach -- in-band when necessary and out-of-band the rest of the time -- is more scalable.
"The benefit of our approach is that at all times clients are still mounted to the original file servers, so there's no disruptive mount and remount to put Rainfinity in place," Kaplan said. "When we do have to move data around, we're not in front of all of your file servers. We are basically getting in front of, or in-band, for only the segment of data that we're moving, and all of your native file server functionality still works as advertised by the vendor."
In out-of-band mode, client machines access a reference server that tells them where the data they want is stored, cache that information and then connect directly to the file server or NAS device to access the data. But the out-of-band approach has two main disadvantages: Administrators can't do live data migrations, and some products may require the installation of software agents.
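The out-of-band flow described above (ask a reference server, cache the answer, then go direct) can be sketched as follows. The `ReferenceServer` and `Client` classes and the paths are hypothetical, and the direct connection to the file server is reduced to returning the physical location.

```python
from typing import Dict


class ReferenceServer:
    """Hypothetical out-of-band directory of where each file lives."""

    def __init__(self, locations: Dict[str, str]) -> None:
        self._locations = dict(locations)
        self.lookups = 0  # counts how often clients had to ask

    def locate(self, logical_path: str) -> str:
        self.lookups += 1
        return self._locations[logical_path]


class Client:
    """Caches referrals so repeat access skips the reference server."""

    def __init__(self, reference: ReferenceServer) -> None:
        self._reference = reference
        self._cache: Dict[str, str] = {}

    def locate(self, logical_path: str) -> str:
        if logical_path not in self._cache:
            self._cache[logical_path] = self._reference.locate(logical_path)
        # A real client would now connect directly to this location.
        return self._cache[logical_path]


ref = ReferenceServer({"/corp/specs/design.doc": "//nas02/specs/design.doc"})
client = Client(ref)
client.locate("/corp/specs/design.doc")
client.locate("/corp/specs/design.doc")
print(ref.lookups)
```

In this sketch the reference server stays out of the data path, but a cached location remains in use until the client looks it up again.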
About the author: Carol Sliwa is a veteran IT journalist.