Using iSCSI for your virtual server environment has several big advantages. The implementation of iSCSI storage devices is straightforward, and users get the benefits of a file system without the hassle of setting up Fibre Channel (FC). But iSCSI users can run into IP-related problems and face other issues, especially if they choose to use software-based iSCSI stacks. And many people think of iSCSI as the underdog to Fibre Channel. But it can work well in many virtual server environments—users simply need to be aware of the ins and outs of iSCSI, how it works, and what it can do for their environments.
In this podcast interview, Mike Laverick, a VMware expert, discusses the pros and cons of using iSCSI for virtual server environments. Find out the proper steps you should take when implementing iSCSI in your virtual environment, how to configure iSCSI in vSphere and how to test the performance of an iSCSI storage device.
Read the transcript below or listen to the iSCSI for virtual server environments podcast.
What are the advantages of using iSCSI for virtual server environments?
Mike Laverick: I think that the big advantage is that theoretically it should be much simpler in comparison to say, Fibre Channel [because] there are no WWNs, zoning, [or] special Fibre Channel switches that need to be acquired. There’s no complicated masking that needs to be done to present the storage, and what it means for people using virtual server environments is that they will get all the benefits of the vendors’ file systems [such as VMware’s VMFS] without the hassles associated with setting up Fibre [Channel].
I think the other big advantage is that because iSCSI works both at the hypervisor level and in the guest operating system, there is scope for using an iSCSI initiator actually inside a virtual machine [VM], where you might find better performance for raw I/O to the LUN, but more importantly you might be able to break through some of the limits that currently surround VMware’s hypervisor, where the maximum-sized RDM or the maximum virtual disk size is still just 2 TB. If you, for example, run the iSCSI initiator inside a Windows VM, then the rules that govern the size of the partition are Microsoft’s GPT tables, which means you can go way beyond the 2 TB limit that a lot of people are restricted by.
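For context, the 2 TB ceiling that keeps coming up in storage discussions traces back to 32-bit sector addressing, which is the limit the legacy MBR partition scheme imposes and which GPT, with its 64-bit addresses, removes. A back-of-the-envelope sketch of the arithmetic:

```python
# Why "2 TB" is the magic number: legacy MBR partition tables (and other
# similarly constrained block-addressing schemes) store sector counts in
# 32-bit fields, while traditional disks use 512-byte sectors.
SECTOR_SIZE = 512          # bytes per sector
MAX_SECTORS = 2 ** 32      # largest count a 32-bit field can hold

max_bytes = MAX_SECTORS * SECTOR_SIZE
print(max_bytes)             # 2199023255552
print(max_bytes / 2 ** 40)   # 2.0 -- i.e. 2 TiB, the familiar "2 TB" ceiling

# GPT uses 64-bit sector addresses, so the same arithmetic with 2**64
# sectors puts the limit in the zebibyte range, far beyond any LUN size.
```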
The last big advantage [of iSCSI] is people tend to run their hypervisors in clusters, and they tend to keep one cluster separate from another for security reasons and for simplicity, but there are often files that you want to share between clusters such as … templates. And with iSCSI just being IP, it’s much easier to selectively present ancillary LUNs that maybe hold templates to your whole range of clusters without necessarily being duplicated.
What are the disadvantages of using iSCSI for virtual server environments?
Laverick: I think we have to remember that it’s still IP; it’s still Ethernet-based; it’s still TCP packets that are going across the wire. So in my experience the problems you tend to have are IP-related. It’s something that shouldn’t happen, but does—things ranging from IP conflicts to bad addressing. People put the wrong subnet mask in, or they are going across a router … and they have the wrong default gateway entered.
I think [another disadvantage] is that most customers use the software-based iSCSI stacks that exist either in VMware ESX, Citrix Xen or in Hyper-V. The problem with that is it’s in software, so if you end up wiping that server and rebuilding it, you have to put all that configuration back again. In fairness, what customers could do is use an iSCSI HBA [host bus adapter] from the likes of Cisco, but they aren’t cheap. And so, very often people are using the software-based initiator, and that can provide some hassles when you come to rebuilding an environment. Very often people look down on doing upgrades and say, “Let’s just wipe the system and rebuild.” But when you do that you have to put all your storage configuration back into the hypervisor to make it work, whereas when you’re dealing with something like Fibre Channel, the Fibre Channel environment kind of exists externally from the stack of the hypervisor, so you can yank the system, put a new version in, and the LUNs will still appear.
I often get questions from customers asking me whether they should use Fibre Channel, iSCSI or NFS, and I often sit there and list all the advantages and disadvantages of all three, but I often feel like there isn’t one that has a clear lead in every single case.
The final [disadvantage of iSCSI] I would say is you can get some weirdness going on with iSCSI when you do re-scans. For example, when you take a LUN away from a server and do a re-scan, I’ve often found the LUN still there despite the fact that the server shouldn’t have any rights or privileges to it, and it’s because of the way that TCP sessions are kept open to improve performance and are shared between multiple LUNs. It’s actually quite a difficult thing to cleanly de-present a LUN. So you can have oddness with the LUN where you know the server doesn’t have access to it, but it thinks it does, and it eventually gets cleared about an hour or two hours later as the TCP session is brought down or re-established.
What steps do you have to take to implement iSCSI in a virtual environment?
Laverick: I think the implementation of iSCSI should be relatively straightforward because you should already have the pieces in place with the Ethernet network to actually allow the connectivity to take place. But to get into a bit more detail, I think most customers want to VLAN off their iSCSI [communications] into a separate VLAN, or even a separate physical switch if they can afford that. People have to remember that iSCSI runs over TCP port 3260, and no iSCSI communication is encrypted. There’s no security in iSCSI against that kind of lifting of data off the wire. So you have to be pretty careful about making sure that nobody can tap into that side of the network and literally just steal a packet capture of your data.
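Since the common failure modes Laverick describes are plain IP problems (wrong subnet mask, wrong gateway, VLAN misconfiguration), a basic TCP reachability check against the target's port 3260 can rule those out before digging any deeper. A minimal sketch in Python; the target address in the example is hypothetical:

```python
import socket

ISCSI_PORT = 3260  # default iSCSI target port mentioned above


def iscsi_port_open(host: str, port: int = ISCSI_PORT,
                    timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds.

    This only proves basic IP reachability (addressing, subnet mask,
    gateway, VLAN) -- the class of problems described above -- not that
    the target will actually present any LUNs to this initiator.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Example (hypothetical target address):
# print(iscsi_port_open("192.168.50.10"))
```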
The other nice-to-have is 10 Gb throughout, from the hypervisor up to the storage. Yes, you can get away with 1 Gb interfaces, and of course you can do multipathing I/O with bundled 1 Gb interfaces. But there’s a level of complexity there which is perhaps a little bit undesirable, so for a customer who is coming in new to iSCSI … it’s an ideal time to ask, “Is it time to move from 1 Gb to 10 Gb, rather than doing that midway through the lifetime of that commitment?”
I guess the last thing about implementing iSCSI is it does have its own naming convention, something called the IQN—the iSCSI Qualified Name. So I think most companies would have to sit down and just very briefly think about what they’re going to use as their IQN [and] what standard they are going to follow for it. And there are some conventions around the IQN you can follow, but there are some parts of it that you might want to make unique to your business. And the reason that’s important is that all security, whether a host can see a LUN or not see a LUN, is done through the IQN in most cases. So if you establish a poor convention, it’s quite difficult, or a pain in the butt, to have to go back and re-establish and standardize on the IQN.
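For reference, the IQN format is defined in RFC 3720 as `iqn.<yyyy-mm>.<reversed-domain>:<unique-suffix>`. The sketch below shows one possible in-house convention plus a simplified (deliberately not spec-complete) format check; the domain and host names are purely illustrative:

```python
import re

# RFC 3720 IQN shape: iqn.<yyyy-mm>.<reversed DNS domain>[:<unique name>]
# e.g. iqn.1998-01.com.vmware:esx01-1a2b3c4d (VMware's default style).
# This regex is a simplified sanity check, not a full RFC 3720 validator.
IQN_RE = re.compile(
    r"^iqn\.\d{4}-(0[1-9]|1[0-2])"   # 'iqn.' + year-month of domain registration
    r"\.[a-z0-9][a-z0-9.-]*"         # reversed DNS domain, e.g. com.example
    r"(:[^\s]+)?$"                   # optional colon-delimited unique suffix
)


def valid_iqn(name: str) -> bool:
    return bool(IQN_RE.match(name))


def make_iqn(domain: str, date: str, suffix: str) -> str:
    """Build an IQN following one possible in-house convention:
    reversed company domain plus a per-host suffix. Purely illustrative."""
    reversed_domain = ".".join(reversed(domain.split(".")))
    return f"iqn.{date}.{reversed_domain}:{suffix}"


print(make_iqn("example.com", "2010-04", "esx01"))
# -> iqn.2010-04.com.example:esx01
```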
And going back to the disadvantages of iSCSI, whenever I’ve had a problem with iSCSI in my own lab environment it’s because I’ve [messed up] an entry on the IQN; I’ve typed it incorrectly. So it’s a “fat finger syndrome” [that] I think seems to apply more with IP storage—iSCSI storage and NFS—than perhaps it does with Fibre Channel.
Building off of that, how do you configure iSCSI in vSphere—can you explain those steps?
Laverick: In a nutshell, the first thing you need to do is make sure that you’ve got a virtual switch with at least two network cards backing it, with what’s called a VMkernel port group, literally an IP address that will allow the ESX host to speak to the iSCSI system. And in ESX 4, a new configuration was introduced to allow improved multipathing to that storage, so that you get load across both network cards, or more if you have them. That load balancing is less important if you’re going down the 10 Gb route; your multiple NICs then are really just there for redundancy as opposed to any load balancing. … And the next stage would be to enable the iSCSI stack in ESX, which, when you click “properties” and enable it, will create an IQN for you, but I think most customers change that to something that’s their standard. And it creates a kind of alias of that device with a name like vmhba34 or vmhba40 … . So it actually looks like a physical iSCSI HBA, but really what’s backing it is a VMkernel port with Ethernet cards behind it. Once it’s enabled, there’s a little tab where you can type in the IP addresses … of the iSCSI target.
The last thing to check is [to ask yourself]: does the iSCSI system support CHAP authentication, and has it been enabled? CHAP is just an authentication protocol that means you don’t just have to type in the IP address to connect to the iSCSI system; you also need to know the password to connect to that system. In fairness, if you’re VLAN-ing stuff off into a private network, most customers don’t use CHAP at all. But if you’re doing a rescan and you’re not getting the LUNs back, there are basically three reasons why: the wrong IP address [or] the [communications] aren’t in place; the wrong IQN; and the third thing to check is whether CHAP has been enabled. It depends on what your environment is like. If you’re in charge of both the VMware side and the hardware, you’re more likely to find that everything fits together. But if you have a storage team separate from the VMware team, there may be a lack of communication about what the appropriate standards are and what’s needed to make the thing work. So a lot depends on your environment and what’s being implemented. But I think in a nutshell that’s what we would do in VMware to enable iSCSI.
If you’ve been doing it for a while, it’s less than a couple of minutes per server, and in my own environment I have all this scripted, so if I build a new server, the scripts enable the iSCSI stacks [and] the scripts enable the configuration of the network. So going back to [the problem of] “if you wipe a server and rebuild it, then you have to put it all back,” the answer to that is if you go down the route of re-scripted installs, then you just rerun your scripts and everything is rosy in the garden again.
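As a rough illustration of the scripted approach Laverick describes, a rebuild script on a current ESXi host might drive the `esxcli iscsi` namespace along these lines. Note the hedges: ESX 4 itself shipped the older `vicfg-`/`esxcfg-` tools rather than this namespace, and the adapter name, IQN and addresses below are placeholders, not values from the interview:

```shell
#!/bin/sh
# Sketch: re-apply software-iSCSI configuration after a host rebuild.
# Adapter name (vmhba33), IQN and IP addresses are illustrative only.

# 1. Enable the software iSCSI initiator
esxcli iscsi software set --enabled=true

# 2. Replace the auto-generated IQN with the site-standard one
esxcli iscsi adapter set --adapter=vmhba33 \
    --name="iqn.2010-04.com.example:esx01"

# 3. Point the initiator at the target's send-targets discovery address
esxcli iscsi adapter discovery sendtarget add \
    --adapter=vmhba33 --address=192.168.50.10:3260

# 4. Rescan so any presented LUNs show up
esxcli storage core adapter rescan --adapter=vmhba33
```

Rerunning a script like this after a wipe-and-rebuild is what puts "everything rosy in the garden again," as the answer above puts it.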
How do you test the performance of an iSCSI storage device? Are there any specific tools you would recommend for this?
Laverick: I guess there are two sides of that in terms of tools. There are ones that are vendor-specific … whether that’s Citrix, VMware or Microsoft, and then there are more generic ones. From the VMware perspective, there are tools like esxtop and vscsiStats—and vscsiStats is probably the better one because it focuses on a particular VM and what its I/O is, so if you’re trying to troubleshoot why a particular VM is having trouble accessing a particular type of storage, it’s a good one to look at.
On a more generic level there are tools like Iometer, which allows you to generate a synthetic disk load in a virtual machine, and then you can use that load to get a baseline of what is to be expected. On a macro level … both VirtualCenter and Microsoft SCVMM [System Center Virtual Machine Manager] have charts that will show you what the I/O is like. But I think you’re better off with command-line tools because they will give you a second-by-second line speed (the number of megabytes per second being read or written by the system), which can be helpful in troubleshooting.
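Where a full load generator like Iometer isn't to hand, even a crude sequential-write loop inside a VM gives a relative baseline to compare before and after a change. A minimal sketch in Python; it measures a single access pattern and is no substitute for a proper tool:

```python
import os
import time


def write_throughput(path: str, total_mb: int = 64, block_kb: int = 64) -> float:
    """Crude sequential-write baseline; returns MB/s.

    Only a rough stand-in for a real tool like Iometer: it exercises one
    write pattern, and despite the fsync the OS and array caches still
    colour the result, so treat the numbers as relative, not absolute.
    """
    block = os.urandom(block_kb * 1024)
    blocks = (total_mb * 1024) // block_kb
    start = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(blocks):
            f.write(block)
        f.flush()
        os.fsync(f.fileno())  # push the data toward the device, not just cache
    elapsed = time.perf_counter() - start
    os.remove(path)
    return total_mb / elapsed


# Example: baseline a datastore-backed disk from inside a VM
# print(f"{write_throughput('/tmp/iscsi_baseline.bin'):.1f} MB/s")
```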
I think what I would say about performance is that it’s all about expectations. And you should expect the same level of performance from iSCSI as you would get out of Fibre Channel. So don’t expect an iSCSI system to be slower than Fibre Channel. I think one of the problems the industry has generally is this very mistaken notion that Fibre Channel represents the racehorse of storage, iSCSI represents the next step down, and NFS represents the donkey of storage, and all of that is garbage. I’ve got customers who use NFS in their environments, and because it’s specced appropriately, it can outperform the equivalent Fibre Channel.
I think in terms of protocols—FC, iSCSI, NFS—they’re only really as good as their hardware—the number of spindles you have, etc. What we’re really getting out of these protocols [are] different features and different advantages, and we need to look at those advantages and disadvantages … and weigh them for you and your business. Which one is the best fit? And you might find yourself using a combo of both. You might use a combo of Fibre Channel with NFS, [or] iSCSI with NFS. It’s rare that I see a combination of Fibre Channel and iSCSI at the same time because they offer very similar features, but people often want a combination of block-based storage (Fibre Channel or iSCSI) and/or something that’s NFS- or CIFS-based, and that’s what leads them down the route of NAS-based technologies. Try not to see it as an either/or [option with these protocols]. Most of the array vendors—I’ve got NetApp, Dell EqualLogic and EMC in my environment—support all three protocols, so really it’s up to you to look at the features and say, “Well, this is the more appropriate one for my particular usage case.”