[CentOS] dm-multipath use
Craig.Miskell at agresearch.co.nz
Wed Jun 25 20:23:59 UTC 2008
> Are folks in the CentOS community successfully using
> I am looking to deploy it for error handling on our iSCSI setup, but there
> seems to be little traffic about this package on the CentOS forums, as far
> as I can tell, and there seem to be a number of small issues, based on my
> reading of the dm-multipath developer lists and related resources.
I'm using it on RHEL 5 (close enough for the purposes of your query), connecting to an HP EVA 6000 SAN. The RHEL documentation (http://www.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/5.2/html/DM_Multipath/index.html) certainly covers the basics adequately, and was enough to get me going. I'm using LVM over the top of that, so I found it worthwhile to tweak /etc/lvm/lvm.conf to filter out all the various aliases for the disks that show up in /dev. My filter line is currently:
filter = [ "r/sd.*/", "r:disk/by.*:", "a/.*/" ]
which works well for me, but YMMV, particularly with the filtering out of "sd.*" (that works here because our main OS disks are on /dev/cciss, so rejecting every sd device doesn't hide the OS disks from LVM).
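To see what that filter does to a given name, here is a rough sketch that mimics the matching for a single path name: patterns are tried in order and the first one that matches decides. It is only an approximation (real LVM also looks at every alias of a device, which this ignores), and the device names in the loop are made-up examples.

```shell
#!/bin/sh
# Rough emulation of lvm.conf filter matching for ONE path name.
# Patterns are tried in order; the first regex that matches decides.
# Assumption: alias handling in real LVM is ignored here.
lvm_filter_verdict() {
    name=$1
    # Same order as: filter = [ "r/sd.*/", "r:disk/by.*:", "a/.*/" ]
    if printf '%s\n' "$name" | grep -Eq 'sd.*'; then
        echo reject
    elif printf '%s\n' "$name" | grep -Eq 'disk/by.*'; then
        echo reject
    else
        echo accept   # "a/.*/" accepts whatever is left
    fi
}

# Hypothetical device names, for illustration only.
for dev in /dev/sda /dev/disk/by-id/scsi-example /dev/mapper/mpath0 /dev/cciss/c0d0p2; do
    printf '%-32s %s\n' "$dev" "$(lvm_filter_verdict "$dev")"
done
```

Note that the patterns are not anchored, so "sd.*" rejects any name with "sd" anywhere in it, not just /dev/sdX.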
You've also got to be a little careful when unpresenting disks (SAN terminology, may not apply to iSCSI). From our internal documentation (some notes I wrote at the time, updated with subsequent experience):
Removing is trickier; you need to ensure nothing is still trying to use the disk. Particularly watch out for LVM. If the disk is part of a volume group, you have to run
#vgchange -an <VGNAME>
first, otherwise LVM still thinks the disk is there, and things like lvmdiskscan/pvdisplay etc start hanging when the disk has gone away.
Once the disk is unused, unpresent the disk from the SAN, rescan to remove no-longer existing disks, then restart multipathd (/etc/init.d/multipathd restart). Running
may also be sufficient, but I've found restarting multipathd entirely a smidgen more reliable (but I may have been doing things wrong before that).
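As a checklist, the order above could be sketched like this. It is a dry run that only prints the steps and executes nothing; the SAN-side unpresent and the rescan are left as comments because they depend on your setup, and the final "multipath -ll" check is my own addition rather than part of the original notes.

```shell
#!/bin/sh
# Dry-run outline of the removal order described above.
# It only PRINTS the steps; nothing is executed.
print_removal_steps() {
    vg=$1
    echo "vgchange -an $vg"
    echo "# unpresent the disk on the SAN side"
    echo "# rescan so the host drops the no-longer-existing disks"
    echo "/etc/init.d/multipathd restart"
    echo "multipath -ll   # added check: confirm no failed paths remain"
}

# <VGNAME> stays a placeholder, as in the notes.
print_removal_steps "<VGNAME>"
```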
If things get really stuck, then you might have some luck with dmsetup. If "multipath -ll" shows failed disks (that have been unpresented properly), use dmsetup to remove the failed disk with the command:
#dmsetup remove <device>
where <device> is "mpath<num>". Find the stuck one from the output of multipath -ll; be sure you've got the right mpath device.
If you've got stuck lvmdiskscan or pvdisplay type processes (trying to access the missing disk), then the "remove" will fail, claiming the device is in use (which, in some senses, it is). In this case, double-check you've got the right mpath device (otherwise you'll fsck your system), and run:
#dmsetup remove --force <device>
This will claim failure (device-mapper: remove ioctl failed: Device or resource busy), but if you now run
#dmsetup info <device>
then you'll see the "Open count" has gone to zero. You can now run the plain remove one more time:
#dmsetup remove <device>
and it will be removed. Your hung processes will finally die the death they deserve, and the unpresented disk will no longer be known to the system.
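The remove / remove --force / remove sequence above could be wrapped up roughly as follows. This is a sketch, not something I have run in this exact form: the function names, the DMSETUP override variable and the "mpath3" name are invented for illustration, and it assumes "dmsetup info" prints an "Open count:" line as described above. It refuses to retry unless the open count really has dropped to zero.

```shell
#!/bin/sh
# Sketch of the remove / remove --force / remove sequence for a stuck map.
# DMSETUP can be overridden (e.g. with a stub for testing); default is dmsetup.
DMSETUP=${DMSETUP:-dmsetup}

# Print the "Open count" value from `dmsetup info` for one device.
open_count() {
    "$DMSETUP" info "$1" | awk -F: '/^Open count/ { gsub(/[ \t]/, "", $2); print $2 }'
}

remove_stuck_map() {
    dev=$1
    # Plain remove first; enough when nothing holds the device open.
    if "$DMSETUP" remove "$dev" 2>/dev/null; then
        echo "removed $dev"
        return 0
    fi
    # Forced remove; per the notes it reports failure even when it helps,
    # so its exit status is deliberately ignored.
    "$DMSETUP" remove --force "$dev" 2>/dev/null || true
    count=$(open_count "$dev")
    if [ "$count" != "0" ]; then
        echo "open count for $dev is '$count', not retrying" >&2
        return 1
    fi
    # Open count is zero, so the plain remove should now succeed.
    "$DMSETUP" remove "$dev" && echo "removed $dev"
}

# Usage (as root, after checking the name against multipath -ll):
#   remove_stuck_map mpath3
```

Sourcing the file only defines the functions; nothing is removed until you call remove_stuck_map yourself.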
It has worked well in real life, except for one day when one of our EVA SAN Controllers died; one host survived, another had multipathd itself die with a double free error (which I bugzilla'd upstream). Disks went away, but came back on restarting multipathd. Odd, but survivable, and not indicative of a general problem (probably something I did early on in the setup that hung around).
And one other word of advice: Play with it a lot in a test system first. It should go without saying, but this is really one of those times. There are many things you can learn safely on a production device; this isn't one of them. Get really comfortable with adding/removing/munging before you go live. And you will break it at least once during your preparation, if not more ;-).