Hi all,
I am trying to configure failover multipathing between two GNBD devices.
I have a four-node Red Hat Cluster Suite (RCS) cluster: three nodes run services and one provides central storage. In the future I am going to add a second storage machine. The two storage machines will share and export the same disk, so that the machine exporting the storage is not a single point of failure.
As a proof of concept I am using one machine on which I have configured two GNBD exports of exactly the same disk. These are configured with:
# /sbin/gnbd_export -d /dev/sdb1 -e gnbd0 -u gnbd
# /sbin/gnbd_export -d /dev/sdb1 -e gnbd1 -u gnbd
Both exports use the same UID (-u gnbd), so the multipath driver will automatically configure them as alternative paths to the same storage.
Then, on one of the service nodes, I import these GNBD devices with:
# /sbin/gnbd_import -i gnbd1
where gnbd1 is the hostname of the machine exporting the GNBD devices.
Both are imported OK:
# gnbd_import -l
Device name : gnbd1
----------------------
Minor # : 0
sysfs name : /block/gnbd0
Server : gnbd11
Port : 14567
State : Open Connected Clear
Readonly : No
Sectors : 41941688
Device name : gnbd0
----------------------
Minor # : 1
sysfs name : /block/gnbd1
Server : gnbd1
Port : 14567
State : Open Connected Clear
Readonly : No
Sectors : 41941688
#
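To check the import state at a glance, a small shell sketch like the one below could summarise the "gnbd_import -l" listing. The function name gnbd_states is hypothetical, and it assumes the "Field : value" layout shown above; other GNBD versions may print the listing differently.

```sh
#!/bin/sh
# Hypothetical helper (not part of the GNBD tools): print one line per
# imported device as "<device> <ok|CHECK> <state>", flagging any device
# whose State is not "Open Connected Clear".
# Assumes the "Device name : X" / "State : ..." layout shown above.
gnbd_states() {
    awk -F' : ' '
        /^Device name/ { dev = $2 }
        /^State/ {
            flag = ($2 == "Open Connected Clear") ? "ok" : "CHECK"
            print dev, flag, $2
        }
    '
}

# Usage:
#   gnbd_import -l | gnbd_states
```

On the listing above this prints both gnbd1 and gnbd0 as "ok".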
Next, I configured device-mapper multipath by commenting out the "blacklist" section in /etc/multipath.conf and adding this "defaults" section:
defaults {
user_friendly_names yes
polling_interval 5
#path_grouping_policy failover
path_grouping_policy multibus
rr_min_io 1
failback immediate
#failback manual
no_path_retry fail
#no_path_retry queue
}
The mpath device now looks correctly configured (IMHO):
# multipath -ll
mpath0 (gnbd) dm-2 GNBD,GNBD
[size=20G][features=0][hwhandler=0]
\_ round-robin 0 [prio=2][enabled]
 \_ #:#:#:# gnbd0 252:0 [active][ready]
 \_ #:#:#:# gnbd1 252:1 [active][ready]
#
# dmsetup ls
mpath0 (253, 2)
VolGroup00-LogVol01 (253, 1)
VolGroup00-LogVol00 (253, 0)
#
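To watch path states while testing, a sketch along these lines could list each path from "multipath -ll" with its states. The function name mpath_path_states is hypothetical; it keys on the major:minor field followed by two bracketed states, as in the output above, and may need adjusting for other multipath-tools versions.

```sh
#!/bin/sh
# Hypothetical helper: list each path in `multipath -ll` output as
# "<device> <major:minor> <states>". Path lines are recognised by a
# major:minor field followed by "[dm-state][checker-state]".
mpath_path_states() {
    awk '/[0-9]+:[0-9]+ \[[a-z]+\]\[[a-z]+\]/ {
        # scan for the major:minor field; the device name precedes it
        # and the bracketed states follow it
        for (i = 2; i < NF; i++)
            if ($i ~ /^[0-9]+:[0-9]+$/)
                print $(i - 1), $i, $(i + 1)
    }'
}

# Usage:
#   multipath -ll | mpath_path_states
#   multipath -ll | mpath_path_states | grep -c faulty   # count faulty paths
```

It works whether or not the "\_" tree markers are present, since it only looks at the major:minor field and what surrounds it.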
Next I create an ext3 filesystem on the mpath0 device with mkfs.ext3 and mount it. I then start copying a file onto it (with scp, to get a progress bar), and during the copy I shut down one of the GNBD exports on the storage machine with:
# gnbd_export -r gnbd1 -O
After a while, this appears in the maillog:
gnbd_recvd[3357]: client lost connection with gnbd11 : Broken pipe
gnbd_recvd[3357]: reconnecting
kernel: gnbd1: Receive control failed (result -32)
kernel: gnbd1: shutting down socket
kernel: exiting GNBD_DO_IT ioctl
kernel: gnbd1: Attempted send on closed socket
gnbd_recvd[3357]: ERROR [gnbd_recvd.c:292] login refused by the server : No such device
gnbd_recvd[3357]: reconnecting
kernel: device-mapper: multipath: Failing path 252:1.
multipathd: gnbd1: directio checker reports path is down
multipathd: checker failed path 252:1 in map mpath0
multipathd: mpath0: remaining active paths: 1
gnbd_recvd[3357]: ERROR [gnbd_recvd.c:292] login refused by the server : No such device
gnbd_recvd[3357]: reconnecting
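To line up the path failure with the point where I/O stalls, the multipath messages can be pulled out of the log with a sketch like this. The function names path_events and remaining_paths are hypothetical, the patterns are taken from the messages quoted above, and the log file path in the usage lines is only an assumption (use whichever file these messages end up in).

```sh
#!/bin/sh
# Hypothetical helpers (not part of multipath-tools).

# Print only the path-failure messages of the kind quoted above.
path_events() {
    grep -E 'Failing path|checker failed path|reports path is down|remaining active paths'
}

# Print the last "remaining active paths" count seen for each map.
remaining_paths() {
    awk '/remaining active paths:/ {
        map = $(NF - 4)          # e.g. "mpath0:"
        sub(/:$/, "", map)       # drop the trailing colon
        last[map] = $NF          # keep the most recent count
    }
    END { for (m in last) print m, last[m] }'
}

# Usage (log path is an assumption):
#   path_events < /var/log/messages
#   remaining_paths < /var/log/messages
```

On the messages above, remaining_paths would report "mpath0 1", i.e. multipathd itself believes one path is still usable.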
At this point the copy process freezes, and it stays frozen until the GNBD device is exported again. I try some commands on the multipath node:
# multipath -ll
gnbd1: checker msg is "directio checker reports path is down"
mpath0 (gnbd) dm-2 GNBD,GNBD
[size=20G][features=0][hwhandler=0]
\_ round-robin 0 [prio=1][active]
 \_ #:#:#:# gnbd0 252:0 [active][ready]
 \_ #:#:#:# gnbd1 252:1 [failed][faulty]
<frozen; the prompt does not return>
The prompt only comes back after the GNBD device is exported again.
My expectation was that in this scenario the multipath driver would switch the requests to the other path and everything would continue to work. Am I wrong?
I have upgraded all the RPMs to the latest versions. I am using CentOS 5.1.
I have tried different multipath settings (the ones commented out in the multipath.conf "defaults" section pasted above), but nothing changes.
This may be useful: when the machine starts, the log shows:
multipathd: gnbd0: add path (uevent)
kernel: device-mapper: multipath round-robin: version 1.0.0 loaded
multipathd: mpath0: load table [0 41941688 multipath 0 0 1 1 round-robin 0 1 1 252:0 1000]
multipathd: mpath0: event checker started
multipathd: dm-2: add map (uevent)
multipathd: dm-2: devmap already registered
gnbd_recvd[3357]: gnbd_recvd started
kernel: resending requests
multipathd: gnbd1: add path (uevent)
multipathd: mpath0: load table [0 41941688 multipath 0 0 1 1 round-robin 0 2 1 252:0 1000 252:1 1000]
multipathd: dm-2: add map (uevent)
multipathd: dm-2: devmap already registered
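The "load table" lines are device-mapper multipath tables. A sketch that decodes one into readable fields is below; the function name decode_mpath_table is hypothetical, and the field order follows the kernel's dm-multipath table format (feature count and features, hardware-handler argument count and arguments, number of path groups, initial group, then per group: selector, selector argument count and arguments, path count, per-path argument count, and the paths). It has only been checked against tables shaped like the ones above.

```sh
#!/bin/sh
# Hypothetical helper: decode a dm-multipath table line of the form
#   <start> <length> multipath <#features> [features] <#hw args> [hw args]
#   <#groups> <initial group> { <selector> <#sel args> [sel args]
#   <#paths> <#path args> { <path> [path args] } }
decode_mpath_table() {
    awk '{
        i = 4                                   # skip "<start> <length> multipath"
        nfeat = $i; i += 1 + nfeat              # feature count + features
        nhw = $i;   i += 1 + nhw                # hw-handler arg count + args
        npg = $i;   i++                         # number of path groups
        initpg = $i; i++                        # initial path group
        print "path groups:", npg, "initial:", initpg
        for (g = 1; g <= npg; g++) {
            sel = $i; i++                       # path selector name
            nsel = $i; i += 1 + nsel            # selector arg count + args
            npaths = $i; i++
            npargs = $i; i++                    # args per path
            print "group", g, "selector:", sel, "paths:", npaths
            for (p = 1; p <= npaths; p++) {
                dev = $i; i++
                args = ""
                for (a = 1; a <= npargs; a++) { args = args " " $i; i++ }
                print "  path", dev, "args:" args
            }
        }
    }'
}

# Usage:
#   dmsetup table mpath0 | decode_mpath_table
```

For the second table above this reports one round-robin group with paths 252:0 and 252:1, each with one argument (the round-robin repeat count, 1000). The leading feature count of 0 means queue_if_no_path is not set.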
Could this be a bug in GNBD rather than in multipath? Any help getting this working would be much appreciated.
Thanks.