[CentOS] SUMMARY : multipath using defaults rather than multipath.conf contents for some devices (?) - why ?

Sun Sep 27 20:08:42 UTC 2009
McCulloch, Alan <alan.mcculloch at agresearch.co.nz>

The reason for the behaviour described below turned out to be that the
device entry in /etc/multipath.conf had inadvertently been appended *after* the
devices section rather than inside it, so that we had

#devices {
#    device {
#        blah blah
#    }                 (file has a bunch of defaults commented out)
#    etc
#}
#              
#    
device {
  our settings
}

*rather than*

devices {
    device {
        our settings
    }
}
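
A misplaced device entry like this is easy to miss by eye. The following is a minimal sketch of a check for it, not part of multipath itself: the function name find_stray_device_blocks is hypothetical, and the parsing is deliberately simplified (it treats everything after a # as a comment and only understands "name {" or a name followed by "{" on the next line).

```python
import re

def find_stray_device_blocks(conf_text):
    """Return line numbers of 'device' blocks opened outside 'devices { }'."""
    open_sections = []   # names of the sections currently open
    stray = []
    pending = None       # (name, lineno) of a bare section name awaiting '{'
    for lineno, raw in enumerate(conf_text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()   # simplification: strip comments
        if not line:
            continue
        same_line = re.fullmatch(r"(\w+)\s*\{", line)
        if same_line:
            opened = (same_line.group(1), lineno)
        elif line == "{" and pending:
            opened = pending
        else:
            opened = None
        if opened:
            name, where = opened
            if name == "device" and "devices" not in open_sections:
                stray.append(where)
            open_sections.append(name)
        elif line == "}" and open_sections:
            open_sections.pop()
        # Remember a bare word such as 'device' in case '{' is on the next line.
        pending = (line, lineno) if re.fullmatch(r"\w+", line) else None
    return stray

# Illustrative input shaped like the broken file described above.
BROKEN = """\
#devices {
#    device {
#    }
#}
device {
    vendor "HP"
}
"""

print(find_stray_device_blocks(BROKEN))   # prints [5]
```

An empty list means every device block found was nested inside an uncommented devices section.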


Also, looking more closely at our multipath.conf.defaults, there is an entry for
the product pattern HSV2.*

That would explain why the multipath settings for the HSV200 looked different from
those for the HSV400: the HSV200 was picking up a different set of defaults.
However, they were still not exactly what we had specified.
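
To see which product patterns each controller can match, here is a small sketch using Python's re module. It is an approximation under stated assumptions: multipath uses its own regex matching rather than Python's, although patterns this simple should behave the same way; only the product string is compared, because the vendor of the built-in entry is not quoted here; and the name matching_entries is hypothetical. It says nothing about which matching entry multipath ends up preferring.

```python
import re

# (where the entry lives, product pattern) - both patterns are quoted in this post.
ENTRIES = [
    ("multipath.conf.defaults", r"HSV2.*"),
    ("/etc/multipath.conf", r"HSV1[01]1|HSV2[01]0|HSV300|HSV4[05]0"),
]

def matching_entries(product):
    """Return the sources whose product pattern matches the given product string."""
    return [source for source, pattern in ENTRIES if re.search(pattern, product)]

for product in ("HSV200", "HSV400"):
    print(product, "->", matching_entries(product))
```

Under these assumptions the HSV200 matches both entries, while the HSV400 matches only the one in /etc/multipath.conf, which is consistent with the two controllers ending up with different settings.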

(sort of like CSS behaviour....."cascading multipath.conf defaults" ! :-) ) 

You possibly also need to pay attention to the basic whitespace formatting of this
file. For example, I had noticed the above problem early on and rebooted
with it "fixed", only to find that the system came up
unable even to recognise the ext3 filesystem, complaining about a bad superblock
etc.

After recovering from that (see below), I went back a week later and a) made sure that
the whitespace (tabs etc.) looked like that of the other commented-out defaults, and b) added
a second device entry (from the defaults section, not a device we even have), just
in case there was some bug relating to having only one device entry in the "devices"
section.

Whether because of those or some other change, the system now comes up fine, with
multipath -ll reporting the correct settings for the HSV400. So the issue is finally
resolved. Touch wood.

Here is a useful tip (not news to gurus, but it was to me): when things turn to custard on reboot and
everything, including the root filesystem, is mounted read-only, so that you can't even restore
multipath.conf from the backup you made before rebooting, you can run


mount -n -o remount /

to remount / as writeable (writing the options as remount,rw states the intent explicitly). Then you can

cd /etc
cp multipath.conf.bu1 multipath.conf

and breathe a big sigh of relief !

If any multipath developers are reading this: it would be handy if multipath could log some
diagnostic info about how it parses the conf file and exactly which entry it ends up using, which
would then appear in /var/log/messages and /var/log/dmesg. It seems relatively easy to end up
matching a device spec you didn't expect, and the effects on performance are subtle and so
easy to overlook.


Cheers

AMcC


------------------------ original post ------------------------

hi all

We have a RH Linux server connected via a QLogic HBA to two HP SAN controllers: one an HSV200
(on the way out), the other an HSV400 (on the way in).

/etc/multipath.conf contains this : 

device
{
        vendor          "(COMPAQ|HP)"
        product         "HSV1[01]1|HSV2[01]0|HSV300|HSV4[05]0"
        getuid_callout  "/sbin/scsi_id -g -u -s /block/%n"
        prio_callout    "/sbin/mpath_prio_alua /dev/%n"
        hardware_handler "0"
        path_selector   "round-robin 0"
        path_grouping_policy    group_by_prio
        failback        immediate
        rr_weight       uniform
        no_path_retry   18
        rr_min_io       100
        path_checker    tur
}

- but our actual multipathing, as shown by multipath -ll and multipath -ll -v 3, looks as though
it is using the defaults for the HSV400 rather than these settings. The defaults are

#defaults {
#       udev_dir                /dev
#       polling_interval        10
#       selector                "round-robin 0"
#       path_grouping_policy    multibus
#       getuid_callout          "/sbin/scsi_id -g -u -s /block/%n"
#       prio_callout            /bin/true
#       path_checker            readsector0
#       rr_min_io               100
#       rr_weight               priorities
#       failback                immediate
#       no_path_retry           fail
#       user_friendly_name      yes


and multipath -ll reports :

.
.
[snip other HSV400 paths - all similar]
mpath12 (3600508b40007518f0000900000520000) dm-1 HP,HSV400
[size=150G][features=0][hwhandler=0]
\_ round-robin 0 [prio=1][active]
 \_ 0:0:5:9  sdab 65:176 [active][ready]
\_ round-robin 0 [prio=1][enabled]
 \_ 0:0:3:9  sdn  8:208  [active][ready]
\_ round-robin 0 [prio=1][enabled]
 \_ 0:0:4:9  sdu  65:64  [active][ready]
mpath11 (3600508b40007518f0000700000370000) dm-6 HP,HSV200
[size=200G][features=1 queue_if_no_path][hwhandler=0]
\_ round-robin 0 [prio=50][active]
 \_ 0:0:1:7  sdd  8:48   [active][ready]
\_ round-robin 0 [prio=10][enabled]
 \_ 0:0:2:7  sdh  8:112  [active][ready]
.
.
[snip other HSV200 paths - all similar]



multipath -ll -v 3 includes explicit statements that defaults are being used for the HSV400

(long output snipped...)

sdaa: path checker = readsector0 (config file default)

versus

sda: path checker = tur (controller setting)

sdx: getprio = NULL (internal default)

versus

sdd: getprio = /sbin/mpath_prio_alua %n (controller setting)
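
Since the -v 3 output is long, lines like these can be picked out mechanically. The sketch below groups them by the origin shown in parentheses. It assumes every line of interest has the shape "device: setting = value (origin)", as the four lines quoted above do; the names LINE_RE and settings_by_origin are hypothetical, and other lines in the real output are simply ignored.

```python
import re
from collections import defaultdict

# Matches lines of the form "sda: path checker = tur (controller setting)".
LINE_RE = re.compile(
    r"^(?P<dev>\w+): (?P<key>[^=]+?) = (?P<value>.+?) \((?P<origin>[^()]+)\)$"
)

def settings_by_origin(output):
    """Group (device, setting, value) tuples by where multipath says they came from."""
    found = defaultdict(list)
    for line in output.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            found[m["origin"]].append((m["dev"], m["key"], m["value"]))
    return dict(found)

# The four lines quoted above, used as sample input.
SAMPLE = """\
sdaa: path checker = readsector0 (config file default)
sda: path checker = tur (controller setting)
sdx: getprio = NULL (internal default)
sdd: getprio = /sbin/mpath_prio_alua %n (controller setting)
"""

for origin, items in settings_by_origin(SAMPLE).items():
    print(origin, items)
```

Any path device listed under an origin other than "controller setting" is one that did not pick up a device entry for the setting in question.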



- furthermore, the log file contains messages from both readsector0 *and* tur, rather than
from tur alone as we would expect if the correct settings were used, which also backs that up.

My questions are basically: why is it happening, and how do I fix it?

The vendor and product regexps definitely do match "HP" and both "HSV200" and "HSV400" respectively,
so it doesn't seem that fiddling with the patterns will help, and I'm sure this config has been tested.

It's not due to this server having to deal with two controllers: we have a second server that only mounts from
the HSV400, and its multipath settings appear to be entirely the defaults, not what we have set.

(And conversely, it's not due to the conf file not being read at all, since the server with two controllers
is using the correct config for one of them but not the other.)

Thanks for any tips, and I will summarise.

Cheers

AMcC
