We've been wrestling with this for ... rather longer than I'd care to admit.
Host / initiator systems are a number of real and virtualized CentOS 5.5 boxes. Storage arrays / targets are Dell MD3220i storage arrays.
CentOS is not a Dell-supported configuration, and we've had little helpful advice from Dell. There's been some amount of FUD in that Dell don't seem to know what Dell's own software installation (the md3
Dell doesn't seem to have much OS experience generally.
Their docs are pretty inconsistent. I've noted omissions, terminology differences, and procedural differences among the "Owner's Manual", the "Deployment Guide", and a professional services "Remote Services Installation Agreement" service description. Some of the multipathing guidance we've had comes from their EqualLogic line of storage servers.
Questions:
1: Is there anyone out there running this configuration and are you satisfied with it?
2: We get a set of error messages on the initiator at target login. These appear to be benign, and web research suggests it's the result of a driver configuration issue in trying to send instructions to a
http://comments.gmane.org/gmane.linux.iscsi.open-iscsi/5970 http://bit.ly/gguLl7
MD3000i boxes have 1 controller that can execute IO and one that cannot execute READ/WRITE IO until a special command is sent. In older kernels, layers like the partition scanning and udev and hal would send down IO to those disabled paths and we would see IO errors like in your link.
In newer kernels we have device handler modules (for MD3000i you would want scsi_dh_rdac, so do "modprobe scsi_dh_rdac" and then lsmod to see if it is there) that will detect if the path is not active, and if so will either not send the IO or not print an error message, since we expect the IO to fail.
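For anyone who wants to script the "modprobe, then lsmod" check described above, here is a minimal sketch in Python. It only parses lsmod-style text that you hand it; the function names and the sample text are my own illustration, not anything from Dell or the open-iscsi project.

```python
def parse_lsmod(text):
    """Map each module name to the list of modules that use it."""
    modules = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines and the "Module  Size  Used by" header.
        if len(fields) < 3 or fields[0] == "Module":
            continue
        # Columns are: name, size, use count, optional comma-separated users.
        users = fields[3].split(",") if len(fields) > 3 else []
        modules[fields[0]] = users
    return modules


def handler_loaded(text, name="scsi_dh_rdac"):
    """True if the named device handler module appears in the lsmod text."""
    return name in parse_lsmod(text)


# Illustrative sample in the usual lsmod layout.
SAMPLE = """\
Module                  Size  Used by
scsi_dh_rdac           43977  0
scsi_dh                42177  2 scsi_dh_rdac,dm_multipath
"""
```

Passing the output of lsmod to handler_loaded() gives a yes/no answer that a startup check could act on before logging in to the targets.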
We'd prefer *not* to see spurious I/O errors that we then have to sift through when looking for real storage issues.
3: The MD32xxi series is a dual-controller array with multiple ports on each controller. Multiple targets can be logged into from an initiator, with the pathways aggregated by the Linux multipath (device-mapper-multipath) system. There's very little clear documentation on multipath. In particular, any way to trigger alerting from multipath events / failures, or iSCSI session actions, would be helpful. The MD32xxi series only supports reporting from a GUI management utility (MDSM), which would be at best problematic to run in a server environment. The other question is: is multipathing typical of iSCSI configurations? Few of the iSCSI docs I've found discuss multipath configurations at all:
http://www.open-iscsi.org/
http://www.cuddletech.com/articles/iscsi/index.html (good but very dated)
http://www.cyberciti.biz/tips/rhel-centos-fedora-linux-iscsi-howto.html (no mention of multipathing)
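On the alerting question, one possible approach is to scrape the output of 'multipath -ll' from a periodic job and flag failed paths. The sketch below is only that: it assumes the bracketed path-line layout printed by older multipath-tools releases, and the sample output (WWID, device names, sizes) is made up for illustration. Other versions print path lines differently, so the regular expression would need adjusting.

```python
import re

# Assumed path-line layout (older multipath-tools):
#   \_ 4:0:0:0 sdc 8:32 [failed][faulty]
PATH_RE = re.compile(
    r"(?P<hctl>\d+:\d+:\d+:\d+)\s+(?P<dev>\S+)\s+\d+:\d+\s+"
    r"\[(?P<dm_state>\w+)\]\[(?P<chk_state>\w+)\]"
)


def failed_paths(output):
    """Return device names whose path line reports failed or faulty."""
    bad = []
    for line in output.splitlines():
        match = PATH_RE.search(line)
        if not match:
            continue  # map header, size line, or path-group line
        if match.group("dm_state") == "failed" or match.group("chk_state") == "faulty":
            bad.append(match.group("dev"))
    return bad


# Hypothetical sample; a "ghost" checker state is treated here as a healthy
# path to the passive controller, which is an assumption on my part.
SAMPLE = r"""
mpath0 (36001e4f0001a2b3c) dm-2 DELL,MD32xxi
[size=100G][features=3 queue_if_no_path pg_init_retries 50][hwhandler=1 rdac][rw]
\_ round-robin 0 [prio=6][active]
 \_ 3:0:0:0 sdb 8:16 [active][ready]
 \_ 4:0:0:0 sdc 8:32 [failed][faulty]
\_ round-robin 0 [prio=1][enabled]
 \_ 5:0:0:0 sdd 8:48 [active][ghost]
"""
```

Feeding failed_paths() the real command output and mailing any non-empty result would give a crude alert without needing the MDSM GUI.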
4: Dell suggests a shutdown procedure including a flush of multipathing paths ("multipath -F" -- in the "Owner's Manual"):
3. Flush the Device Mapper multipath maps list to remove any old or modified mappings:
   # multipath -F
Presumably this would go into one of the init scripts -- perhaps the /etc/init.d/multipath script, as part of the "stop" sequence (after the multipath daemon is killed). Anyone done this or know why the practice is recommended?
Based on our experience with Dell I would NOT recommend this configuration for others. But we're stuck with it, and any help in getting things configured would be gratefully received.
I'm also hoping to get clearance to release docs we've generated, though that's the subject of some internal negotiation.
On Jan 21, 2011, at 6:41 PM, Edward Morbius dredmorbius@gmail.com wrote:
...
You need the RDAC kernel module installed; this handles asymmetric multipathing to these devices.
You can get this from Dell's site.
Once this is installed you need to set up dm-multipath: look for multipath.conf in /etc, get the product ID and vendor ID from dmesg after making an initial connection via open-iscsi, and use those in the multipath config. You're going to need to use the 'rdac' path checker in the config instead of tur.
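As an alternative to picking the IDs out of dmesg, the same vendor and product strings can usually be read from sysfs once the iSCSI session is up. The sketch below swaps in that technique; it assumes SCSI disks expose "vendor" and "model" files under /sys/block/<dev>/device/, and the function name is my own.

```python
import os


def scsi_ids(sys_block="/sys/block"):
    """Map block device names to (vendor, model) strings read from sysfs."""
    ids = {}
    for dev in sorted(os.listdir(sys_block)):
        dev_dir = os.path.join(sys_block, dev, "device")
        try:
            with open(os.path.join(dev_dir, "vendor")) as f:
                vendor = f.read().strip()
            with open(os.path.join(dev_dir, "model")) as f:
                model = f.read().strip()
        except (IOError, OSError):
            # No vendor/model files: not a SCSI device (loop, ram, dm-*, ...).
            continue
        ids[dev] = (vendor, model)
    return ids
```

Calling scsi_ids() after logging in should list the array's LUNs with the strings to use for "vendor" and "product" in the device section.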
Google is your friend here.
-Ross
On Fri, Jan 21, 2011 at 3:58 PM, Ross Walker rswwalker@gmail.com wrote:
On Jan 21, 2011, at 6:41 PM, Edward Morbius dredmorbius@gmail.com wrote:
...
We've got *an* rdac module installed. Any way of telling whether or not this is Dell's? RPM says these are from kernel-2.6.18-194.17.1.el5.src.rpm.
$ lsmod | grep rdac
scsi_dh_rdac           43977  0
scsi_dh                42177  2 scsi_dh_rdac,dm_multipath
scsi_mod              196953  14 scsi_dh_rdac,be2iscsi,ib_iser,iscsi_tcp,bnx2i,cxgb3i,libiscsi2,scsi_transport_iscsi2,scsi_dh,sr_mod,sg,libata,megaraid_sas,sd_mod
$ rpm -qif $(locate rdac.ko)
Name        : kernel                       Relocations: (not relocatable)
Version     : 2.6.18                            Vendor: CentOS
Release     : 194.17.1.el5                  Build Date: Wed 29 Sep 2010 11:57:11 AM PDT
Install Date: Thu 14 Oct 2010 02:17:14 PM PDT  Build Host: builder10.centos.org
Group       : System Environment/Kernel     Source RPM: kernel-2.6.18-194.17.1.el5.src.rpm
Size        : 96488290                         License: GPLv2
Signature   : DSA/SHA1, Thu 30 Sep 2010 08:35:49 AM PDT, Key ID a8a447dce8562897
URL         : http://www.kernel.org/
Summary     : The Linux kernel (the core of the Linux operating system)
Description :
The kernel package contains the Linux kernel (vmlinuz), the core of any
Linux operating system. The kernel handles the basic functions of the
operating system: memory allocation, process allocation, device input
and output, etc.
[the same record is printed a second time]
There's also a rebuild of 'sg', with a source tree in /usr/src/sg-3.5.34dell
Diffing sources:
$ diff sg.c sg.c_rhel5
22c22
< #define SG_VERSION_STR "3.5.34dell"
---
> #define SG_VERSION_STR "3.5.34"
1879c1879
<     sg->length = (ret_sz > num) ? num : ret_sz;
---
>     sg->length = ret_sz;
I'll also note that Dell isn't playing nice with its package installs -- some stuff is under /opt/dell, some is installed via RPM, some appears to be tossed arbitrarily onto the system:
$ rpm -qif /lib/modules/2.6.18-194.17.1.el5/extra/sg.ko
file /lib/modules/2.6.18-194.17.1.el5/extra/sg.ko is not owned by any package
Bad Dell. No donut.
CentOS mailing list CentOS@centos.org http://lists.centos.org/mailman/listinfo/centos
On Fri, Jan 21, 2011 at 3:58 PM, Ross Walker rswwalker@gmail.com wrote:
On Jan 21, 2011, at 6:41 PM, Edward Morbius dredmorbius@gmail.com wrote:
...
/etc/multipath.conf appears to be appropriately configured (we'd installed the MDSM host components):
------------------------------------------------------------------------
    device {
        vendor                  "DELL"
        product                 "MD32xxi"
        path_grouping_policy    group_by_prio
        prio                    rdac
        polling_interval        5
        path_checker            rdac
        path_selector           "round-robin 0"
        hardware_handler        "1 rdac"
        failback                immediate
        features                "2 pg_init_retries 50"
        no_path_retry           30
        rr_min_io               100
        prio_callout            "/sbin/mpath_prio_rdac /dev/%n"
    }
    device {
        vendor                  "DELL"
        product                 "MD32xx"
        path_grouping_policy    group_by_prio
        prio                    rdac
        polling_interval        5
        path_checker            rdac
        path_selector           "round-robin 0"
        hardware_handler        "1 rdac"
        failback                immediate
        features                "2 pg_init_retries 50"
        no_path_retry           30
        rr_min_io               100
        prio_callout            "/sbin/mpath_prio_rdac /dev/%n"
    }
}
------------------------------------------------------------------------
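To check a config like this across several hosts, something along the following lines might help. It is a rough sketch, not a full multipath.conf parser: it assumes one setting per line inside each innermost "device { ... }" block, and the function names and the two "expected" values (taken from the stanza quoted in this thread) are my own choices.

```python
import re


def parse_device_blocks(conf_text):
    """Return one dict of settings per innermost `device { ... }` block."""
    blocks = []
    for body in re.findall(r"\bdevice\s*\{([^{}]*)\}", conf_text):
        settings = {}
        for line in body.splitlines():
            parts = line.strip().split(None, 1)
            # Keep "key value" lines; skip blanks and comment lines.
            if len(parts) == 2 and not parts[0].startswith("#"):
                settings[parts[0]] = parts[1].strip().strip('"')
        blocks.append(settings)
    return blocks


def rdac_problems(settings):
    """List settings that differ from the rdac values quoted in this thread."""
    expected = {"path_checker": "rdac", "hardware_handler": "1 rdac"}
    return [key for key, value in sorted(expected.items())
            if settings.get(key) != value]


# Trimmed-down illustrative config, not a complete multipath.conf.
SAMPLE = '''
devices {
    device {
        vendor            "DELL"
        product           "MD32xxi"
        path_checker      rdac
        hardware_handler  "1 rdac"
    }
    device {
        vendor            "DELL"
        product           "MD32xx"
        path_checker      rdac
        hardware_handler  "1 rdac"
    }
}
'''
```

Note that the pattern would also pick up "device" blocks inside a blacklist section, so a real check would want to filter on vendor and product first.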
On Jan 21, 2011, at 7:20 PM, Edward Morbius dredmorbius@gmail.com wrote:
On Fri, Jan 21, 2011 at 3:58 PM, Ross Walker rswwalker@gmail.com wrote: On Jan 21, 2011, at 6:41 PM, Edward Morbius dredmorbius@gmail.com wrote:
...
AFAIK the RDAC you have installed looks correct and the config also looks good.
Did you start the multipath service, make a connection to each IP, and do a 'multipath -ll' to see what shows up?
-Ross
on 20:07 Fri 21 Jan, Ross Walker (rswwalker@gmail.com) wrote:
On Jan 21, 2011, at 7:20 PM, Edward Morbius dredmorbius@gmail.com wrote:
On Fri, Jan 21, 2011 at 3:58 PM, Ross Walker rswwalker@gmail.com wrote: On Jan 21, 2011, at 6:41 PM, Edward Morbius dredmorbius@gmail.com wrote:
...
AFAIK the RDAC you have installed looks correct and the config also looks good.
Thanks.
Did you start the multipath service, make a connection to each IP, and do a 'multipath -ll' to see what shows up?
Yes, and yes.
We've actually run some fairly intensive disk tests (bonnie++ and a few tens of thousands of 100MB file copies of random data) with no errors across various hosts.
The on-connect errors are the biggest issue we've got, though the general consensus seems to be that we can ignore these.
What's moderately maddening is the lack of any clear documentation or guidance, from Dell, RH, or the upstream open-iscsi / multipath projects, on what we should be experiencing, and what, if any, errors are considered "normal".
Think we've got a handle on it, but we're checking our sanity as well.