Hi,
I was wondering if anyone has successfully configured two LSI/3ware 9750-4i series controllers for multipathing under CentOS 5.7 x86_64?
I've tried some basic setups with both multibus and failover settings, and had repeatable filesystem corruption over an iSCSI (tgtd) or NFSv3 connection.
Any ideas? Vahan
On Friday, January 13, 2012, Vahan Yerkanian vahan@arminco.com wrote:
Hi,
I was wondering if anyone has successfully configured two LSI/3ware
9750-4i series controllers for multipathing under CentOS 5.7 x86_64?
I've tried some basic setups with both multibus and failover settings,
and had repeatable filesystem corruption over an iSCSI (tgtd) or NFSv3 connection.
Have you tried multipathd?
-Ross
On Jan 13, 2012, at 6:33 PM, Ross Walker wrote:
On Friday, January 13, 2012, Vahan Yerkanian vahan@arminco.com wrote:
Hi,
I was wondering if anyone has successfully configured two LSI/3ware 9750-4i series controllers for multipathing under CentOS 5.7 x86_64?
I've tried some basic setups with both multibus and failover settings, and had repeatable filesystem corruption over an iSCSI (tgtd) or NFSv3 connection.
Have you tried multipathd?
-Ross
Yes, sorry, I should've been more clear. I configured multipathing with multipathd using a bare-bones configuration, as it didn't have an LSI/3ware controller-specific preset in the devices {} block.
What I did was based on [1] and in the end consisted of this multipath.conf (I thinned it down while trying to find the culprit):
blacklist {
        devnode "sda"   # the boot disk
}

defaults {
        user_friendly_names yes
}

multipaths {
        multipath {
                alias                 storage
                wwid                  3600050c000015400f3ae000009040000
                path_grouping_policy  multibus
        }
}
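(For reference, a controller-specific entry would presumably look something like the sketch below. The vendor/product strings and option values are guesses on my part and would need to match whatever the cards actually report, e.g. in multipath -v3 output.)

devices {
        device {
                vendor                "LSI"
                product               "9750.*"
                path_grouping_policy  multibus
                path_checker          tur
                no_path_retry         queue
        }
}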
multipath -ll showed everything OK, with both sdb and sdc (the same 24 x 3TB RAID6 array) as active and ready.
However, no matter what I did, the filesystem got corrupted within 3-4 hours of active usage...
[1] http://docs.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/5/pdf/DM_Multipat...
On 01/13/12 6:41 AM, Vahan Yerkanian wrote:
multipath -ll showed everything OK, with both sdb and sdc (the same 24 x 3TB RAID6 array) as active and ready.
Are those controllers aware you're using them for multipathing? RAID cards like that tend to have large caches, and one controller's cache won't see changes written through the other, leading to inconsistent data, unless the controllers have some form of back-channel communication between them to coordinate their caches.
BTW, that's _way_ too many disks in a single disk group; your rebuild times with a 24-disk RAID6 will be painfully long. I try to limit my RAID groups to 12 drives max and stripe those. Given 24 disks, I'd probably use 2 hot spares and a 2 x 11 RAID60, which would provide the space equivalent of 18 disks (each 11-drive RAID6 set yields 9 drives of usable capacity).
On Jan 13, 2012, at 2:37 PM, John R Pierce pierce@hogranch.com wrote:
On 01/13/12 6:41 AM, Vahan Yerkanian wrote:
multipath -ll showed everything OK, with both sdb and sdc (the same 24 x 3TB RAID6 array) as active and ready.
Are those controllers aware you're using them for multipathing? RAID cards like that tend to have large caches, and one controller's cache won't see changes written through the other, leading to inconsistent data, unless the controllers have some form of back-channel communication between them to coordinate their caches.
John's right, I thought these were straight SAS/SATA controllers.
You will need to publish these disks as straight-through individual disks with write-through cache and use software RAID if the controllers can't communicate with each other.
Some controllers are smart enough to perform multipathing across multiple cards, but they tend to cost more than $500.
The Dell PERC (LSI) RAID controllers I have at work do multipathing on-board between multiple connections to each enclosure, but not between multiple controllers. To do that I would need two plain SAS/SATA controllers and handle RAID in software.
I have done that successfully with Solaris and ZFS in the past, but Linux software RAID wasn't performant enough for large RAID6s (in my experience).
BTW, that's _way_ too many disks in a single disk group; your rebuild times with a 24-disk RAID6 will be painfully long. I try to limit my RAID groups to 12 drives max and stripe those. Given 24 disks, I'd probably use 2 hot spares and a 2 x 11 RAID60, which would provide the space equivalent of 18 disks (each 11-drive RAID6 set yields 9 drives of usable capacity).
I agree with John here too.
Create two RAID6 groups and use software to stripe them, either using mdraid or lvm.
If it were me, I'd put each RAID6 on a separate controller for balanced parity calculations and then stripe the two volumes in LVM. Keep a third controller as a spare in the closet.
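Something along these lines, assuming the two RAID6 units show up as sdb and sdc (the device names and stripe size are placeholders, not something I've tested on this hardware):

# each /dev/sdX below is one hardware RAID6 unit, one per controller
pvcreate /dev/sdb /dev/sdc
vgcreate storage_vg /dev/sdb /dev/sdc
# stripe the LV across both PVs (-i 2) with a 256 KB stripe size
lvcreate -i 2 -I 256 -l 100%FREE -n storage_lv storage_vg
mkfs.ext3 /dev/storage_vg/storage_lv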
-Ross
On 01/13/12 3:46 PM, Ross Walker wrote:
You will need to publish these disks as straight-through individual disks with write-through cache and use software RAID if the controllers can't communicate with each other.
Write-through cache is not even good enough. If a given block is written through one of them, it will land on the disk, but it won't update the other controller's cache, so the other controller could have stale data in its cache, and if another read takes that path, it will get the old data, and things go downhill from there quickly.
On Jan 13, 2012, at 6:51 PM, John R Pierce pierce@hogranch.com wrote:
On 01/13/12 3:46 PM, Ross Walker wrote:
You will need to publish these disks as straight-through individual disks with write-through cache and use software RAID if the controllers can't communicate with each other.
Write-through cache is not even good enough. If a given block is written through one of them, it will land on the disk, but it won't update the other controller's cache, so the other controller could have stale data in its cache, and if another read takes that path, it will get the old data, and things go downhill from there quickly.
And read-through too; basically, disable caching on the controllers.
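With the 3ware CLI that should be something along these lines; I'm writing the syntax from memory, so treat it as an assumption and double-check against the 9750 CLI guide:

# disable the unit cache on unit 0 of each controller (c0 and c1)
# exact tw_cli syntax is from memory, verify before relying on it
tw_cli /c0/u0 set cache=off
tw_cli /c1/u0 set cache=off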
-Ross
Thanks for the comments, folks; your points are damn right. I believe what I experienced was data corruption via stale caches...
So much for not opening up the case (actually, for the first time, I resisted due to lack of time) and not checking whether these controllers are somehow linked together for write-cache exchange etc… AFAIK they're just connected via 8087s to the backplanes with dual-port SAS drives.
Going back to the LSI website to check if these 9750s have an option to link their caches into one...
On 01/14/12 2:32 PM, Vahan Yerkanian wrote:
Thanks for the comments, folks; your points are damn right. I believe what I experienced was data corruption via stale caches...
So much for not opening up the case (actually, for the first time, I resisted due to lack of time) and not checking whether these controllers are somehow linked together for write-cache exchange etc… AFAIK they're just connected via 8087s to the backplanes with dual-port SAS drives.
Going back to the LSI website to check if these 9750s have an option to link their caches into one...
who built/configured this system?
On Jan 15, 2012, at 2:52 AM, John R Pierce wrote:
who built/configured this system?
Someone you don't know. ;) A local distributor. The system was shipped with a Microsoft OS with MPIO iSCSI targets installed, claimed to have been tested. Of course I had to remove that offending OS and install CentOS :)
You know the rest of the story.
On 01/14/12 2:32 PM, Vahan Yerkanian wrote:
Going back to the LSI website to check if these 9750s have an option to link their caches into one...
I was curious: *all* the LSI SAS RAID cards say 'single controller multipathing', both the MegaRAID cards and the 3ware cards (I'm setting up a server that has a 9260-8i now). I looked at their optional software, and none of it implements any sort of multi-controller failover.
If you have split or multiple storage backplanes/enclosures, I'd split the disks between the two cards and multipath between each controller and its respective disks. This will take another set of SAS cables, of course.
On Jan 15, 2012, at 3:52 AM, John R Pierce wrote:
On 01/14/12 2:32 PM, Vahan Yerkanian wrote:
Going back to the LSI website to check if these 9750s have an option to link their caches into one...
I was curious: *all* the LSI SAS RAID cards say 'single controller multipathing', both the MegaRAID cards and the 3ware cards (I'm setting up a server that has a 9260-8i now). I looked at their optional software, and none of it implements any sort of multi-controller failover.
If you have split or multiple storage backplanes/enclosures, I'd split the disks between the two cards and multipath between each controller and its respective disks. This will take another set of SAS cables, of course.
At the moment I have two 9750-4i controllers installed, each with an SFF-8087 x4 cable going to the same backplane containing dual-port SAS disks.
This was supposed to be a load-balanced, multi-controller failover setup.
On 01/14/12 3:55 PM, Vahan Yerkanian wrote:
At the moment I have two 9750-4i controllers installed, each with an SFF-8087 x4 cable going to the same backplane containing dual-port SAS disks.
This was supposed to be a load-balanced, multi-controller failover setup.
AFAIK the only way to achieve that is to use plain SAS HBAs such as the LSI 2008 family (92xx cards), not RAID controllers.
This is a representative SAS midplane manual: http://www.supermicro.com/manuals/other/BPN-SAS-936EL.pdf
It shows failover combinations starting on page 3-3...