[CentOS] Multipath and iSCSI Targets

Fri Mar 19 22:18:40 UTC 2010
Ross Walker <rswwalker at gmail.com>

On Mar 19, 2010, at 11:12 AM, "nate" <centos at linuxpowered.net> wrote:

> Joseph L. Casale wrote:
>> Just started messing with multipath against an iSCSI target with 4 NICs.
>> What should one expect as behavior when paths start failing? My lab setup
>> was copying some data on a mounted block device when I dropped 3 of 4 paths
>> and the responsiveness of the server completely tanked for several minutes.
>>
>> Is that still expected?
>
> Depends on the target and the setup. Ideally, if you have 4 NICs you
> should be using at least two different VLANs, and since you have 4 NICs
> (I assume for iSCSI only) you should use jumbo frames.

Jumbo frames should only be used if your CPU can't keep up with the
load of 4 NICs; otherwise they add some latency to iSCSI.
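
One plausible contributor to that latency is serialization delay: a
larger frame simply takes longer to put on the wire. A rough sketch of
the arithmetic, assuming 1 Gb/s links (the thread does not state the
link speed) and the standard 38 bytes of per-frame Ethernet overhead
(preamble, header, FCS, inter-frame gap). The function name is made up
for illustration:

```python
# Per-frame Ethernet overhead in bytes:
# 8 (preamble + SFD) + 14 (header) + 4 (FCS) + 12 (inter-frame gap)
ETH_OVERHEAD_BYTES = 38


def frame_time_us(mtu_bytes, link_bps=1_000_000_000):
    """Microseconds needed to serialize one full-size frame.

    link_bps defaults to 1 Gb/s, which is an assumption, not a value
    taken from the thread.
    """
    bits_on_wire = (mtu_bytes + ETH_OVERHEAD_BYTES) * 8
    return bits_on_wire / link_bps * 1e6


standard = frame_time_us(1500)  # standard MTU
jumbo = frame_time_us(9000)     # typical jumbo MTU

print(f"1500-byte MTU: {standard:.1f} us per frame")
print(f"9000-byte MTU: {jumbo:.1f} us per frame")
print(f"extra wait behind one full jumbo frame: {jumbo - standard:.1f} us")
```

On these assumptions a small iSCSI PDU queued behind a full jumbo
frame waits roughly 60 us longer than it would behind a standard
frame. Whether that matters depends on the workload.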


> With my current 3PAR storage arrays and my iSCSI targets, each system
> has 4 targets but usually 1 NIC. At my last company (same kind of storage)
> I had 4 targets and 2 dedicated NICs (each on its own VLAN for routing
> purposes, with jumbo frames).
>
> In all cases MPIO was configured for round robin, and failed over
> in a matter of seconds.
>
> Failing BACK can take some time depending on how long the path was
> down. At least on CentOS 4.x (not sure about 5.x) there were some
> hard-coded timeouts in the iSCSI system that could delay path
> restoration for a minute or more, because there was a somewhat
> exponential back-off timer for retries. This caused me a big
> headache at one point during a software upgrade on our storage array,
> which will automatically roll itself back if all of the hosts do
> not re-login to the array within ~60 seconds of the controller coming
> back online.
>
> If your iSCSI storage system is using active/passive controllers,
> that may increase failover and failback times and complicate
> stuff. My arrays are all active-active.
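
To illustrate why an exponential back-off makes failback slower the
longer a path was down, here is a minimal sketch. The initial delay,
growth factor and cap are made-up values for illustration only; they
are not the actual timers hard-coded in the CentOS 4.x iSCSI
initiator.

```python
def restore_lag(outage_s, initial_s=1.0, factor=2.0, cap_s=120.0):
    """Seconds between the path coming back and the next login retry.

    All defaults are hypothetical. The initiator is modelled as
    retrying after initial_s, then waiting factor times longer before
    each following retry, up to cap_s between retries.
    """
    t = 0.0            # time since the path went down
    delay = initial_s  # wait before the next retry
    while True:
        t += delay
        if t >= outage_s:
            # First retry at or after the path is back: it succeeds.
            return t - outage_s
        delay = min(delay * factor, cap_s)


for outage in (10, 64, 130):
    print(f"path down {outage:>3}s -> restored {restore_lag(outage):.0f}s "
          f"after it came back")
```

With these made-up numbers a 10 second outage is noticed 5 seconds
after the path returns, while a 64 second outage is not noticed for
another 63 seconds, which would already miss a ~60 second re-login
window like the one described above.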

I would check the dm-multipath config for how it handles errors. It
might retry multiple times before marking a path bad.

That will slow things to a crawl.
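
As a back-of-the-envelope model of how that adds up, assume every I/O
sent down a dead path waits for a full timeout and is retried a fixed
number of times before the path is marked failed. The timeout and
retry count below are assumptions for illustration, not values read
from any real dm-multipath or iSCSI configuration, and the function
names are made up:

```python
def time_to_fail_path(io_timeout_s, retries):
    """Seconds until one dead path is marked failed.

    One initial attempt plus `retries` retries, each assumed to wait
    the full io_timeout_s before giving up.
    """
    return io_timeout_s * (1 + retries)


def worst_case_stall(dead_paths, io_timeout_s, retries):
    """Pessimistic stall: dead paths are discovered one after another."""
    return dead_paths * time_to_fail_path(io_timeout_s, retries)


# Hypothetical numbers: 3 of 4 paths dropped, 30 s timeout, 2 retries.
stall = worst_case_stall(dead_paths=3, io_timeout_s=30, retries=2)
print(f"worst-case stall: {stall} s ({stall / 60:.1f} minutes)")
```

With those assumed values the stall comes to 270 seconds, which is in
the same range as the "several minutes" reported above. Lowering
either the per-I/O timeout or the retry count shrinks it
proportionally in this model.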

-Ross