[CentOS] CentOS and Dell MD3200i / MD3220i iSCSI w/ multipath -- slightly OT

Mon Jan 31 13:49:11 UTC 2011
Peter Gillich <pgillich at gmail.com>

I forgot to describe a non-general solution for the I/O errors at boot
time: the Multi-Path Proxy driver (linuxrdac), which makes the multiple
paths appear as one device. Here is a description:
http://linux.dell.com/wiki/index.php/Products/HA/DellRedHatHALinuxCluster/Storage/PowerVault_MD3000/Software

BR,

Peter

On Fri, Jan 28, 2011 at 22:32, Peter Gillich <pgillich at gmail.com> wrote:
> Hi Ed,
>
> The persistent reservation is a SCSI-3 feature. It's useful in a
> cluster environment, where multiple nodes are configured to access a
> device while at the same time blocking access to other nodes.
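
For reference, existing registrations and reservations on a LUN can be
inspected with sg_persist from the sg3_utils package. The sketch below is
only an illustration: the wrapper name pr_query, the DRY_RUN variable and
the device path are assumptions of mine, not taken from the Dell pages.

```shell
#!/bin/sh
# pr_query KIND DEVICE   (hypothetical helper)
# KIND is "keys" (registered keys) or "reservation" (current holder).
# With DRY_RUN=1 the command is only printed, not executed.
pr_query() {
    case "$1" in
        keys)        opt=--read-keys ;;
        reservation) opt=--read-reservation ;;
        *) echo "usage: pr_query keys|reservation DEVICE" >&2; return 2 ;;
    esac
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "sg_persist --in $opt $2"
    else
        sg_persist --in "$opt" "$2"
    fi
}
```

For example, "DRY_RUN=1 pr_query keys /dev/sdb" prints the command that
would be run against the (hypothetical) device /dev/sdb.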
>
> To disable the iSCSI offload feature, disable the Broadcom iSCSI driver
> (bnx2i), for example:
> echo "install bnx2i /bin/true" > /etc/modprobe.d/blacklist-broadcom
> In this case, only the bnx2 module will be loaded. You can check this
> with lsmod, modinfo and dmesg. Of course, the CPU load will
> increase.
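
A quick way to confirm the result after a reboot is to look for the two
modules in the lsmod output. This is a minimal sketch: the helper names
module_listed and offload_disabled are hypothetical, and both read
lsmod-style text on stdin so they can also be tried on saved output.

```shell
#!/bin/sh
# module_listed NAME
# Reads `lsmod`-style text on stdin (header line, then one module per
# line) and succeeds if NAME appears in the first column.
module_listed() {
    awk -v m="$1" 'NR > 1 && $1 == m { found = 1 } END { exit !found }'
}

# Expected state after the blacklist entry: bnx2 present, bnx2i absent.
offload_disabled() {
    listing=$(cat)
    printf '%s\n' "$listing" | module_listed bnx2 &&
        ! printf '%s\n' "$listing" | module_listed bnx2i
}
```

Usage: lsmod | offload_disabled && echo "bnx2i is not loaded"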
>
> BR,
>
> Peter
>
> On Tue, Jan 25, 2011 at 05:37, Dr. Ed Morbius <dredmorbius at gmail.com> wrote:
>> on 07:48 Sun 23 Jan, Peter Gillich (pgillich at gmail.com) wrote:
>>> Hi,
>>> Last summer I had the same problems with the Dell + CentOS +
>>> multipath combination, for example I/O errors and stability problems
>>> on the initiator machines. The initiator machines are (in a Pacemaker
>>> cluster):
>>> - Dell R310
>>> - Broadcom 5709 Gigabit Ethernet card (4-port)
>>> - CentOS 5.4
>>> - 2 Ethernet ports on initiator machines, 2 Ethernet ports in target
>>> machines --> 4 iSCSI paths per initiator
>>>
>>> Independently of iSCSI, we ran into the Broadcom MSI-X interrupt problem
>>> (fixed in RHEL/CentOS 5.5). We ran into more (iSCSI) problems with
>>> Broadcom cards, which are described on a Dell support page:
>>> http://support.dell.com/support/edocs/software/rhel_mn/rhel5_4/en/index.htm
>>
>> Not familiar with this, though we're using Broadcom NICs, four per host
>> for the most part:
>>
>>    01:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet (rev 20)
>>            Subsystem: Dell PowerEdge R610 BCM5709 Gigabit Ethernet
>>            Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
>>            Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR-
>>            Latency: 0, Cache Line Size: 64 bytes
>>            Interrupt: pin A routed to IRQ 98
>>            Region 0: Memory at d6000000 (64-bit, non-prefetchable) [size=32M]
>>            Capabilities: [48] Power Management version 3
>>                    Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
>>                    Status: D0 PME-Enable- DSel=0 DScale=1 PME-
>>            Capabilities: [50] Vital Product Data
>>
>> We're bonding two NICs together on each of our core and management nets,
>> iSCSI traffic is on the management net.
>>
>> (VMs are set to use E1000, single interface per subnet).
>>
>>> Since CentOS is a recompiled Red Hat, all RHEL problems and
>>> solutions apply to CentOS too ;-)
>>> The Broadcom driver source code changes frequently. Red Hat follows
>>> the Broadcom kernel drivers and iscsi-initiator-utils with some months
>>> of latency. CentOS follows Red Hat with some days/weeks/months.
>>>
>>> Maybe you can find a solution for your problem on a newer Dell support
>>> page: http://support.dell.com/support/edocs/software/rhel_mn/rhel5_5/en/index.htm
>>> Or here:
>>> http://docs.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/5/html/DM_Multipath
>>> http://opensource.marshall.edu/papers/rhel5-iscsi-HOWTO.pdf
>>
>> That Marshall.edu doc looks pretty good.  I'll note that if you're
>> expecting to mount your network devices at boot, having the netdev
>> service running will help (we ran into this issue, repeatedly, thanks to
>> a puppet config ;-).
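
If the "netdev service" above refers to the _netdev mount option (on
RHEL/CentOS 5 such entries are mounted by the netfs init script once the
network is up; that reading is an assumption on my part), a small check
of /etc/fstab can catch a missing option. The helper name has_netdev and
the mount point used below are hypothetical.

```shell
#!/bin/sh
# has_netdev MOUNTPOINT
# Reads fstab-style text on stdin and succeeds if the entry for
# MOUNTPOINT lists _netdev among its mount options (4th field).
has_netdev() {
    awk -v mp="$1" '
        /^[ \t]*#/ { next }                 # skip comment lines
        $2 == mp {
            n = split($4, opts, ",")
            for (i = 1; i <= n; i++)
                if (opts[i] == "_netdev") found = 1
        }
        END { exit !found }
    '
}
```

Usage: has_netdev /data < /etc/fstab || echo "/data lacks _netdev"
The netfs service also has to be enabled (chkconfig netfs on) for such
entries to be mounted at boot.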
>>
>>> Some tips:
>>> - I've read somewhere that iSCSI multipath I/O errors can be
>>> normal behavior in a multipath environment at boot time. (?)
>>
>> That has been our experience to date.
>>
>>> - Persistent reservation might be useful against iSCSI multipath I/O errors.
>>
>> What's persistent reservation?
>>
>>> - Disabling the iSCSI offload feature (for example: iSCSI over Broadcom)
>>> and the TCP offload feature (for example: NFS over Intel) may help.
>>
>> How does one do this / check for this?
>>
>>> - The iSCSI kernel drivers and iscsi-initiator-utils must be updated together.
>>
>> We'll keep this in mind.
>>
>>>
>>> Finally, some comments:
>>> - Never use a Broadcom GbE card. Intel might be better (mostly)
>>
>> I think we're stuck with 'em.  Dell seems to have been shipping with
>> Broadcom for some years.  Early experiences were horrible, lately it's
>> been getting better, but I'm still leery of the brand.
>>
>>> - Dell is a hardware manufacturer (supplier), not an
>>> OS/driver/utility developer. If you would like to get more support,
>>> you may buy RHEL licenses (with the Dell hardware or from Red Hat).
>>> Sometimes that's cheaper than spending days on a problem (but sometimes
>>> not).
>>
>> Yeah.  We've got a single RH license at this point and it does let us
>> into RH's knowledgebase, though there hasn't been a whole lot there
>> either.
>>
>>> - IBM compiles the latest Broadcom driver if required, see:
>>> http://www-947.ibm.com/support/entry/portal/docdisplay?lndocid=MIGR-5073130
>>> - Some Dell hardware has only x86_64 Red Hat certification. See:
>>> https://hardware.redhat.com/show.cgi?id=632145 (R310 + RHEL 6)
>>>
>>> BR,
>>>
>>> Peter
>>
>> Thanks, Peter, very helpful.
>>
>>> On Sat, Jan 22, 2011 at 18:36, Rajagopal Swaminathan
>>> <raju.rajsand at gmail.com> wrote:
>>> >
>>> > Greetings,
>>> >
>>> > On 1/22/11, Edward Morbius <dredmorbius at gmail.com> wrote:
>>> > > CentOS is not a Dell-supported configuration, and we've had little helpful
>>> > > advice from Dell.  There's been some amount of FUD in that Dell don't seem
>>> > > to know what Dell's own software installation (the md3
>>> > >
>>> > > Dell doesn't seem to have much OS experience generally.
>>> > >
>>> >
>>> > +1
>>> >
>>> > It is to be expected from Dell as they outsource support to non-"equal
>>> > opportunity" employers who do not hire support agents beyond 40 years
>>> > of age (per HR).
>>> >
>>> > Above fact. below imho
>>> >
>>> > Now, experience often helps reach the source of the problem much
>>> > faster than fast-talking street-smart agents who proliferated.
>>> >
>>> > It is sad that the IT industry treats its early community members so callously.
>>> >
>>> > I don't know but Dell seems to be headed the Sun way -- open for
>>> > takeover by HP/IBM
>>> >
>>> > Above imho.
>>> >
>>> > Regards,
>>> >
>>> > Rajagopal
>>> > _______________________________________________
>>> > CentOS mailing list
>>> > CentOS at centos.org
>>> > http://lists.centos.org/mailman/listinfo/centos
>>
>> --
>> Dr. Ed Morbius
>> Chief Scientist
>> Krell Power Systems Unlimited
>>
>