[CentOS] CentOS and Dell MD3200i / MD3220i iSCSI w/ multipath -- slightly OT

Tue Jan 25 04:37:04 UTC 2011
Dr. Ed Morbius <dredmorbius at gmail.com>

on 07:48 Sun 23 Jan, Peter Gillich (pgillich at gmail.com) wrote:
> Hi,
> Last summer I had the same problems with the Dell + CentOS +
> multipath combination, for example I/O errors and stability problems
> on the initiator machines. The initiator machines are (in a Pacemaker
> cluster):
> - Dell R310
> - Broadcom 5709 Gigabit Ethernet card (4-port)
> - CentOS 5.4
> - 2 Ethernet ports on initiator machines, 2 Ethernet ports on target
> machines --> 4 iSCSI paths per initiator
> 
> Independently of iSCSI, we hit the Broadcom MSI-X interrupt problem
> (corrected in RHEL/CentOS 5.5). We hit further (iSCSI) problems with
> Broadcom cards, which are described on a Dell support page:
> http://support.dell.com/support/edocs/software/rhel_mn/rhel5_4/en/index.htm

Not familiar with this, though we're using Broadcom NICs, four per host
for the most part:

    01:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet (rev 20)
            Subsystem: Dell PowerEdge R610 BCM5709 Gigabit Ethernet
            Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
            Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR-
            Latency: 0, Cache Line Size: 64 bytes
            Interrupt: pin A routed to IRQ 98
            Region 0: Memory at d6000000 (64-bit, non-prefetchable) [size=32M]
            Capabilities: [48] Power Management version 3
                    Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
                    Status: D0 PME-Enable- DSel=0 DScale=1 PME-
            Capabilities: [50] Vital Product Data

We're bonding two NICs together on each of our core and management nets;
iSCSI traffic is on the management net.

(VMs are set to use E1000, single interface per subnet).
 
> Since CentOS is a recompiled Red Hat, all RHEL problems and
> solutions apply to CentOS too ;-)
> The Broadcom driver source code changes frequently. Red Hat follows
> the Broadcom kernel drivers and iscsi-initiator-utils with a lag of
> some months. CentOS follows Red Hat with a lag of some days/weeks/months.
> 
> Maybe you can find a solution for your problem on a newer Dell support
> page: http://support.dell.com/support/edocs/software/rhel_mn/rhel5_5/en/index.htm
> Or here:
> http://docs.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/5/html/DM_Multipath
> http://opensource.marshall.edu/papers/rhel5-iscsi-HOWTO.pdf

That Marshall.edu doc looks pretty good.  I'll note that if you're
expecting to mount your network devices at boot, having the netdev
service running will help (we ran into this issue, repeatedly, thanks to
a puppet config ;-).
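
A rough sketch of how one might audit this, assuming the usual
RHEL/CentOS 5 arrangement in which network-backed mounts carry the
`_netdev` option in /etc/fstab so that they are deferred until
networking is up. The sample entries and the function name
`netdev_mounts` are hypothetical, for illustration only:

```python
def netdev_mounts(fstab_text):
    """Return (device, mountpoint, has_netdev) for each fstab entry."""
    entries = []
    for line in fstab_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        fields = line.split()
        if len(fields) < 4:
            continue  # not a complete fstab entry
        device, mountpoint, _fstype, options = fields[:4]
        entries.append((device, mountpoint, "_netdev" in options.split(",")))
    return entries


# Hypothetical fstab contents, not taken from a real host.
SAMPLE_FSTAB = """\
# device              mountpoint  type  options           dump pass
/dev/mapper/mpath0p1  /srv/data   ext3  _netdev,defaults  0 0
/dev/sda1             /boot       ext3  defaults          1 2
"""

for device, mountpoint, has_netdev in netdev_mounts(SAMPLE_FSTAB):
    print(device, mountpoint, "_netdev" if has_netdev else "no _netdev")
```

Any iSCSI-backed entry reported without `_netdev` is a candidate for
the boot-time mount failures described above.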

> Some tips:
> - I've read somewhere about iSCSI multipath I/O errors, which can be
> normal behavior in a multipath environment at boot time. (?)

That has been our experience to date.

> - Persistent reservation might be useful against iSCSI multipath I/O errors.

What's persistent reservation?

> - Disabling the iSCSI offload feature (for example: iSCSI over Broadcom)
> and the TCP offload feature (for example: NFS over Intel) may help.

How does one do this / check for this?
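
For the TCP offload half at least, one possible starting point,
assuming `ethtool` is installed: `ethtool -k <iface>` reports the
offload settings and `ethtool -K <iface> tso off` turns one of them
off. The sketch below only parses text of that general shape; the
sample text is illustrative rather than captured from a real host, the
exact wording varies between ethtool versions, and
`parse_offload_settings` is a hypothetical helper. The iSCSI offload
side is a separate matter and is not covered here.

```python
def parse_offload_settings(ethtool_output):
    """Map each reported feature name to True (on) or False (off)."""
    settings = {}
    for line in ethtool_output.splitlines():
        name, sep, value = line.partition(":")
        tokens = value.split()
        state = tokens[0] if tokens else ""
        # Skip the header line and anything that is not "name: on|off".
        if sep and state in ("on", "off"):
            settings[name.strip()] = (state == "on")
    return settings


# Illustrative text in the style of `ethtool -k eth0`; not real output.
SAMPLE = """\
Offload parameters for eth0:
rx-checksumming: on
tx-checksumming: on
scatter-gather: on
tcp segmentation offload: on
udp fragmentation offload: off
generic segmentation offload: off
"""

settings = parse_offload_settings(SAMPLE)
enabled = sorted(name for name, is_on in settings.items() if is_on)
print("offloads currently on:", enabled)
```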

> - The iSCSI kernel drivers and iscsi-initiator-utils must be updated together.

We'll keep this in mind.

> 
> Finally, some comments:
> - Never use a Broadcom GbE card. Intel might be better (mostly).

I think we're stuck with 'em.  Dell seems to have been shipping with
Broadcom for some years.  Early experiences were horrible; lately it's
been getting better, but I'm still leery of the brand.

> - Dell is a hardware manufacturer (supplier), not an
> OS/driver/utility developer. If you would like to get more support,
> you can buy RHEL licenses (with the Dell hardware or from Red Hat).
> Sometimes that's cheaper than spending days on a problem (but
> sometimes not).

Yeah.  We've got a single RH license at this point and it does let us
into RH's knowledgebase, though there hasn't been a whole lot there
either.

> - IBM compiles the latest Broadcom driver if required, see:
> http://www-947.ibm.com/support/entry/portal/docdisplay?lndocid=MIGR-5073130
> - Some Dell hardware has only x86_64 Red Hat certification. See:
> https://hardware.redhat.com/show.cgi?id=632145 (R310 + RHEL 6)
> 
> BR,
> 
> Peter

Thanks, Peter, very helpful.
 
> On Sat, Jan 22, 2011 at 18:36, Rajagopal Swaminathan
> <raju.rajsand at gmail.com> wrote:
> >
> > Greetings,
> >
> > On 1/22/11, Edward Morbius <dredmorbius at gmail.com> wrote:
> > > CentOS is not a Dell-supported configuration, and we've had little helpful
> > > advice from Dell.  There's been some amount of FUD in that Dell don't seem
> > > to know what Dell's own software installation (the md3
> > >
> > > Dell doesn't seem to have much OS experience generally.
> > >
> >
> > +1
> >
> > It is to be expected from Dell as they outsource support to non-"equal
> > opportunity" employers who do not hire support agents beyond 40 years
> > of age (per HR).
> >
> > Above fact. below imho
> >
> > Now, experience often helps reach the source of the problem much
> > faster than the fast-talking, street-smart agents who have proliferated.
> >
> > It is sad that the IT industry treats its early community members so callously.
> >
> > I don't know but Dell seems to be headed the Sun way -- open for
> > takeover by HP/IBM
> >
> > Above imho.
> >
> > Regards,
> >
> > Rajagopal
> > _______________________________________________
> > CentOS mailing list
> > CentOS at centos.org
> > http://lists.centos.org/mailman/listinfo/centos

-- 
Dr. Ed Morbius
Chief Scientist
Krell Power Systems Unlimited