[CentOS] 3ware 9650 issues

Sun Jun 22 05:50:49 UTC 2008
Jeff <jlar310 at gmail.com>

On Sat, Jun 21, 2008 at 11:04 PM, Joshua Baker-LePain <jlb17 at duke.edu> wrote:
> I've been having no end of issues with a 3ware 9650SE-24M8 in a server
> that's coming on a year old.  I've got 24 WDC WD5001ABYS drives (500GB)
> hooked to it, running as a single RAID6 w/ a hot spare.  These issues boil
> down to the card periodically throwing errors like the following:
>
> sd 1:0:0:0: WARNING: (0x06:0x002C): Command (0x8a) timed out, resetting
> card.
>
> Usually when this happens, it's followed by:
>
> 3w-9xxx: scsi1: AEN: INFO (0x04:0x005E): Cache synchronization
> completed:unit=0.
>
> On the less pleasant occasions, it's followed by:
>
> scsi1: ERROR: (0x06:0x0036): Response queue (large) empty failed during
> reset sequence.
> 3w-9xxx: scsi1: ERROR: (0x06:0x002B): Controller reset failed during scsi
> host reset.
> sd 1:0:0:0: scsi: Device offlined - not ready after error recovery
>
> This of course leads to a several hour downtime as the system has to be
> powered down (not just rebooted) and then the volume needs to be fscked.
> I've been back and forth with both the vendor and (via the vendor) 3ware
> with this.  The card has been replaced, as well as the whole system.  I'm
> running the latest firmware and drivers from 3ware.
>
> Have other folks had good luck with this card?  What sorts of configs are
> you running?  I'm in the position of needing more storage, and I'm a bit gun
> shy on 3ware at the moment...

This may be completely irrelevant, but we have a 9550 card running
RAID 5 under a 'prominent non-Linux' operating system that suffers from
the same symptoms (and 4 other cards that have never done it). We've heard
from our vendor (and 3ware) that there are some upcoming firmware
releases (looks like August) that might help. A 3ware tech told me
that the controller reset happens when communication between the
driver and the firmware times out, which appears to be exactly what is
in your error message.
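
If you want to keep a tally of how often each of these events shows
up, a rough sketch in Python might look like the following. It is only
an illustration: the code pairs are copied from the messages you
quoted, while the labels and the function names (classify, summarize)
are made up here, and it assumes each kernel message sits on a single
line in the log rather than wrapped as in this mail.

```python
import re
from collections import Counter

# Code pairs copied from the messages quoted in this thread.
# The labels on the right are hypothetical shorthand, not 3ware's names.
EVENTS = {
    (0x06, 0x002C): "command_timeout",
    (0x04, 0x005E): "cache_sync_completed",
    (0x06, 0x0036): "response_queue_failed",
    (0x06, 0x002B): "controller_reset_failed",
}

# Matches a "(0xNN:0xNNNN)" pair; a lone "(0x8a)" opcode is not matched.
CODE_RE = re.compile(r"\((0x[0-9A-Fa-f]{2}):(0x[0-9A-Fa-f]{4})\)")


def classify(line):
    """Return a label for a known code pair in the line, else None."""
    match = CODE_RE.search(line)
    if match is None:
        return None
    key = (int(match.group(1), 16), int(match.group(2), 16))
    return EVENTS.get(key)


def summarize(lines):
    """Count how many times each known event appears in the given lines."""
    counts = Counter()
    for line in lines:
        label = classify(line)
        if label is not None:
            counts[label] += 1
    return counts
```

Feeding it the lines of a saved kernel log (for example,
summarize(open(path)) with whatever path your syslog uses) gives a
count per event, which makes it easier to see how often a plain
timeout turns into a failed reset.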

Meanwhile, we just cross our fingers and thank our lucky stars that the
server in question is in our local office and not one of our
non-tech-staffed remote offices. There are unsupported pre-release
firmware downloads available if you like to gamble. I have not had the
courage to install the beta firmware on our servers. I have not used
3ware with CentOS, but I don't think this is a CentOS issue.

-- 
Jeff