On Sat, Jun 21, 2008 at 11:04 PM, Joshua Baker-LePain jlb17@duke.edu wrote:
I've been having no end of issues with a 3ware 9650SE-24M8 in a server that's coming on a year old. I've got 24 WDC WD5001ABYS drives (500GB) hooked to it, running as a single RAID6 w/ a hot spare. These issues boil down to the card periodically throwing errors like the following:
sd 1:0:0:0: WARNING: (0x06:0x002C): Command (0x8a) timed out, resetting card.
Usually when this happens, it's followed by:
3w-9xxx: scsi1: AEN: INFO (0x04:0x005E): Cache synchronization completed:unit=0.
On the less pleasant occasions, it's followed by:
scsi1: ERROR: (0x06:0x0036): Response queue (large) empty failed during reset sequence.
3w-9xxx: scsi1: ERROR: (0x06:0x002B): Controller reset failed during scsi host reset.
sd 1:0:0:0: scsi: Device offlined - not ready after error recovery
This of course leads to a several hour downtime as the system has to be powered down (not just rebooted) and then the volume needs to be fscked. I've been back and forth with both the vendor and (via the vendor) 3ware with this. The card has been replaced, as well as the whole system. I'm running the latest firmware and drivers from 3ware.
Have other folks had good luck with this card? What sorts of configs are you running? I'm in the position of needing more storage, and I'm a bit gun shy on 3ware at the moment...
This may be completely irrelevant, but we have a 9550 card running RAID 5 under a 'prominent non-Linux' operating system that shows the same symptoms (and four other cards that have never done it). We've heard from our vendor (and 3ware) that there are some upcoming firmware releases (looks like August) that might help. A 3ware tech told me that the controller reset happens when communication between the driver and the firmware times out, which appears to be exactly what your error message shows.
Meanwhile, we just cross our fingers and thank our lucky stars that the server in question is in our local office and not in one of our remote offices with no technical staff. There are unsupported pre-release firmware downloads available if you like to gamble. I have not had the courage to install the beta firmware on our servers. I have not used 3ware with CentOS, but I don't think this is a CentOS issue.
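
If it helps to put numbers on how often a timeout recovers versus takes the unit offline, here is a rough Python sketch that walks through kernel log lines and tallies the outcome of each reset. The error codes are copied from the messages quoted above; the "recovered"/"failed" labels and the function name classify_resets are my own invention, not anything from 3ware, so treat it as a starting point only.

```python
import re

# Matches the "(0xNN:0xNNNN)" code pair in 3w-9xxx messages and captures
# the second half. "(0x8a)" has no colon, so it is not matched.
CODE_RE = re.compile(r"\(0x[0-9A-Fa-f]{2}:0x([0-9A-Fa-f]{4})\)")

# Codes taken from the log lines quoted in this thread. The grouping is an
# assumption based on the message text, not on vendor documentation.
TIMEOUT = "002C"           # Command timed out, resetting card
RECOVERED = "005E"         # Cache synchronization completed
FAILED = {"0036", "002B"}  # Reset sequence / controller reset failed


def classify_resets(lines):
    """Return one outcome per timeout: 'recovered', 'failed' or 'unknown'."""
    outcomes = []
    pending = False  # True while a timeout has no outcome yet
    for line in lines:
        match = CODE_RE.search(line)
        code = match.group(1).upper() if match else None
        if code == TIMEOUT:
            if pending:
                outcomes.append("unknown")
            pending = True
        elif pending and code == RECOVERED:
            outcomes.append("recovered")
            pending = False
        elif pending and (code in FAILED or "Device offlined" in line):
            outcomes.append("failed")
            pending = False
    if pending:
        outcomes.append("unknown")
    return outcomes
```

You would feed it the relevant lines from wherever your kernel messages end up (the log location varies by system, so I have left that part out) and count the entries in the returned list.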