I've been having no end of issues with a 3ware 9650SE-24M8 in a server that's coming on a year old. I've got 24 WDC WD5001ABYS drives (500GB) hooked to it, running as a single RAID6 w/ a hot spare. These issues boil down to the card periodically throwing errors like the following:
sd 1:0:0:0: WARNING: (0x06:0x002C): Command (0x8a) timed out, resetting card.
Usually when this happens, it's followed by:
3w-9xxx: scsi1: AEN: INFO (0x04:0x005E): Cache synchronization completed:unit=0.
On the less pleasant occasions, it's followed by:
scsi1: ERROR: (0x06:0x0036): Response queue (large) empty failed during reset sequence.
3w-9xxx: scsi1: ERROR: (0x06:0x002B): Controller reset failed during scsi host reset.
sd 1:0:0:0: scsi: Device offlined - not ready after error recovery
This of course leads to a several hour downtime as the system has to be powered down (not just rebooted) and then the volume needs to be fscked. I've been back and forth with both the vendor and (via the vendor) 3ware with this. The card has been replaced, as well as the whole system. I'm running the latest firmware and drivers from 3ware.
Have other folks had good luck with this card? What sorts of configs are you running? I'm in the position of needing more storage, and I'm a bit gun shy on 3ware at the moment...
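For anyone trying to see how often these resets happen, the messages above all carry a severity word followed by a `(0xNN:0xNNNN)` code pair, so they can be tallied from a saved kernel log. This is a minimal sketch, not a 3ware tool: the regular expression is an assumption based only on the three message shapes quoted in this post, and `tally_events` is a made-up helper name.

```python
import re
from collections import Counter

# Assumed pattern, derived only from the messages quoted in this thread:
# a severity word, an optional colon, then "(0xNN:0xNNNN):" and the text.
EVENT_RE = re.compile(
    r"(WARNING|ERROR|INFO):?\s*\((0x[0-9A-Fa-f]+):(0x[0-9A-Fa-f]+)\):\s*(.*)"
)

def tally_events(lines):
    """Count messages per (severity, second code) pair."""
    counts = Counter()
    for line in lines:
        match = EVENT_RE.search(line)
        if match:
            severity, _first, code, _text = match.groups()
            counts[(severity, code.lower())] += 1
    return counts

# The sample lines are the ones quoted in this post.
sample = [
    "sd 1:0:0:0: WARNING: (0x06:0x002C): Command (0x8a) timed out, resetting card.",
    "3w-9xxx: scsi1: AEN: INFO (0x04:0x005E): Cache synchronization completed:unit=0.",
    "scsi1: ERROR: (0x06:0x0036): Response queue (large) empty failed during reset sequence.",
    "3w-9xxx: scsi1: ERROR: (0x06:0x002B): Controller reset failed during scsi host reset.",
]

for (severity, code), n in sorted(tally_events(sample).items()):
    print(severity, code, n)
```

Feeding it the lines of a real log file instead of `sample` would show whether the timeouts cluster around particular times, such as a scheduled media scan.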
Joshua Baker-LePain wrote:
I've been having no end of issues with a 3ware 9650SE-24M8 in a server that's coming on a year old. I've got 24 WDC WD5001ABYS drives (500GB) hooked to it, running as a single RAID6 w/ a hot spare. These issues boil down to the card periodically throwing errors like the following: .... Have other folks had good luck with this card? What sorts of configs are you running? I'm in the position of needing more storage, and I'm a bit gun shy on 3ware at the moment...
I have no experience with that raid card; most of our larger systems use external SAN storage. But I will say that, IMHO, that is a very large raid-6. We usually don't make single raid sets much larger than 7-8 drives, and for a very large storage system we will stripe multiple raid5/6 sets rather than have one huge one.
On Sat, 21 Jun 2008 at 9:12pm, John R Pierce wrote
Joshua Baker-LePain wrote:
I've been having no end of issues with a 3ware 9650SE-24M8 in a server that's coming on a year old. I've got 24 WDC WD5001ABYS drives (500GB) hooked to it, running as a single RAID6 w/ a hot spare. These issues boil down to the card periodically throwing errors like the following: .... Have other folks had good luck with this card? What sorts of configs are you running? I'm in the position of needing more storage, and I'm a bit gun shy on 3ware at the moment...
I have no experience with that raid card; most of our larger systems use external SAN storage. But I will say that, IMHO, that is a very large raid-6. We usually don't make single raid sets much larger than 7-8 drives, and for a very large storage system we will stripe multiple raid5/6 sets rather than have one huge one.
Would that I had such luxuries. This is a university lab with needs for massive amounts of data and not much money with which to do it.
Joshua Baker-LePain wrote:
On Sat, 21 Jun 2008 at 9:12pm, John R Pierce wrote
I have no experience with that raid card; most of our larger systems use external SAN storage. But I will say that, IMHO, that is a very large raid-6. We usually don't make single raid sets much larger than 7-8 drives, and for a very large storage system we will stripe multiple raid5/6 sets rather than have one huge one.
Would that I had such luxuries. This is a university lab with needs for massive amounts of data and not much money with which to do it.
Wouldn't striping a bunch of raid6 volumes give you about the same amount of space?
Russ
On Sun, 22 Jun 2008 at 1:01am, Ruslan Sivak wrote
Joshua Baker-LePain wrote:
On Sat, 21 Jun 2008 at 9:12pm, John R Pierce wrote
I have no experience with that raid card; most of our larger systems use external SAN storage. But I will say that, IMHO, that is a very large raid-6. We usually don't make single raid sets much larger than 7-8 drives, and for a very large storage system we will stripe multiple raid5/6 sets rather than have one huge one.
Would that I had such luxuries. This is a university lab with needs for massive amounts of data and not much money with which to do it.
Wouldn't striping a bunch of raid6 volumes give you about the same amount of space?
No. We have 24 drives. Use one for a hot spare -> leaves 23.
1 array: 23 drives - 2 for parity -> capacity = 21 * drive capacity
2 arrays:
  array1 = 12 drives - 2 for parity -> 10 drives
  array2 = 11 drives - 2 for parity -> 9 drives
  -> capacity = 19 * drive capacity
3 arrays:
  array1 = 8 drives - 2 for parity -> 6 drives
  array2 = 8 drives - 2 for parity -> 6 drives
  array3 = 7 drives - 2 for parity -> 5 drives
  -> capacity = 17 * drive capacity
With 1TB drives, you're losing 2TB worth of volume space for each additional array. That's a lot of space.
Unless I misunderstood you...
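The arithmetic above can be sketched in a few lines of Python. This assumes the same setup as in the message (24 drives, one hot spare, two parity drives per RAID6 array, drives split as evenly as possible); the function name is made up for illustration.

```python
def raid6_usable_drives(total_drives, hot_spares, num_arrays, parity_per_array=2):
    """Return how many drives' worth of capacity remain for data.

    The non-spare drives are split as evenly as possible across the
    arrays, and each array gives up parity_per_array drives to parity.
    """
    pool = total_drives - hot_spares
    base, extra = divmod(pool, num_arrays)
    # The first `extra` arrays get one more drive than the rest.
    sizes = [base + 1 if i < extra else base for i in range(num_arrays)]
    if min(sizes) <= parity_per_array:
        raise ValueError("each array needs more drives than it spends on parity")
    return sum(size - parity_per_array for size in sizes)

for n in (1, 2, 3):
    print(n, "array(s):", raid6_usable_drives(24, 1, n), "* drive capacity")
```

Each extra array costs two more drives of capacity, which matches the 21, 19, 17 progression worked out by hand.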
Joshua Baker-LePain wrote:
periodically throwing errors like the following:
sd 1:0:0:0: WARNING: (0x06:0x002C): Command (0x8a) timed out, resetting card.
Wondering if you have scheduled automatic media scans of all of the disks in the array? Perhaps you have a disk that is going bad causing the issue.
Something else that could be related: I was told by someone who had an Isilon storage system (fancy NAS box) that his WD disk drives would hang on him on occasion. When this occurred, he had to physically remove the disk from the system and plug it back in. It was a firmware issue. I don't recall which WD drives he had, but he eventually got a fixed firmware. This was about a year ago.
I have media scans run once a week for about 7 hours on my 2 disk 3Ware systems (8006-2 controllers). For a 24 disk system you'll probably need to run it longer. (unless the newer controllers scan in parallel, the 8000 series seems to be serial).
I ran a couple of 9650 series cards not too long ago; I think they were just two disk systems running RAID 1 (the cards took up to 8 disks, but I only used 2). I've been using 3ware cards for about 8 years now and have not run into the types of errors you describe. Probably ran about 350 cards over the years, most of them in the 8000 series.
nate
On Sat, Jun 21, 2008 at 11:04 PM, Joshua Baker-LePain jlb17@duke.edu wrote:
I've been having no end of issues with a 3ware 9650SE-24M8 in a server that's coming on a year old. I've got 24 WDC WD5001ABYS drives (500GB) hooked to it, running as a single RAID6 w/ a hot spare. These issues boil down to the card periodically throwing errors like the following:
sd 1:0:0:0: WARNING: (0x06:0x002C): Command (0x8a) timed out, resetting card.
Usually when this happens, it's followed by:
3w-9xxx: scsi1: AEN: INFO (0x04:0x005E): Cache synchronization completed:unit=0.
On the less pleasant occasions, it's followed by:
scsi1: ERROR: (0x06:0x0036): Response queue (large) empty failed during reset sequence.
3w-9xxx: scsi1: ERROR: (0x06:0x002B): Controller reset failed during scsi host reset.
sd 1:0:0:0: scsi: Device offlined - not ready after error recovery
This of course leads to a several hour downtime as the system has to be powered down (not just rebooted) and then the volume needs to be fscked. I've been back and forth with both the vendor and (via the vendor) 3ware with this. The card has been replaced, as well as the whole system. I'm running the latest firmware and drivers from 3ware.
Have other folks had good luck with this card? What sorts of configs are you running? I'm in the position of needing more storage, and I'm a bit gun shy on 3ware at the moment...
This may be completely irrelevant, but we have a 9550 card running RAID 5 with a 'prominent non-Linux' operating system that suffers from the same symptoms (and 4 others that have never done it). We've heard from our vendor (and 3ware) that there are some upcoming firmware releases (looks like August) that might help. A 3ware tech told me that the controller reset happens when communication between the driver and the firmware times out, which appears to be exactly what is in your error message.
Meanwhile, we just cross our fingers and thank our lucky stars that the server in question is in our local office and not one of our non-tech-staffed remote offices. There are unsupported pre-release firmware downloads available if you like to gamble. I have not had the courage to install the beta firmware on our servers. I have not used 3ware with CentOS, but I don't think this is a CentOS issue.
Have other folks had good luck with this card? What sorts of configs are you running? I'm in the position of needing more storage, and I'm a bit gun shy on 3ware at the moment...
Does that drive have a jumper to slow it down to the 1.5Gb/s transfer rate? Cheap controllers and drives just can't do it. I have had no end of issues even with *all* my LSI controllers until I jumpered all my SATA drives down.
As far as performance, it made no impact on my systems.
jlc
on 6-21-2008 9:04 PM Joshua Baker-LePain spake the following:
I've been having no end of issues with a 3ware 9650SE-24M8 in a server that's coming on a year old. I've got 24 WDC WD5001ABYS drives (500GB) hooked to it, running as a single RAID6 w/ a hot spare. These issues boil down to the card periodically throwing errors like the following:
sd 1:0:0:0: WARNING: (0x06:0x002C): Command (0x8a) timed out, resetting card.
Usually when this happens, it's followed by:
3w-9xxx: scsi1: AEN: INFO (0x04:0x005E): Cache synchronization completed:unit=0.
On the less pleasant occasions, it's followed by:
scsi1: ERROR: (0x06:0x0036): Response queue (large) empty failed during reset sequence.
3w-9xxx: scsi1: ERROR: (0x06:0x002B): Controller reset failed during scsi host reset.
sd 1:0:0:0: scsi: Device offlined - not ready after error recovery
This of course leads to a several hour downtime as the system has to be powered down (not just rebooted) and then the volume needs to be fscked. I've been back and forth with both the vendor and (via the vendor) 3ware with this. The card has been replaced, as well as the whole system. I'm running the latest firmware and drivers from 3ware.
Have other folks had good luck with this card? What sorts of configs are you running? I'm in the position of needing more storage, and I'm a bit gun shy on 3ware at the moment...
That looks like either drive, cabling, or power problems.
On Sun, 22 Jun 2008 at 10:23am, Scott Silva wrote
on 6-21-2008 9:04 PM Joshua Baker-LePain spake the following:
This of course leads to a several hour downtime as the system has to be powered down (not just rebooted) and then the volume needs to be fscked. I've been back and forth with both the vendor and (via the vendor) 3ware with this. The card has been replaced, as well as the whole system. I'm running the latest firmware and drivers from 3ware.
That looks like either drive, cabling, or power problems.
I'd agree, except for a) all the hardware has been swapped out and b) 1500W should be plenty.
It's starting to sound like this may be a somewhat known issue with a *long* overdue fix coming from 3ware. *sigh*
Thanks all.
1500W should be plenty, but the card may not be getting enough power.
On a much smaller system (3 drives, 1 3ware card), I had power problems. I used a 400W power supply and the +-5V rail was only delivering 3.9V. I kept losing drives. This was an 'expensive' Antec power supply.
I switched to a budget 300W power supply just to see what would happen. The unit delivered a much cleaner ~4.8V. It's worked great ever since.
On Mon, Jun 23, 2008 at 7:45 AM, Joshua Baker-LePain jlb17@duke.edu wrote:
On Sun, 22 Jun 2008 at 10:23am, Scott Silva wrote
on 6-21-2008 9:04 PM Joshua Baker-LePain spake the following:
This of course leads to a several hour downtime as the system has to be powered down (not just rebooted) and then the volume needs to be fscked. I've been back and forth with both the vendor and (via the vendor) 3ware with this. The card has been replaced, as well as the whole system. I'm running the latest firmware and drivers from 3ware.
That looks like either drive, cabling, or power problems.
I'd agree, except for a) all the hardware has been swapped out and b) 1500W should be plenty.
It's starting to sound like this may be a somewhat known issue with a *long* overdue fix coming from 3ware. *sigh*
Thanks all.
-- Joshua Baker-LePain QB3 Shared Cluster Sysadmin UCSF
On Sunday 22 June 2008 12:04:47 am Joshua Baker-LePain wrote:
I've been having no end of issues with a 3ware 9650SE-24M8 in a server that's coming on a year old. I've got 24 WDC WD5001ABYS drives (500GB) hooked to it, running as a single RAID6 w/ a hot spare.
What size power supply do you have in your server?
Peter.
On Sun, 22 Jun 2008 at 1:37pm, Peter Arremann wrote
On Sunday 22 June 2008 12:04:47 am Joshua Baker-LePain wrote:
I've been having no end of issues with a 3ware 9650SE-24M8 in a server that's coming on a year old. I've got 24 WDC WD5001ABYS drives (500GB) hooked to it, running as a single RAID6 w/ a hot spare.
What size power supply do you have in your server?
1500W.
Joshua Baker-LePain wrote:
I've been having no end of issues with a 3ware 9650SE-24M8 in a server that's coming on a year old. I've got 24 WDC WD5001ABYS drives (500GB) hooked to it, running as a single RAID6 w/ a hot spare. These issues boil down to the card periodically throwing errors like the following:
sd 1:0:0:0: WARNING: (0x06:0x002C): Command (0x8a) timed out, resetting card.
9650SE with 8 ports on a couple servers running CentOS 5 64 bit. Pretty heavily used database servers, lots of bursts of disk activity. No problems so far. I'm using the binary driver provided by 3ware.
I'm testing now the 16 port version for newer servers, no problems there either (but not much usage yet, of course).