[CentOS] 3ware disk failure -> hang
Bryan J. Smith
thebs413 at earthlink.net
Fri Jan 6 17:39:38 UTC 2006
Joshua Baker-LePain <jlb17 at duke.edu> wrote:
> I'm running an all software RAID50 ...
> This morning I came in to find the system hung.
> Turns out a disk went overnight on one of the 7500s,
> and rather than a graceful failover I got this:
> Jan 6 01:03:58 $SERVER kernel: 3w-xxxx: scsi2: Command
> failed: status = 0xc7,flags = 0x40, unit #3.
> Jan 6 01:04:02 $SERVER kernel: 3w-xxxx: scsi2: AEN: ERROR:
> Drive error: Port #3.
> Jan 6 01:04:10 $SERVER 3w-xxxx[2781]: ERROR: Drive error
> encountered on port 3 on controller ID:2. Check cables and
> drives for media errors. (0xa)
Yes, the drive failed.
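For anyone who wants to pull the failing port out of logs like the ones quoted above, here is a minimal sketch. It assumes the 3w-xxxx daemon message keeps the wording shown in the quoted log; the function name and the pattern are mine, not anything 3Ware ships.

```python
import re

# Pattern for the 3w-xxxx daemon message as it appears in the quoted log.
# Assumption: the message may be re-wrapped, so whitespace is matched loosely.
DRIVE_ERROR = re.compile(
    r"Drive error\s+encountered on port\s+(\d+)\s+on controller ID:\s*(\d+)"
)

def find_drive_errors(log_text):
    """Return a list of (controller_id, port) tuples found in log_text."""
    return [(int(ctrl), int(port)) for port, ctrl in DRIVE_ERROR.findall(log_text)]

# The daemon line from the quoted log, joined back into one string.
sample = (
    "Jan 6 01:04:10 $SERVER 3w-xxxx[2781]: ERROR: Drive error "
    "encountered on port 3 on controller ID:2. Check cables and "
    "drives for media errors. (0xa)"
)
print(find_drive_errors(sample))
```

This only reports what the controller already logged. It does nothing about the hang itself.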
Had you used the 3Ware card's intelligent hardware RAID, it would
have hidden the drive disconnect from the system. You'd see
a log entry for the failure, and the array would show up in a
"degraded" state.
Instead, you're using software RAID, and it's up to the
kernel to not panic on itself because a disk is no longer
available. The problem isn't the 3Ware controller, it's the
software RAID logic in the kernel.
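If you stay with software RAID and the kernel does survive a failure, the degraded state is at least visible in /proc/mdstat. A rough sketch of a check follows, assuming the usual "[expected/active]" member counts on each array's status line. The sample text is illustrative, not taken from the system in this thread, and the function name is made up.

```python
import re

# Start of an array entry, e.g. "md0 : active raid1 sdb1[1] sda1[0]".
ARRAY = re.compile(r"^(md\d+)\s*:")
# Member counts on the status line, e.g. "[2/1]" = 2 expected, 1 active.
MEMBERS = re.compile(r"\[(\d+)/(\d+)\]")

def degraded_arrays(mdstat_text):
    """Return names of md arrays with fewer active members than expected."""
    degraded = []
    current = None
    for line in mdstat_text.splitlines():
        header = ARRAY.match(line)
        if header:
            current = header.group(1)
            continue
        counts = MEMBERS.search(line)
        if current and counts:
            expected, active = map(int, counts.groups())
            if active < expected:
                degraded.append(current)
            current = None  # ignore any further lines for this array
    return degraded

# Illustrative sample only (hypothetical device names and sizes).
SAMPLE = """\
Personalities : [raid1]
md0 : active raid1 sdb1[1] sda1[0]
      104320 blocks [2/2] [UU]
md1 : active raid1 sda2[0]
      20860288 blocks [2/1] [U_]
unused devices: <none>
"""
print(degraded_arrays(SAMPLE))
```

On a live system you would pass in the contents of /proc/mdstat instead of SAMPLE. Note that this only helps when the box is still up, which is exactly what did not happen here.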
> Any ideas as to what I can do to prevent this in the
> future?
Use the 3Ware card as it is intended: as a hardware RAID card.
> Having the system hang every time a disk dies is, well, less
> than optimal.
No joke. It wasn't until kernel 2.6 that hotplug support
was even offered, and it still does _not_ work as
advertised.
It's stuff like this that makes me want to strangle most
advocates of using 3Ware cards with software RAID. There are
countless issues like this -- far more than the alleged
"hardware lock-in" negative of using hardware RAID.
--
Bryan J. Smith     Professional, Technical Annoyance
b.j.smith at ieee.org     http://thebs413.blogspot.com
----------------------------------------------------
*** Speed doesn't kill, difference in speed does ***