[CentOS] 3ware disk failure -> hang
Bryan J. Smith
thebs413 at earthlink.net
Fri Jan 6 21:13:15 UTC 2006
David Thompson <thomas at cs.wisc.edu> wrote:
> <smile>As much as I hate to agree with Bryan </smile>,
> this has been our experience also.
Oh boy, now you're a "marked man" with regards to others.
;->
> We have many TB of disk running with 3ware controllers. We
> used to use software RAID, because at that time we found
> 3ware's tools to notify us of disk/array problems unusable.
Hmmm, which ones/kernel combinations?
You must also be sure to match your firmware + driver + 3DM
version. That's pretty easy with the 7000/8000 series,
because the kernels have had the latest 3w-xxxx driver for a
good 18+ months now (maybe close to 2 years).
That's the only issue I've ever seen -- people using
different kernel driver versions to their firmware and/or
user-space software. That administration headache
plagues every hardware RAID card, but it is easily trackable.
Now there was a change with the 2.6 kernel IOCTL that no
longer works with some of the older 3DM releases. But the
newer 3DM2 for the 9000 series works with the older 7000/8000
series, no issues. That's what I use today.
I've yet to have 1 byte go due to a 3Ware card in almost 6
years of deployments, although I did have 2 ATA disks fail
within the span of 18 hours (in the middle of a rebuild).
Fortunately I was able to "knock" one of the ATA drives to
get it to spin up and then finish the rebuild of one drive.
> During that time, we could always tell when a disk failed,
> because we would have a crashed server.
Because you have to use something else to "trap" the failed
disk.
The MD suite and kernel drivers do _not_. There is the
continuing farce that the 3Ware in JBOD mode does, as well as
allows hot-swap. This is very _false_. It has led to
repeated complaints about 3Ware cards, from the _software_ RAID
standpoint. That's because from a software RAID standpoint,
the 3Ware card offers _nothing_ over a "regular" ATA or SCSI
card. ;->
> The data would always be there after we rebooted, but a
> reboot was necessary. A few years ago we migrated everything
> to 3ware hardware RAID, and now we rely on our alert system,
> instead of our users, to tell us when a drive fails.
For any hardware RAID, you must have compatible releases:
driver + firmware + user-space
That typically means ensuring you have the same versions for
each. That's the only issue I've ever run into.
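For what it's worth, here is a rough sketch of how one might track that
in a script. It assumes you keep your own table of release sets you have
validated and that you collect the installed versions by hand or with
your own tooling. The set names, version strings and function names are
all made up for illustration; none of them come from 3Ware.

```python
# Hypothetical release sets: every name and version string here is a
# placeholder for illustration, not a real 3Ware release.
KNOWN_GOOD_SETS = {
    "set-A": {"driver": "1.0.0", "firmware": "1.0.0", "userspace": "1.0.0"},
    "set-B": {"driver": "2.0.0", "firmware": "2.0.0", "userspace": "2.0.0"},
}

COMPONENTS = ("driver", "firmware", "userspace")


def find_matching_set(installed, known_sets=KNOWN_GOOD_SETS):
    """Return the name of the set that matches all three components, or None."""
    for name, expected in known_sets.items():
        if all(installed.get(c) == expected[c] for c in COMPONENTS):
            return name
    return None


def mismatches(installed, expected):
    """List (component, installed, expected) for each component that differs."""
    return [
        (c, installed.get(c), expected[c])
        for c in COMPONENTS
        if installed.get(c) != expected[c]
    ]


if __name__ == "__main__":
    # Installed versions as gathered by the admin (hypothetical values).
    installed = {"driver": "2.0.0", "firmware": "1.0.0", "userspace": "1.0.0"}
    if find_matching_set(installed) is None:
        for comp, have, want in mismatches(installed, KNOWN_GOOD_SETS["set-A"]):
            print(f"{comp}: installed {have}, expected {want}")
```

Run from cron, something like this could flag a box where a kernel
update pulled in a newer driver than the firmware and user-space tools
were validated against.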
Some software RAID proponents will say that's a headache. In
6 years of 3Ware RAID deployments, I can say the peace of
mind with hardware RAID and its _total_ abstraction is well
worth this little headache.
--
Bryan J. Smith Professional, Technical Annoyance b.j.smith at ieee.org http://thebs413.blogspot.com
----------------------------------------------------
*** Speed doesn't kill, difference in speed does ***