[CentOS] 3ware disk failure -> hang

Fri Jan 6 21:13:15 UTC 2006
Bryan J. Smith <thebs413 at earthlink.net>

David Thompson <thomas at cs.wisc.edu> wrote:
> <smile>As much as I hate to agree with Bryan </smile>,
> this has been our experience also.

Oh boy, now you're a "marked man" with regards to others. 
;->

> We have many TB of disk running with 3ware controllers.  We
> used to use software RAID, because at that time we found
> 3ware's tools to notify us of disk/array problems unusable.

Hmmm, which ones/kernel combinations?

You must also be sure to match your firmware + driver + 3DM
version.  That's pretty easy with the 7000/8000 series,
because the kernels have had the latest 3w-xxxx driver for a
good 18+ months now (maybe close to 2 years).

That's the only issue I've ever seen -- people using
kernel driver versions that don't match their firmware and/or
user-space software.  That administration headache plagues
every hardware RAID card, but it is easily trackable.

Now there was a change with the 2.6 kernel IOCTL that no
longer works with some of the older 3DM releases.  But the
newer 3DM2 for the 9000 series works with the older 7000/8000
series, no issues.  That's what I use today.

I've yet to have 1 byte go due to a 3Ware card in almost 6
years of deployments, although I did have 2 ATA disks fail
within the span of 18 hours (in the middle of a rebuild).
Fortunately I was able to "knock" one of the ATA drives to
get it to spin up and then finish the rebuild of one drive.

> During that time, we could always tell when a disk failed,
> because we would have a crashed server.

Because you have to use something else to "trap" the failed
disk.

The MD suite and kernel drivers do _not_.  There is the
continuing farce that the 3Ware in JBOD mode does, and that
it allows hot-swap as well.  This is very _false_.  It has led
to repeated complaints about 3Ware cards, from the _software_ RAID
standpoint.  That's because from a software RAID standpoint,
the 3Ware card offers _nothing_ over a "regular" ATA or SCSI
card.  ;->

> The data would always be there after we rebooted, but a
> reboot was necessary.  A few years ago we migrated everything
> to 3ware hardware RAID, and now we rely on our alert system,
> instead of our users, to tell us when a drive fails.

For any hardware RAID, you must have compatible releases:  
  driver + firmware + user-space  

That typically means ensuring you have the same versions for
each.  That's the only issue I've ever run into.
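
As a rough sketch of that check (Python, and only an illustration):
the version strings and release-set names below are made up, the real
ones come from the vendor's release notes, and how you collect the
installed versions from a given box is left out entirely.

```python
from typing import List, NamedTuple, Optional


class Stack(NamedTuple):
    """The three pieces that have to come from the same release."""
    driver: str
    firmware: str
    userspace: str


# Hypothetical release sets, for illustration only. Fill this table in
# from the vendor's release notes; none of these strings are real.
KNOWN_GOOD = {
    "example-set-A": Stack("drv-1.0", "fw-1.0", "tool-1.0"),
    "example-set-B": Stack("drv-2.0", "fw-2.0", "tool-2.0"),
}


def matching_release(installed: Stack) -> Optional[str]:
    """Return the name of the release set the installed stack matches, if any."""
    for name, stack in KNOWN_GOOD.items():
        if stack == installed:
            return name
    return None


def mismatches(installed: Stack, target: Stack) -> List[str]:
    """List the components that differ from the target release set."""
    return [
        field
        for field in Stack._fields
        if getattr(installed, field) != getattr(target, field)
    ]


if __name__ == "__main__":
    # A box where the firmware was never updated along with the rest.
    box = Stack("drv-2.0", "fw-1.0", "tool-2.0")
    if matching_release(box) is None:
        print("out of step:", mismatches(box, KNOWN_GOOD["example-set-B"]))
```

Keeping the table per release set, instead of comparing the three
strings to each other, allows for the pieces not sharing one version
number.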

Some software RAID proponents will say that's a headache.  In
6 years of 3Ware RAID deployments, I can say the peace of
mind with hardware RAID, and its _total_ abstraction, is well
worth this little headache.


-- 
Bryan J. Smith     Professional, Technical Annoyance
b.j.smith at ieee.org      http://thebs413.blogspot.com
----------------------------------------------------
*** Speed doesn't kill, difference in speed does ***