[CentOS] 3ware disk failure -> hang -- how does software RAID "hide" a disk?

Fri Jan 6 20:29:51 UTC 2006
Bryan J. Smith <thebs413 at earthlink.net>

Joshua Baker-LePain <jlb17 at duke.edu> wrote:
> But, as the archives of this list will attest to, using these
> boards in hardware RAID mode in centos 4 is bad news.
> Performance sucks.

At RAID-5 writes?  Of course on the 7000/8000 designs.  They
only have 1-4MB of SRAM, not enough to buffer RAID-5 writes.

Furthermore, software RAID-0 is _always_ going to be faster
than hardware RAID-0.  RAID-5 reads are basically RAID-0
reads (minus one stripe).

But at RAID-1 or RAID-10, 3Ware's 7000/8000 Storage Switch
designs are very, very fast.

> There's some sort of nasty interaction between the 3wares and
> ext3 which makes the combo unusable, really.

Huh?  _Never_ heard of that.  I'm using 7000/8000 series
cards on RHEL3 and RHEL4 (as well as FC1-FC3), *0* issues. 
All Ext3 filesystems.

> Hotplug worked just fine on this system when I tested
> (multiple times) via 'mdadm -f -r' and 'mdadm -a'.  It's the
> actual disk failure handling that's at fault here.

Yes, that's ... tada ... hotplug!

You can't just have a fixed disk "remove itself" from the OS.
That's causing your panic.

When you're using 3Ware in JBOD, all it can do is report the
disk failure and report the fixed disk as unusable and remove
it from the system.  So for software RAID, it's up to the
_kernel_ to handle that right.

And sure enough, it doesn't.

It has absolutely nothing to do with 3Ware's card.  When you
use JBOD and you remove or lose a disk, which is its own
volume, the 3Ware removes the volume -- just as a "regular"
ATA or SCSI card would with a disk.

There is no way for 3Ware to "hide" the volume or continue
using it -- because there is a 1:1 disk:volume relationship.
The only way to "hide" the disk is to use its hardware RAID
features, where multiple disks are a volume.

Until the kernel has standard, trusted features to handle
failed disks, I refuse to use software RAID-1, 10 or 5.
Hotplug in 2.6 is supposed to handle this when set up
correctly, but I've yet to see it.
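
For anyone who wants to see what the kernel's md layer actually
recorded after a member drops out, here is a minimal sketch in
Python.  It assumes the usual /proc/mdstat layout, where a faulty
member is flagged with "(F)" after its slot number.  The function
name and the sample text are made up for illustration; they are
not output from the system discussed in this thread.

```python
import re

# Matches one member entry such as "sdb1[1]" or "sdb1[1](F)".
# Assumption: plain device names; names containing "/" are not handled.
MEMBER_RE = re.compile(r"(\w+)\[\d+\]((?:\([A-Z]\))*)")


def failed_members(mdstat_text):
    """Return {array: [members flagged (F)]} from /proc/mdstat-style text."""
    failed = {}
    for line in mdstat_text.splitlines():
        # Array lines look like: "md0 : active raid1 sdb1[1](F) sda1[0]"
        head, sep, rest = line.partition(" : ")
        if not sep or not head.startswith("md"):
            continue
        bad = [name for name, flags in MEMBER_RE.findall(rest)
               if "(F)" in flags]
        if bad:
            failed[head.strip()] = bad
    return failed


# Hypothetical sample: a two-disk RAID-1 with one member marked faulty.
SAMPLE = """\
Personalities : [raid1]
md0 : active raid1 sdb1[1](F) sda1[0]
      104320 blocks [2/1] [U_]

unused devices: <none>
"""

if __name__ == "__main__":
    print(failed_members(SAMPLE))
```

On a live box you would feed it the contents of /proc/mdstat
instead of SAMPLE.  It only reports what md has already marked
faulty; it does nothing about a controller pulling the device out
from under the kernel.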



-- 
Bryan J. Smith     Professional, Technical Annoyance
b.j.smith at ieee.org      http://thebs413.blogspot.com
----------------------------------------------------
*** Speed doesn't kill, difference in speed does ***