[CentOS] 3ware disk failure -> hang

Fri Jan 6 20:56:23 UTC 2006
Bryan J. Smith <thebs413 at earthlink.net>

Les Mikesell <lesmikesell at gmail.com> wrote:
> OK, but "raw" scsi disks don't have this problem.

Huh?  Unless the device is unmounted and not in use, you
betcha you'll have the _same_ problem.  The kernel panics
because the device is no longer available.

Only when you have a SCSI hardware RAID array will you get
the same functionality as 3Ware hardware RAID arrays.

> Why is this different than a scsi drive?

It's not.

> Of course we don't understand it.
> Is this documented somewhere?

Sigh.  Please show me where it is documented that you can
remove _any_, _active_ storage device from a system without
the kernel panicking?  The only time I can remove _any_ storage
device (without configuring advanced hotplug features) is if
I take the device off-line.

That's just how the kernel works, _period_.
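
As a rough illustration of "unmounted and not in use", here is a
minimal Python sketch that checks whether a disk or one of its
partitions still appears in /proc/mounts-style text.  The function
names are hypothetical, and it assumes sdX-style naming.  It only
covers mounted filesystems; swap, md and LVM users of the disk are
not detected.

```python
def mounted_devices(mounts_text):
    """Return the set of /dev/* names found in /proc/mounts-style text."""
    devices = set()
    for line in mounts_text.splitlines():
        fields = line.split()
        # First field is the device, second is the mount point.
        if len(fields) >= 2 and fields[0].startswith("/dev/"):
            devices.add(fields[0])
    return devices


def is_in_use(disk, mounts_text):
    """True if the disk itself or a numbered partition of it is mounted.

    Assumes sdX-style names, where partitions are the disk name
    followed by digits (e.g. /dev/sda -> /dev/sda1).
    """
    for dev in mounted_devices(mounts_text):
        if dev == disk:
            return True
        if dev.startswith(disk) and dev[len(disk):].isdigit():
            return True
    return False
```

On a live system the text would come from reading /proc/mounts, e.g.
is_in_use("/dev/sdb", open("/proc/mounts").read()).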

3Ware physically "hides" the storage devices, but _only_ when
you make them an array.  As long as the array is intact (be
it good or degraded), it is still usable by the OS.  The
3Ware is controlling _all_ disk activity, and only reports
itself as an array back to the OS.

When the OS sees the "raw" storage, then that's a problem if
one part of the storage becomes unavailable.  Such is the
case of _any_ storage that is "removed" because it fails --
be it a physical ATA drive on an ATA controller, a physical
SCSI drive on a SCSI controller, or any controller that
presents a disk as a standalone JBOD volume.

You have to set up something like hotplug to take control of
the device, so when it goes off-line, the system doesn't see
it, while it's still trying to use it like it's there.
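
For taking a disk off-line by hand (as opposed to an automated
hotplug setup), one possible route is the SCSI layer's sysfs "state"
attribute.  The sketch below is an assumption on my part, not
something from this thread: it presumes a 2.6 kernel that exposes
/sys/block/<disk>/device/state and accepts "offline" and "running",
and the function names are hypothetical.  It needs root, and the
filesystems on the disk should be unmounted first.

```python
import os


def scsi_state_path(disk, sysfs_root="/sys"):
    """Build the path of the sysfs state attribute for a disk like 'sdb'."""
    return os.path.join(sysfs_root, "block", disk, "device", "state")


def set_scsi_state(disk, state, sysfs_root="/sys"):
    """Write 'offline' or 'running' to the disk's state attribute.

    sysfs_root is a parameter only so the function can be tried
    against a scratch directory instead of the real /sys.
    """
    if state not in ("running", "offline"):
        raise ValueError("state must be 'running' or 'offline'")
    path = scsi_state_path(disk, sysfs_root)
    with open(path, "w") as f:
        f.write(state + "\n")
    return path
```

E.g. set_scsi_state("sdb", "offline") before pulling the drive, and
set_scsi_state("sdb", "running") to bring it back.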


-- 
Bryan J. Smith     Professional, Technical Annoyance
b.j.smith at ieee.org     http://thebs413.blogspot.com
----------------------------------------------------
*** Speed doesn't kill, difference in speed does ***