[CentOS] 3ware disk failure -> hang

Fri Jan 6 22:24:33 UTC 2006
Bryan J. Smith <thebs413 at earthlink.net>

Les Mikesell <lesmikesell at gmail.com> wrote:
> Or at least the typical hardware/driver errors aren't
> fatal.

I think you, and most software RAID users, continue to miss
the _root_ cause.  If you yank a drive out of a system, one
that is being used _actively_, you are going to get a kernel
panic.  I've seen it on ATA and SCSI.  It's _not_ a driver
issue.  It's the fact that you've lost a resource.

The MD code does _not_ handle this.  You have to tie into the
hotplug system for 2.6 to hide the device's status from the
MD code.

Now maybe some SCSI drivers handle it differently.  But it is
_not_ a driver issue.

I can take down 3Ware arrays or JBODs, and I do it all the
time.  The key difference is that I'm _not_ actively using
the arrays/JBODs when I do.  You're getting the kernel panic
because you _are_.
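
To make "actively using" concrete on the MD side, here is a
minimal Python sketch that checks whether a disk still backs
a non-faulty member of any MD array, by parsing text in the
/proc/mdstat format.  The function name arrays_using, the
sample text, and the rule that a member belongs to a disk
when its name is the disk name plus an optional partition
number are all assumptions for illustration.  It says nothing
about mounted filesystems, swap, or 3Ware hardware arrays.

```python
import re

# "md0 : active raid1 sdb1[1] sda1[0]" -> name, state, rest of line
MD_LINE = re.compile(r"^(md\S*)\s*:\s*(\S+)\s+(.*)$")
# "sdc1[2](F)" -> member name and any trailing flags such as (F) or (S)
MEMBER = re.compile(r"^([^\[\s]+)\[\d+\]((?:\(\w\))*)$")

# Illustrative sample only, not output from a real system.
SAMPLE_MDSTAT = """\
Personalities : [raid1]
md0 : active raid1 sdb1[1] sda1[0]
      104320 blocks [2/2] [UU]
md1 : active raid1 sdc1[2](F) sda2[0]
      8385792 blocks [2/1] [U_]
unused devices: <none>
"""


def arrays_using(disk, mdstat_text):
    """Return names of MD arrays with a non-faulty member on `disk`.

    Assumption: a member is on `disk` if its name equals `disk`
    or is `disk` followed only by digits (a partition number).
    """
    hits = []
    for line in mdstat_text.splitlines():
        m = MD_LINE.match(line)
        if not m:
            continue  # status lines, "Personalities", "unused devices"
        name, _state, rest = m.groups()
        for token in rest.split():
            mm = MEMBER.match(token)
            if not mm:
                continue  # personality name such as "raid1"
            member, flags = mm.groups()
            if "(F)" in flags:
                continue  # already marked faulty, not in active use
            suffix = member[len(disk):]
            if member.startswith(disk) and (suffix == "" or suffix.isdigit()):
                hits.append(name)
                break
    return hits


if __name__ == "__main__":
    for disk in ("sda", "sdb", "sdc"):
        print(disk, arrays_using(disk, SAMPLE_MDSTAT))
```

On a live system the text would come from reading
/proc/mdstat instead of the sample.  An empty result only
means MD is not using the disk; it does not make pulling the
drive safe by itself.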

If you are actively using a device, it will tank the kernel
if it suddenly becomes unavailable.  I have _never_ seen MD
handle this correctly; where it appears to work, the SCSI
card must just be more graceful.

Again, _regardless_ of how some SCSI cards might work, with
SCSI, ATA and other cards I've used, unless I use hotplug's
facilities (one of the reasons why many SCSI drivers were
deprecated for 2.6), it does _not_ work.

And you will _not_ get hot-swap behavior out of a 3Ware card
in JBOD mode, _only_ when you use its hardware arrays.

> 
> > In any case, 3Ware cards do _not_ do it for JBOD.
> 
> I'm sure you are right about the behavior but it still
> seems surprising that the driver for what appears to be
> hot-swap devices actually isn't.


-- 
Bryan J. Smith     Professional, Technical Annoyance
b.j.smith at ieee.org      http://thebs413.blogspot.com
----------------------------------------------------
*** Speed doesn't kill, difference in speed does ***