[CentOS] 3ware disk failure -> hang

Fri Jan 6 23:22:02 UTC 2006
Bryan J. Smith <thebs413 at earthlink.net>

Les Mikesell <lesmikesell at gmail.com> wrote:
> I don't understand this distinction. The kernel calls the
> driver which talks to the controller. There should be a
> timeout around this and the controller's response or
> the timeout should be fielded by the driver.   How can
> it not be a driver issue unless the controller actually
> locks the PC bus (which may be the case with the
> motherboard IDE controllers - they generally won't boot
> with a bad drive either).  You don't want to hide the status
> from the MD code - you want the md driver to kick the device
> out when it has problems.

Correct.  But even the MD code, from what I've seen, directly
accesses the devices.  That in turn causes the kernel panic,
because MD assumes the device is still usable.
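
For anyone who wants to check whether MD has in fact kicked a
member out, the failed members carry an (F) flag in /proc/mdstat.
Here is a minimal sketch, assuming the usual 2.6-era mdstat
layout.  The function name failed_members and the sample text
are illustrative only, not taken from a real system.

```python
import re

def failed_members(mdstat_text):
    """Return {array: [members flagged (F)]} from /proc/mdstat-style text."""
    result = {}
    for line in mdstat_text.splitlines():
        # Array lines look like: "md0 : active raid1 sdb1[1] sda1[0](F)"
        m = re.match(r"^(md\d+)\s*:\s*(.*)$", line)
        if not m:
            continue
        array, rest = m.groups()
        # A member that MD has marked faulty is written as name[slot](F)
        failed = re.findall(r"(\S+?)\[\d+\]\(F\)", rest)
        if failed:
            result[array] = failed
    return result

# Hypothetical sample; on a live box you would read /proc/mdstat instead.
sample = (
    "Personalities : [raid1]\n"
    "md0 : active raid1 sdb1[1] sda1[0](F)\n"
    "      104320 blocks [2/1] [_U]\n"
    "\n"
    "unused devices: <none>\n"
)
print(failed_members(sample))
```

This only tells you what MD has already decided.  It does nothing
about a controller or driver that hangs before MD ever gets an
error back.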

Maybe MD expects certain SCSI facilities, and 3Ware doesn't
provide them (and ATA can't).  That seems plausible, since the
3Ware appears as a SCSI device.  But so do many SATA drivers
currently, and they do _not_ implement the full SCSI command
set.  That could explain it.

I know the hotplug facility in kernel 2.6 is designed to
address the issue of programs or other drivers accessing a
device and expecting it to be there.  So you should involve
it for any such device.  I haven't done it personally though,
because I rely on hardware RAID.

In any case, my _original_point_ stands.

You can_not_ use 3Ware cards for hot-swap or for handling failed
drives _unless_ you use their array facilities, whereby an array
stays active (even if degraded) and has not failed.  The
widespread claim that 3Ware cards suit software RAID because
they support hot-swap is _not_ true for anything but their
hardware RAID arrays, and the practice must end.  I regularly
help people realize this when the software RAID support lists
steer them wrong.


-- 
Bryan J. Smith     Professional, Technical Annoyance
b.j.smith at ieee.org      http://thebs413.blogspot.com
----------------------------------------------------
*** Speed doesn't kill, difference in speed does ***