Les Mikesell lesmikesell@gmail.com wrote:
I don't understand this distinction. The kernel calls the driver which talks to the controller. There should be a timeout around this and the controller's response or the timeout should be fielded by the driver. How can it not be a driver issue unless the controller actually locks the PC bus (which may be the case with the motherboard IDE controllers - they generally won't boot with a bad drive either). You don't want to hide the status from the MD code - you want the md driver to kick the device out when it has problems.
Correct. But even the MD code, from what I've seen, directly accesses the devices. That in turn causes the kernel panic, because it assumes the device is usable.
Maybe MD expects certain SCSI facilities, and 3Ware doesn't provide them (and ATA can't). Especially since the 3Ware appears as a SCSI device. But so do many SATA drivers currently, and they do _not_ implement full SCSI command sets. That could explain it.
I know the hotplug facility in kernel 2.6 is designed to address the issue of programs or other drivers accessing a device and expecting it to be there. So you should involve it for any such device. I haven't done it personally though, because I rely on hardware RAID.
In any case, my _original_point_ stands.
You can_not_ use 3Ware cards for hot-swap or for handling failed drives _unless_ you use their array facilities, whereby an array remains active (even if degraded) but not failed. The widespread claim that 3Ware cards are good for software RAID because they support hot-swap is _not_ true for anything but their hardware RAID arrays, and it must end. I regularly help people realize this when the software RAID support lists steer them wrong.