[CentOS] 3ware disk failure -> hang

On Fri, 2006-01-06 at 16:24, Bryan J. Smith wrote:

> > Or at least the typical hardware/driver errors aren't
> > fatal.
> 
> I think you, and most software RAID users, continue to miss
> the _root_ cause.  If you yank a drive out of a system, one
> that is being used _actively_, you are going to get a kernel
> panic.  I've seen it on ATA and SCSI.  It's _not_ a driver
> issue.  It's the fact that you've lost a resource.
> 
> The MD code does _not_ handle this.  You have to tie into the
> hotplug system for 2.6 to hide the device's status from the
> MD code.
> 
> Now maybe some SCSI drivers handle it differently.  But it is
> _not_ a driver issue.

I don't understand this distinction. The kernel calls the
driver, which talks to the controller. There should be a
timeout around that call, and either the controller's response
or the timeout should be fielded by the driver. How can
it not be a driver issue, unless the controller actually
locks the PC bus (which may be the case with the motherboard
IDE controllers; they generally won't boot with a bad drive
either)? You don't want to hide the status from the MD code. You
want the md driver to kick the device out when it has problems.
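
To make the argument concrete, here is a toy model in Python of the
behaviour I'm describing. It is only a sketch and not how the Linux md
or 3ware code is actually written: read_with_timeout, ToyMirror,
hung_read, good_read and the member names "sda"/"sdb" are all made up
for illustration. The "driver" wraps each request in a timeout, and the
mirror layer drops a member when the driver reports an error.

```python
import queue
import threading
import time


class DeviceTimeout(Exception):
    """The (toy) controller did not answer within the driver's timeout."""


def read_with_timeout(controller_read, block, timeout_s):
    """Driver side: issue one read and wait at most timeout_s for the answer."""
    answer = queue.Queue(maxsize=1)
    worker = threading.Thread(
        target=lambda: answer.put(controller_read(block)),
        daemon=True,  # a hung request must not keep the process alive
    )
    worker.start()
    try:
        return answer.get(timeout=timeout_s)
    except queue.Empty:
        raise DeviceTimeout(f"no response for block {block}") from None


class ToyMirror:
    """RAID-1-like layer (hypothetical): reads from the first healthy member."""

    def __init__(self, members, timeout_s=0.05):
        self.active = dict(members)   # member name -> controller read function
        self.failed = []              # members kicked out, in order
        self.timeout_s = timeout_s

    def read(self, block):
        for name in list(self.active):
            try:
                return read_with_timeout(self.active[name], block, self.timeout_s)
            except DeviceTimeout:
                # The driver fielded the timeout; drop the member and carry on.
                del self.active[name]
                self.failed.append(name)
        raise OSError("all mirror members have failed")


def hung_read(block):
    time.sleep(1.0)  # simulates a drive that does not answer in time
    return b""


def good_read(block):
    return f"block-{block}".encode()


if __name__ == "__main__":
    mirror = ToyMirror({"sda": hung_read, "sdb": good_read})
    print(mirror.read(7), mirror.failed)
```

The point of the split is that the timeout lives in the driver layer,
while the decision to kick the member out lives in the mirror layer,
which can only make that decision if the failure is reported to it
rather than hidden.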

-- 
  Les Mikesell
   lesmikesell at gmail.com