[CentOS] 3ware disk failure -> hang

Fri Jan 6 23:56:48 UTC 2006
Les Mikesell <lesmikesell at gmail.com>

On Fri, 2006-01-06 at 17:34, Bryan J. Smith wrote:

> For the most part, knowing _how_ to deploy hardware or
> software RAID is the critical factor.  If you use 3Ware, use
> its facilities.  The biggest fallacy I see propagated is that
> you can use its hot-swap and fault-tolerance with software
> RAID.  You can't any more than any other ATA or SCSI card
> I've used (although I have to investigate some of the SCSI
> cards people are using here).

I have a non-critical IBM eServer with software RAID
running, so I yanked a drive to see what would happen.
Basically nothing.  All the other drive lights blinked
while it reset the bus, and it logged some SCSI errors like:
SCSI error : <0 0 2 0> return code = 0x10000
end_request: I/O error, dev sdc, sector 71681855
md: write_disk_sb failed for device sdc1
then:
md: write_disk_sb failed for device sdc1
md: excessive errors occurred during superblock update, exiting
mptbase: ioc0: IOCStatus(0x0043): SCSI Device Not There
SCSI error : <0 0 2 0> return code = 0x10000
end_request: I/O error, dev sdc, sector 12791
raid1: Disk failure on sdc1, disabling device.
        Operation continuing on 1 devices
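The `<0 0 2 0>` tuple in the `SCSI error` lines is the device's SCSI address in host, channel, id, lun order, the same four numbers `/proc/scsi/scsi` works with when detaching or re-adding a single device. Below is a minimal Python sketch that extracts that address and the md member the raid1 driver disabled. It assumes log lines worded exactly as in this post; other kernel versions phrase these messages differently.

```python
import re
from typing import NamedTuple, Optional


class ScsiAddress(NamedTuple):
    host: int
    channel: int
    id: int
    lun: int


# Matches lines like "SCSI error : <0 0 2 0> return code = 0x10000"
SCSI_ERROR_RE = re.compile(r"SCSI error : <(\d+) (\d+) (\d+) (\d+)>")
# Matches lines like "raid1: Disk failure on sdc1, disabling device."
RAID1_FAIL_RE = re.compile(r"raid1: Disk failure on (\S+?), disabling device")


def scsi_address(line: str) -> Optional[ScsiAddress]:
    """Return the host/channel/id/lun tuple from a SCSI error line, or None."""
    m = SCSI_ERROR_RE.search(line)
    if m is None:
        return None
    return ScsiAddress(*(int(g) for g in m.groups()))


def failed_md_member(line: str) -> Optional[str]:
    """Return the partition name the raid1 driver disabled, or None."""
    m = RAID1_FAIL_RE.search(line)
    return m.group(1) if m else None


if __name__ == "__main__":
    log = """\
SCSI error : <0 0 2 0> return code = 0x10000
end_request: I/O error, dev sdc, sector 71681855
raid1: Disk failure on sdc1, disabling device.
"""
    for line in log.splitlines():
        addr = scsi_address(line)
        if addr:
            print("SCSI address:", addr)
        member = failed_md_member(line)
        if member:
            print("md member disabled:", member)
```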

Everything is still working normally.
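While a raid1 array runs on one disk like this, `/proc/mdstat` reports it as degraded. A status field such as `[2/1] [U_]` means two members are configured and one is active, with `_` marking the missing slot. Here is a minimal Python sketch that flags such arrays. The `SAMPLE` text is a hypothetical `/proc/mdstat` (array name `md0`, block count and member order are made up for illustration), not output captured from the machine in this post. The parser also assumes the 2.6-era layout, where the status field sits on the line after each `mdX :` line.

```python
import re
from typing import Dict, Optional, Tuple

# Status field such as "[2/1] [U_]": configured/active counts, then per-slot flags
STATUS_RE = re.compile(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]")


def degraded_arrays(mdstat: str) -> Dict[str, Tuple[int, int, str]]:
    """Map array name -> (configured, active, flags) for arrays missing members."""
    result: Dict[str, Tuple[int, int, str]] = {}
    current: Optional[str] = None
    for line in mdstat.splitlines():
        if re.match(r"^md\d+ :", line):
            current = line.split()[0]  # e.g. "md0"
            continue
        m = STATUS_RE.search(line)
        if current and m:
            configured, active, flags = int(m.group(1)), int(m.group(2)), m.group(3)
            if active < configured:
                result[current] = (configured, active, flags)
            current = None  # status line consumed for this array
    return result


# Hypothetical sample, not captured from the machine in the post
SAMPLE = """\
Personalities : [raid1]
md0 : active raid1 sdc1[1](F) sdb1[0]
      71681856 blocks [2/1] [U_]

unused devices: <none>
"""

if __name__ == "__main__":
    print(degraded_arrays(SAMPLE))
```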

Then I removed the failed device from the RAID, did the
echo 'remove-single-device ...' >/proc/scsi/scsi
thing, reseated the drive, re-added it as a SCSI device,
and added it back to the RAID; it is rebuilding now.
Nothing else even blinked, except that the first
'cat /proc/mdstat' after the disk was removed took
several seconds.
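The post does not spell out the exact commands. On 2.6 kernels, `/proc/scsi/scsi` accepts `scsi remove-single-device H C I L` and `scsi add-single-device H C I L`, where `H C I L` is the host/channel/id/lun address printed in the `SCSI error` lines. The Python sketch below is a dry run: it only prints the steps of the procedure described above and runs nothing. It assumes the array is `/dev/md0` and is managed with `mdadm`, and the post states neither. The `replacement_steps` helper is hypothetical.

```python
from typing import List, Tuple


def replacement_steps(md_dev: str, member: str,
                      addr: Tuple[int, int, int, int]) -> List[str]:
    """Dry-run list of shell steps for swapping a failed md RAID1 member.

    Hypothetical helper: md_dev and the use of mdadm are assumptions,
    not something the original post specifies.
    """
    hcil = " ".join(str(n) for n in addr)  # host channel id lun
    return [
        # 1. drop the already-failed partition from the array
        f"mdadm {md_dev} --remove /dev/{member}",
        # 2. detach the disk from the SCSI layer
        f'echo "scsi remove-single-device {hcil}" > /proc/scsi/scsi',
        # 3. physical step: reseat (or replace) the drive
        "# reseat the drive now",
        # 4. have the SCSI layer probe that address again
        f'echo "scsi add-single-device {hcil}" > /proc/scsi/scsi',
        # 5. return the partition to the array; md then starts a resync
        f"mdadm {md_dev} --add /dev/{member}",
        # 6. watch rebuild progress
        "cat /proc/mdstat",
    ]


if __name__ == "__main__":
    # Address taken from the "<0 0 2 0>" in the log above, for illustration only
    for step in replacement_steps("/dev/md0", "sdc1", (0, 0, 2, 0)):
        print(step)
```

A reseated drive keeps its partition table. A blank replacement drive would need the partition layout recreated before the `--add` step.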

-- 
  Les Mikesell
   lesmikesell at gmail.com