[CentOS] RAID-1 mdadm software raid and kernel crashing on hdd failure

Sat Mar 27 18:56:06 UTC 2010
Robert Heller <heller at deepsoft.com>

At Sat, 27 Mar 2010 11:01:08 -0700 CentOS mailing list <centos at centos.org> wrote:

> 
> Pasi Kärkkäinen wrote:
> > On Fri, Mar 26, 2010 at 11:14:18AM -0700, Benjamin Franz wrote:
> >   
> >> Yup. 8-way RAID1 for the OS, 8-way RAID6 for the data. I was hoping when 
> >> I set up the 8-way RAID1 for the OS that I would get really good read 
> >> speeds since md is supposed to stripe reads from RAID1, but in practice 
> >> the RAID6 completely kills it for read performance (~61 MB/sec from the 
> >> RAID1 partition vs ~200 MB/sec from the RAID6 partition).
> >>
> >> In a deeply ironic turn of events, one of the hard drives in that 
> >> machine died in a way that freaked the hardware controller driver out 
> >> and caused a kernel panic last week.
> >>
> >>     
> >
> > I've also seen CentOS 5.3 (or 5.4, not sure) crash when a single SATA HDD failed.
> > The system was running an mdadm RAID-1 mirror, so it shouldn't have been a fatal event.
> >
> > There was a kernel oops on the console. Too bad I didn't have time to capture it then.
> > The system was running AHCI SATA on an Intel ICH9 controller, with mdadm software RAID.
> >
> > So there's still a need for hardware RAID controllers.
> >   
> 
> I'm not sure that is a good conclusion. The controller *is* a (3ware) 
> hardware RAID controller - but the drive failure caused the 3ware driver 
> to crash. That I wasn't using the controller in HW RAID mode may not be 
> a good indicator that all would have been well if I had been.

There might be drive failure modes that put the drive controller into
an 'odd' state.  That still should not cause the software driver to
crash; I'd consider that a software bug.  I've never heard of a SCSI HBA
driver crashing.  I may have encountered drive failure modes that would
do things like hang the SCSI bus or otherwise confuse the SCSI HBA (e.g.
causing it to fail to see *other* drives/devices).  Since there is no
SATA 'bus' (SATA drives are connected point-to-point, in 'star'
fashion), a failed drive should not take the controller out, but I guess
it depends on the signaling logic and what sort of logic gates are used
for 'drive select'.  Of course, a controller like the 3ware hardware
RAID controller, configured in JBOD mode, probably looks like a SCSI HBA
with a bunch of disks on a single SCSI bus.
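
For what it's worth, when the driver does behave and only the drive
drops out, md just marks the array degraded and carries on.  Below is a
rough sketch of spotting that state from text in the /proc/mdstat
format.  The sample text is made up for illustration (it is not from
any of the machines discussed here), and the function name
degraded_arrays is my own invention, not part of mdadm.  It only
recognizes arrays named like 'md0'.

```python
import re

# Illustrative sample in /proc/mdstat format (an assumption, not real output
# from the systems in this thread). md0 has lost one of its two members.
SAMPLE_MDSTAT = """\
Personalities : [raid1] [raid6]
md0 : active raid1 sdb1[1] sda1[0](F)
      104320 blocks [2/1] [_U]

md1 : active raid1 sdb2[1] sda2[0]
      8385856 blocks [2/2] [UU]

unused devices: <none>
"""


def degraded_arrays(mdstat_text):
    """Return {array_name: (expected, active)} for arrays missing members."""
    degraded = {}
    current = None
    for line in mdstat_text.splitlines():
        # An array stanza starts with a line such as "md0 : active raid1 ..."
        header = re.match(r"^(md\d+)\s*:", line)
        if header:
            current = header.group(1)
            continue
        # The following line carries "[expected/active]", e.g. "[2/1]"
        status = re.search(r"\[(\d+)/(\d+)\]", line)
        if current and status:
            expected, active = int(status.group(1)), int(status.group(2))
            if active < expected:
                degraded[current] = (expected, active)
            current = None  # ignore later lines (e.g. resync progress)
    return degraded


if __name__ == "__main__":
    # On a live system one would read the text from /proc/mdstat instead.
    print(degraded_arrays(SAMPLE_MDSTAT))
```

Of course this only helps if the kernel survives the failure in the
first place, which is the whole point of the thread.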

-- 
Robert Heller             -- 978-544-6933
Deepwoods Software        -- Download the Model Railroad System
http://www.deepsoft.com/  -- Binaries for Linux and MS-Windows
heller at deepsoft.com       -- http://www.deepsoft.com/ModelRailroadSystem/