[CentOS] Re: Raid5 issues

Fri May 4 18:31:55 UTC 2007
Toby Bluhm <tkb at midwestinstruments.com>

Ruslan Sivak wrote:
> Toby Bluhm wrote:
>> Ruslan Sivak wrote:
>>> Toby Bluhm wrote:
>>>> Ruslan Sivak wrote:
>>>>> Feizhou wrote:
>>>>>> Ruslan Sivak wrote:
>>>>>>> Feizhou wrote:
>>>>>>>>> I do have a SIL3114 chipset, and I think it's supposed to be 
>>>>>>>>> supported by device mapper.  When I go to rescue mode, I see 
>>>>>>>>> it loading the driver for SIL3112, but nothing appears under 
>>>>>>>>> /dev/mapper except control.  Are there instructions somewhere 
>>>>>>>>> on getting it to use my controller's raid?
>>>>>>>> Your controller only has a bios chip. It has no raid processing 
>>>>>>>> capability at all.
>>>>>>>> You need to use mdadm. anaconda should be able to let you 
>>>>>>>> create two mirrors and then create a third array that stripes 
>>>>>>>> those md devices.
>>>>>>> Anaconda doesn't let me create a stripe raid set on top of a 
>>>>>>> mirror set.  And it doesn't detect it when I do it manually.
>>>>>>> Also the bios chip presents additional issues.  I believe when I 
>>>>>>> don't have a raid array set up, it won't boot at all.  When I 
>>>>>>> have it on raid10, I had trouble booting, and when I have it on 
>>>>>>> concatenation, everything works fine, until a drive is 
>>>>>>> replaced.  At that point, I have to recreate the array, as 
>>>>>>> concatenation is not a fault tolerant set, and at this point I 
>>>>>>> seem to lose all my data.
>>>>>> It won't boot at all without a raid array setup? That sounds 
>>>>>> really funny.
>>>>> Actually I'm not 100% sure on this, but I think this is the case.  
>>>>> I believe the first time I set it up as a raid10, assuming that 
>>>>> linux will just ignore it.  I installed centos by putting boot on 
>>>>> a raid1, and root on LVM over 2 raid1 sets.  I had trouble getting 
>>>>> it to boot.
>>>>>>> Is there a way to get it to use the raid that's part of the bios 
>>>>>>> chip?  
>>>>>> Repeat after me. There is no raid that is part of the bios chip. 
>>>>>> It is just a simple table.
>>>>> Yes, I know this is fakeraid, aka softraid, but I was hoping that 
>>>>> using the drivers would make it easier to support raid 10 than 
>>>>> with mdadm, which seems to be impossible to get to work with the 
>>>>> installer.  I'm not even sure why the raid10 personality is not 
>>>>> loaded, as it seems to have been part of the mdadm since version 1.7.
>>>>>>> Something about device mapper?
>>>>>> You need the fake raid driver dmraid if you are going to set up 
>>>>>> stuff in the bios. What version of centos are you trying to 
>>>>>> install? libata in Centos 5 should support this without having to 
>>>>>> resort to the ide drivers.
>>>>> I'm trying to install centos 5 - the latest.  How would I go about 
>>>>> using dmraid and/or libata?  The installer picks up the drives as 
>>>>> individual drives.  There is a driver on the Silicon Image website, 
>>>>> but it's for RHEL4, and I couldn't get it to work.  I'm open to 
>>>>> using md for raid, or even LVM, if it supports it.  I just want to 
>>>>> be able to use raid10, as I can't trust raid5 anymore.
>>>> IIRC you had two out of four new disks die? So maybe it would be 
>>>> more accurate to say it's your hardware you don't trust. Raid5 is 
>>>> used without problems by ( I assume ) many, many people, myself 
>>>> included. You could have a raid10 and still lose the whole array if 
>>>> two disks in the same mirror die at once. I guess no software 
>>>> in the world can really overcome bad hardware. That's why we do 
>>>> backups :)
>>>> Anyway, perhaps exercising/stressing the disks for a few days 
>>>> without error would make you feel more confident about the HDs.
>>> Actually, 2 disks did not die.  Due to the fact that it was a new 
>>> raid 5 array (or for whatever reason), it was rebuilding the array.  
>>> One of the drives had a media error, and this caused the whole array 
>>> to be lost.
>>> This is exactly what this article warns about:
>>> http://www.miracleas.com/BAARF/RAID5_versus_RAID10.txt
>> The article doesn't seem to mention the fact that if a disk in a 
>> mirror set dies and the remaining disk within the set starts to have 
>> data corruption problems, the mirror will be rebuilt from corrupted 
>> data.
> While this is true, it's far more likely that there will be a media 
> error (i.e. bad sector), and that the system will notice it.  With 
> raid 5, it will just kick out the drive, and you can say bye bye to 
> your data. 

Perhaps not totally lost, but not fun either. The options: force mdadm 
to run; try to fix/relocate the bad sector and force mdadm to run; or dd 
to an identical disk and force mdadm to run.
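
A rough sketch of the first and last options, in case it helps. It only 
prints the commands rather than running them. Every device name here 
(/dev/md0, the sdX1 members, the failing and spare disks) is a placeholder 
I made up, not something from this thread, and the bad-sector relocation 
step is left out since it depends on the drive and its tools.

```shell
#!/bin/bash
# Dry-run sketch: prints the commands instead of executing them.
# All device names are hypothetical placeholders.
MD=/dev/md0
MEMBERS="/dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1"
FAILING=/dev/sdc     # disk that reported the media error
SPARE=/dev/sde       # identical replacement disk

force_run() {
    # Assemble despite a member having been kicked out, and start the array.
    echo "mdadm --assemble --force --run $MD $MEMBERS"
}

clone_disk() {
    # Copy the failing disk; unreadable sectors are padded with zeros.
    echo "dd if=$FAILING of=$SPARE bs=512 conv=noerror,sync"
}

force_run
clone_disk
# After the clone, swap the disks so the copy takes the original's place, then:
force_run
```

The small block size is slow, but it keeps each unreadable read down to 
one zero-filled sector instead of a larger chunk.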

> With raid 10, if it happens on one of the disks in the other set, you 
> don't have a problem, and if it happens to the disk in the same set 
> (not very likely),

There is a 1 in 3 chance of putting the two worst disks in the same 
mirror when using a 4-disk raid10.
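
The 1 in 3 comes from counting how the disks can be paired. Here is a 
small sketch of that count, assuming the disks are numbered 1-4 and that 
1 and 2 happen to be the two worst (the numbering is only for 
illustration):

```shell
#!/bin/bash
# Disks are numbered 1-4; assume for illustration that 1 and 2 are the two worst.
worst_pair_odds() {
    local total=0 together=0 partner
    # Disk 1 is mirrored with exactly one of the other three disks;
    # the remaining two disks then form the second mirror.
    for partner in 2 3 4; do
        total=$((total + 1))
        if [ "$partner" -eq 2 ]; then
            together=$((together + 1))
        fi
    done
    echo "$together in $total"
}

worst_pair_odds
```

Only one of the three possible pairings puts disks 1 and 2 in the same 
mirror.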

> I'm not sure what the outcome will be, but hopefully it can recover?  
> I have just had a windows drive have a whole bunch of bad sectors and 
> I was still able to boot to windows, and copy most of the data off.  I 
> can't imagine Linux being any worse.
>> I don't know what you can do at this point, though. Perhaps make 2 
>> separate mirrors and rsync them?  You could keep copies of  changes 
>> that way.
> I know there is a raid10 personality for md.  I saw it in the source 
> code.  I see people's boot logs all over the web that say this:
> md: linear personality registered as nr 1
> md: raid0 personality registered as nr 2
> md: raid1 personality registered as nr 3
> md: raid10 personality registered as nr 9
> md: raid5 personality registered as nr 4
> Why does CentOS5 not support the raid10 personality?  Do I need to 
> custom compile md?  Do I need to custom compile the kernel?  
> Russ

You have enlightened me to the raid10 module:

[root@tikal ~]# locate raid10

[root@tikal ~]# modprobe raid10

[root@tikal ~]# lsmod | grep raid
raid10                 23233  0
raid1                  20033  1

This is not a CentOS 5 machine though, it's SL4.4 (Scientific Linux 4.4).
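
To run the same check on the CentOS 5 box, something like the sketch 
below might do. It looks for a personality on the "Personalities" line 
of /proc/mdstat. The function name and the optional file argument are 
my own, and I have not tried this on CentOS 5.

```shell
#!/bin/bash
# has_personality NAME [FILE]
# Succeeds if md personality NAME is listed on the "Personalities" line.
# FILE defaults to /proc/mdstat; the argument exists so a saved copy can be checked.
has_personality() {
    grep '^Personalities' "${2:-/proc/mdstat}" 2>/dev/null | grep -qF "[$1]"
}

if has_personality raid10; then
    echo "raid10 personality is registered"
else
    echo "raid10 personality is not registered (try: modprobe raid10)"
fi
```

If modprobe then fails, the module is probably not built for that kernel.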

Toby Bluhm
Midwest Instruments Inc.
30825 Aurora Road Suite 100
Solon Ohio 44139