[CentOS] Re: Raid5 issues

Fri May 4 18:31:55 UTC 2007
Toby Bluhm <tkb at midwestinstruments.com>

Ruslan Sivak wrote:
> Toby Bluhm wrote:
>> Ruslan Sivak wrote:
>>> Toby Bluhm wrote:
>>>> Ruslan Sivak wrote:
>>>>> Feizhou wrote:
>>>>>> Ruslan Sivak wrote:
>>>>>>> Feizhou wrote:
>>>>>>>>
>>>>>>>>> I do have a SIL3114 chipset, and I think it's supposed to be 
>>>>>>>>> supported by device mapper.  When I go to rescue mode, I see 
>>>>>>>>> it loading the driver for SIL3112, but nothing appears under 
>>>>>>>>> /dev/mapper except control.  Are there instructions somewhere 
>>>>>>>>> on getting it to use my controller's raid?
>>>>>>>>
>>>>>>>> Your controller only has a bios chip. It has no raid processing 
>>>>>>>> capability at all.
>>>>>>>>
>>>>>>>> You need to use mdadm. Anaconda should be able to let you 
>>>>>>>> create two mirrors and then create a third array that stripes 
>>>>>>>> those md devices.
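
(For reference, a hand-rolled version of that two-mirrors-plus-stripe 
setup would look roughly like the sketch below; the sdX partition 
names are just assumptions for illustration:

  # two raid1 mirrors
  mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1
  mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/sdc1 /dev/sdd1
  # a raid0 stripe across the two mirrors
  mdadm --create /dev/md2 --level=0 --raid-devices=2 /dev/md0 /dev/md1

You'd still have to get the installer to put filesystems on /dev/md2.)
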
>>>>>>> Anaconda doesn't let me create a stripe raid set on top of a 
>>>>>>> mirror set.  And it doesn't detect it when I do it manually.
>>>>>>> Also the bios chip presents additional issues.  I believe when I 
>>>>>>> don't have a raid array set up, it won't boot at all.  When I 
>>>>>>> had it on raid10, I had trouble booting, and when I have it on 
>>>>>>> concatenation, everything works fine until a drive is 
>>>>>>> replaced.  At that point I have to recreate the array, as 
>>>>>>> concatenation is not a fault-tolerant set, and I seem to lose 
>>>>>>> all my data.
>>>>>>
>>>>>> It won't boot at all without a raid array setup? That sounds 
>>>>>> really funny.
>>>>>>
>>>>> Actually I'm not 100% sure on this, but I think this is the case.  
>>>>> I believe the first time I set it up as a raid10, assuming that 
>>>>> Linux would just ignore it.  I installed CentOS by putting /boot 
>>>>> on a raid1, and root on LVM over 2 raid1 sets.  I had trouble 
>>>>> getting it to boot.
>>>>>>> Is there a way to get it to use the raid that's part of the bios 
>>>>>>> chip?  
>>>>>>
>>>>>> Repeat after me. There is no raid that is part of the bios chip. 
>>>>>> It is just a simple table.
>>>>> Yes, I know this is fakeraid, aka softraid, but I was hoping that 
>>>>> using the drivers would make it easier to support raid 10 than 
>>>>> with mdadm, which seems to be impossible to get to work with the 
>>>>> installer.  I'm not even sure why the raid10 personality is not 
>>>>> loaded, as it seems to have been part of mdadm since version 1.7.
>>>>>>> Something about device mapper?
>>>>>>
>>>>>>
>>>>>> You need the fake raid driver dmraid if you are going to set up 
>>>>>> stuff in the bios. What version of CentOS are you trying to 
>>>>>> install? libata in CentOS 5 should support this without having to 
>>>>>> resort to the ide drivers.
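
(If dmraid is the route taken, a quick sanity check from a shell would 
be something along these lines, shown only as a sketch:

  dmraid -r    # list the raid metadata dmraid finds on the disks
  dmraid -ay   # activate the sets; they should appear under /dev/mapper

If nothing shows up, the bios-defined set isn't being recognized.)
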
>>>>> I'm trying to install CentOS 5 - the latest.  How would I go about 
>>>>> using dmraid and/or libata?  The installer picks up the drives as 
>>>>> individual drives.  There is a driver on the Silicon Image website, 
>>>>> but it's for RHEL4, and I couldn't get it to work.  I'm open to 
>>>>> using md for raid, or even LVM, if it supports it.  I just want to 
>>>>> be able to use raid10, as I can't trust raid5 anymore.
>>>>>
>>>>
>>>> IIRC you had two out of four new disks die? So maybe it would be 
>>>> more accurate to say it's your hardware you don't trust. Raid5 is 
>>>> used without problems by (I assume) many, many people, myself 
>>>> included. You could have a raid10 and still lose the whole array if 
>>>> two disks in the same mirror die at once. I guess no software in 
>>>> the world can really overcome bad hardware. That's why we do 
>>>> backups :)
>>>>
>>>> Anyway, perhaps exercising/stressing the disks for a few days 
>>>> without error would make you feel more confident about the HDs.
>>>>
>>>
>>> Actually, 2 disks did not die.  Because it was a new raid 5 array 
>>> (or for whatever reason), it was rebuilding the array.  During the 
>>> rebuild one of the drives had a media error, and that caused the 
>>> whole array to be lost.
>>> This is exactly what this article warns about:
>>>
>>> http://www.miracleas.com/BAARF/RAID5_versus_RAID10.txt
>>
>>
>> The article doesn't seem to mention the fact that if a disk in a 
>> mirror set dies and the remaining disk within the set starts to have 
>> data corruption problems, the mirror will be rebuilt from corrupted 
>> data.
> While this is true, it's far more likely that there will be a media 
> error (i.e. bad sector), and that the system will notice it.  With 
> raid 5, it will just kick out the drive, and you can say bye bye to 
> your data. 

Perhaps not totally lost, but not fun either. Force mdadm to run; try 
to fix/relocate the bad sector and force mdadm to run; dd to an 
identical disk and force mdadm to run.
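
Very roughly, and purely as a sketch (the device names below are made 
up, not taken from your box):

  # force assembly of the degraded/unclean array from its members
  mdadm --assemble --force /dev/md0 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1

  # or kick-start an array md considers too dirty to run on its own
  mdadm --run /dev/md0

  # clone a failing member onto an identical spare, skipping bad reads
  dd if=/dev/sdc of=/dev/sde bs=64k conv=noerror,sync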

> With raid 10, if it happens on one of the disks in the other set, you 
> don't have a problem, and if it happens to the disk in the same set 
> (not very likely),

There's a 1 in 3 chance of putting the two worst disks together in a 
4-disk raid10: once the first of them is placed, the second one lands 
in one of the three remaining slots, and exactly one of those slots 
pairs it with the first.


> I'm not sure what the outcome will be, but hopefully it can recover?  
> I just had a Windows drive develop a whole bunch of bad sectors, and 
> I was still able to boot to Windows and copy most of the data off.  I 
> can't imagine Linux being any worse.
>> I don't know what you can do at this point, though. Perhaps make 2 
>> separate mirrors and rsync them?  You could keep copies of  changes 
>> that way.
>>
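
(Something like this from cron could keep the second mirror as a 
slightly-delayed copy of the first; the mount points are made up:

  rsync -a --delete /mnt/mirror1/ /mnt/mirror2/

That way a bad block or an accidental rm on the first mirror doesn't 
instantly propagate to the second.)
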
> I know there is a raid10 personality for md.  I saw it in the source 
> code.  I see people's boot logs all over the web that say this:
>
> md: linear personality registered as nr 1
> md: raid0 personality registered as nr 2
> md: raid1 personality registered as nr 3
> md: raid10 personality registered as nr 9
> md: raid5 personality registered as nr 4
>
> Why does CentOS 5 not support the raid10 personality?  Do I need to 
> custom-compile md?  Do I need to custom-compile the kernel?  
> Russ
>
>

You have enlightened me to the raid10 module:

[root@tikal ~]#  locate raid10
/usr/src/kernels/2.6.9-42.0.3.EL-smp-i686/include/config/md/raid10
/usr/src/kernels/2.6.9-42.0.3.EL-smp-i686/include/config/md/raid10/module.h
/usr/src/kernels/2.6.9-42.0.3.EL-smp-i686/include/linux/raid/raid10.h
/usr/src/kernels/2.6.9-42.0.10.EL-i686/include/config/md/raid10
/usr/src/kernels/2.6.9-42.0.10.EL-i686/include/config/md/raid10/module.h
/usr/src/kernels/2.6.9-42.0.10.EL-i686/include/linux/raid/raid10.h
/usr/src/kernels/2.6.9-42.0.3.EL-i686/include/config/md/raid10
/usr/src/kernels/2.6.9-42.0.3.EL-i686/include/config/md/raid10/module.h
/usr/src/kernels/2.6.9-42.0.3.EL-i686/include/linux/raid/raid10.h
/lib/modules/2.6.9-42.0.3.EL/kernel/drivers/md/raid10.ko
/lib/modules/2.6.9-42.0.10.EL/kernel/drivers/md/raid10.ko
/lib/modules/2.6.9-42.0.3.ELsmp/kernel/drivers/md/raid10.ko

[root@tikal ~]#  modprobe raid10

[root@tikal ~]#  lsmod | grep raid
raid10                 23233  0
raid1                  20033  1

This is not a CentOS 5 machine though; it's SL4.4.
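
If the raid10 module loads the same way on CentOS 5, then once you are 
past the installer you could presumably build a raid10 in one step, 
something like (device names assumed for illustration):

  modprobe raid10
  mdadm --create /dev/md0 --level=10 --raid-devices=4 \
      /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1

Whether anaconda itself will offer that level is another question.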

-- 
Toby Bluhm
Midwest Instruments Inc.
30825 Aurora Road Suite 100
Solon Ohio 44139
440-424-2250