[CentOS] Re: Raid5 issues

Thu May 3 15:53:37 UTC 2007

I ran the manufacturer's tests. The short test failed again with the media error, and the long test fixed it. I will be exchanging the drive for a new one, but for now it's up and running.

This incident, however, has reinforced what I've read in many places: RAID 5 is not safe. If a media error is encountered on one of the remaining drives during a rebuild, you can lose all your data.

I would like to set up RAID 10 instead, but it doesn't seem like it's supported by my mdadm: the proper personalities are not loaded. How do I get the raid10 personality in there?
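
For reference, a minimal sketch of how one might check for the personality, assuming the kernel ships raid10 as a loadable module. The helper name has_personality is made up for illustration; it only inspects text in the format of /proc/mdstat and changes nothing.

```shell
#!/bin/sh
# Sketch only: report whether a given md personality appears in
# /proc/mdstat-style text read from stdin.
# has_personality is a hypothetical helper, not part of mdadm.
has_personality() {
    # The first line of /proc/mdstat looks like:
    #   Personalities : [raid1] [raid5]
    grep '^Personalities' | grep -qF "[$1]"
}

# Typical use on the real system (as root), assuming the module exists:
#   modprobe raid10
#   has_personality raid10 < /proc/mdstat && echo "raid10 available"
```

If modprobe cannot find the module, the running kernel was probably built without raid10, and that would need sorting out first.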

Assuming I get the raid10 personality in there, how do I convert to RAID 10? I suppose I can fail one of the drives in the RAID 5, set up a filesystem on it, copy the data from the RAID 5, kill the RAID 5, set up a RAID 10 from the three disks in the RAID 5 set with one missing, copy the data from the single drive, and add the single drive to the RAID 10 array. Is this the right path? Would I just cp certain directories over or use something like dd?
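
Spelled out as commands, the plan above might look roughly like the dry run below. It is an untested sketch that only prints the commands. Every device name and mount point in it is a placeholder I made up, and ext3 is an assumption too.

```shell
#!/bin/sh
# Dry-run sketch of the RAID 5 -> RAID 10 plan: it only PRINTS commands.
# All device names and mount points are placeholders (assumptions).
OLD_MD=/dev/md0                        # existing 4-disk RAID 5 (assumed name)
NEW_MD=/dev/md1                        # new RAID 10 (assumed name)
SPARE=/dev/sdd1                        # member pulled out to hold the temporary copy
KEEP="/dev/sda1 /dev/sdb1 /dev/sdc1"   # the other three members
DATA=/mnt/data                         # where the array is mounted (assumed)
TMP=/mnt/tmp                           # where the temporary copy is mounted (assumed)

plan() {
    # 1. Degrade the RAID 5 by failing and removing one member.
    echo "mdadm $OLD_MD --fail $SPARE --remove $SPARE"
    # 2. Put a filesystem on the freed disk and mount it.
    echo "mkfs.ext3 $SPARE && mount $SPARE $TMP"
    # 3. File-level copy of the data onto the single disk.
    echo "cp -a $DATA/. $TMP/"
    # 4. Unmount and stop the RAID 5.
    echo "umount $DATA && mdadm --stop $OLD_MD"
    # 5. Clear the old RAID metadata from the three remaining members.
    echo "mdadm --zero-superblock $KEEP"
    # 6. Create a degraded 4-device RAID 10 with one slot missing.
    echo "mdadm --create $NEW_MD --level=10 --raid-devices=4 $KEEP missing"
    # 7. Put a filesystem on the new array and mount it.
    echo "mkfs.ext3 $NEW_MD && mount $NEW_MD $DATA"
    # 8. Copy the data back from the single disk.
    echo "cp -a $TMP/. $DATA/"
    # 9. Release the single disk and add it to the RAID 10.
    echo "umount $TMP && mdadm $NEW_MD --add $SPARE"
}

plan
```

Two caveats with this path. The data has to fit on a single drive, and a 4-disk RAID 10 holds only two drives' worth where the RAID 5 held three, so a file-level copy looks more workable than dd. There is also no redundancy anywhere from step 1 until the final resync finishes.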

Any help would be greatly appreciated.  

Russ

-----Original Message-----
From: Scott Silva <ssilva at sgvwater.com>
Date: Tue, 01 May 2007 13:28:50 
To: centos at centos.org
Subject: [CentOS] Re: Raid5 issues

Ruslan Sivak spake the following on 5/1/2007 1:10 PM:
> Luciano Miguel Ferreira Rocha wrote:
>> On Tue, May 01, 2007 at 02:24:34PM -0400, Ruslan Sivak wrote:
>>  
>>> I tried recreating the array, but it won't mount...
>>>
>>> Also a bit weird is that I did a SMART short self-test, and one of
>>> the drives keeps returning a Read-Failure, but the overall SMART
>>> status is passed.  Could this be from overheating?
>>>     
>>
>> I'm not sure. If those Read-Failure entries are error counts, then yes, I think
>> so. The drive is currently OK, but when it had a lot of activity it
>> overheated and started getting read errors.
>>   
> Actually what I'm saying is that I ran smartctl and did a short test on
> the drive, and it keeps returning a read error at a certain point (LBA
> 271739730).  I let it cool off for a few hours, and tried the test
> again, same thing.
> Does this mean that the drive has developed bad sectors and needs
> replacement?  These are brand new drives.  The error only happens on 1
> of the 4.
> 
Yes, that can be a sign of impending doom for the drive. The drive's age
doesn't have anything to do with it. A drive can fail in anything between
minutes and years. I have had new drives fail in days, and I have some old
drives that still keep chugging along in routers and such that have really
only lost their usefulness because they are so small.
I have an old print server running on an old Dell 486 with an 80 MB drive
(that's MB, not GB). It just refuses to die on its own, and it has been in
continuous operation for over 12 years. It will get replaced when it dies, but
it still works great, and it is fast enough to spool print files.
If you run the manufacturer's tests on the drive, and they show any error,
return it.
