[CentOS] Software RAID1 with CentOS-6.2

Scott Silva ssilva at sgvwater.com
Wed Feb 29 00:48:17 UTC 2012


On 2/28/2012 4:27 PM, Kahlil Hodgson spake the following:
> Hello,
>
> Having a problem with software RAID that is driving me crazy.
>
> Here are the details:
>
> 1. CentOS 6.2 x86_64 install from the minimal ISO (via pxeboot).
> 2. Reasonably good PC hardware (i.e. not budget, but not server grade either)
> with a pair of 1TB Western Digital SATA3 drives.
> 3. Drives are plugged into the SATA3 ports on the mainboard (both drives and
> cables say they can do 6Gb/s).
> 4. During the install I set up software RAID1 across the two drives with two
> RAID partitions:
>      md0 - 500M for /boot
>      md1 - "the rest" for a physical volume
> 5. Set up LVM on md1 in the standard slash, swap, home layout (see the sketch below).
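>
> For reference, a rough manual equivalent of that layout (a sketch only;
> the device names match mine, but the volume group and LV names are just
> placeholders, not what anaconda actually uses):
>
>      mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1
>      mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/sda2 /dev/sdb2
>      pvcreate /dev/md1
>      vgcreate vg_sys /dev/md1
>      lvcreate -L 4G -n lv_swap vg_sys
>      lvcreate -L 50G -n lv_root vg_sys
>      lvcreate -l 100%FREE -n lv_home vg_sys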
>
> Install goes fine (actually really fast) and I reboot into CentOS 6.2.  Next I
> run yum update, add a few minor packages, and perform some basic
> configuration.
>
> Now I start to see I/O errors printed on the console.  Run 'mdadm -D
> /dev/md1' and see the array is degraded and /dev/sdb2 has been marked as
> faulty.
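>
> A quick way to see the same state (a sketch):
>
>      cat /proc/mdstat                              # degraded mirror shows [2/1] [U_]
>      mdadm --detail /dev/md1 | grep -iE 'state|faulty'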
>
> Okay, fair enough, I've got at least one bad drive.  I boot the system from a
> live USB and run the short and long SMART tests on both drives.  No problems
> reported, but I know that can be misleading, so I'm going to have to gather some
> evidence before I try to return these drives.  I run badblocks in destructive
> mode on both drives as follows:
>
>      badblocks -w -b 4096 -c 98304 -s /dev/sda
>      badblocks -w -b 4096 -c 98304 -s /dev/sdb
>
> Come back the next day and see that no errors are reported.  Er, that's odd.  I
> check the SMART data in case the badblocks activity has triggered something.
> Nope.  Maybe I screwed up the install somehow?
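>
> For the record, those SMART checks were along these lines (smartctl from
> smartmontools; a sketch, repeated for /dev/sdb):
>
>      smartctl -t short /dev/sda    # quick self-test
>      smartctl -t long /dev/sda     # full surface self-test
>      smartctl -a /dev/sda          # attributes, self-test log, error log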
>
> So I start again and repeat the install process very carefully.  This time I
> check the RAID arrays straight after boot:
>
>      mdadm -D /dev/md0   -   all is fine.
>      mdadm -D /dev/md1   -   the two drives are resyncing.
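>
> Watching the resync progress (a sketch):
>
>      watch -n 5 cat /proc/mdstat
>      cat /sys/block/md1/md/sync_action    # reads 'resync' while it runs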
>
> Okay, that is odd.  The RAID1 array was created at the start of the install
> process, before any software was installed.  Surely it should be in sync
> already?  Googled a bit and found a post where someone else had seen the same
> thing happen.  The advice was to just wait until the drives sync so the 'blocks
> match exactly', but I'm not really happy with that explanation.  At this rate
> it's going to take a whole day to do a single minimal install, and I'm sure I
> would have heard others complaining about the process.
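>
> If the initial resync itself is the bottleneck, the md speed limits can
> apparently be raised (values here are illustrative, not tuned):
>
>      sysctl -w dev.raid.speed_limit_min=50000
>      sysctl -w dev.raid.speed_limit_max=200000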
>
> Anyway, I leave the system to sync for the rest of the day.  When I get back to
> it I see the same (similar) I/O errors on the console, and mdadm shows the RAID
> array is degraded: /dev/sdb2 has been marked as faulty.  This time I notice
> that the I/O errors all refer to /dev/sda.  Have to reboot because the fs is
> now read-only.  When the system comes back up, it's trying to resync the drive
> again.  Eh?
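>
> The kernel log should say which link is actually failing; a sketch of
> what I'd grep for:
>
>      dmesg | grep -iE 'ata[0-9]+|sd[ab]|i/o error'
>      grep -i 'i/o error' /var/log/messages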
>
> Any ideas what is going on here?  If it's bad drives, I really need some
> confirmation independent of the software RAID failing.  I thought SMART or
> badblocks would give me that.  Perhaps it has nothing to do with the drives.
> Could a problem with the mainboard or the memory cause this issue?  Is it a
> SATA3 issue?  Should I try it on the 3Gb/s channels, since there's probably
> little speed difference with non-SSDs?
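>
> One way to test the link-speed theory without recabling, assuming the
> CentOS 6 kernel honors it, would be to force the ports down to 3Gb/s at
> boot:
>
>      # appended to the kernel line in /boot/grub/grub.conf
>      libata.force=3.0Gbps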
>
> Cheers,
>
> Kal
First thing... Are they green drives? Green drives power down randomly and can
cause these types of errors... Also, maybe 6 Gb/s SATA isn't fully supported
by Linux on that board... Try the 3 Gb/s channels.
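
A quick way to check the green-drive hunch (a sketch; assumes smartctl is
installed and new enough to know about scterc):

     smartctl -i /dev/sda | grep -i model    # WD Greens are WD??EARS/EARX and similar
     smartctl -l scterc /dev/sda             # ERC/TLER support; Greens usually report none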
