[CentOS] Software RAID1 with CentOS-6.2

Wed Feb 29 02:18:43 UTC 2012
Emmett Culley <emmett at webengineer.com>

On 02/28/2012 04:27 PM, Kahlil Hodgson wrote:
> Hello,
> 
> Having a problem with software RAID that is driving me crazy.
> 
> Here's the details:
> 
> 1. CentOS 6.2 x86_64 install from the minimal iso (via pxeboot).
> 2. Reasonably good PC hardware (i.e. not budget, but not server grade either)
> with a pair of 1TB Western Digital SATA3 Drives.
> 3. Drives are plugged into the SATA3 ports on the mainboard (both drives and
> cables say they can do 6Gb/s).
> 4. During the install I set up software RAID1 for the two drives with two raid
> partitions:
>      md0 - 500M for /boot
>      md1 - "the rest" for a physical volume
> 5. Set up LVM on md1 in the standard slash, swap, home layout
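>
> For reference, the same layout by hand would look roughly like this (a sketch
> only; the installer does all of this itself, and the volume group name and LV
> sizes below are assumptions):
>
>      mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1
>      mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/sda2 /dev/sdb2
>      pvcreate /dev/md1
>      vgcreate vg_system /dev/md1
>      lvcreate -L 50G -n lv_root vg_system
>      lvcreate -L 4G -n lv_swap vg_system
>      lvcreate -l 100%FREE -n lv_home vg_system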
> 
> Install goes fine (actually really fast) and I reboot into CentOS 6.2.  Next I
> run yum update, add a few minor packages and perform some basic
> configuration.
> 
> Now I start to get I/O errors printed on the console.  Run 'mdadm -D
> /dev/md1' and see the array is degraded and /dev/sdb2 has been marked as
> faulty.
> 
> Okay, fair enough, I've got at least one bad drive.  I boot the system from a
> live USB and run the short and long SMART tests on both drives.  No problems
> are reported, but I know that can be misleading, so I'm going to have to
> gather some evidence before I try to return these drives.  I run badblocks in
> destructive mode on both drives as follows:
> 
>      badblocks -w -b 4096 -c 98304 -s /dev/sda
>      badblocks -w -b 4096 -c 98304 -s /dev/sdb
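>
> (For reference: -w is the destructive write-mode test, -b 4096 sets the block
> size, -c 98304 is the number of blocks tested at a time, and -s shows
> progress.)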
> 
> Come back the next day and see that no errors are reported.  Er, that's odd.
> I check the SMART data in case the badblocks activity has triggered something.
> Nope.  Maybe I screwed up the install somehow?
> 
> So I start again and repeat the install process very carefully.  This time I
> check the RAID arrays straight after boot:
> 
>      mdadm -D /dev/md0   -   all is fine.
>      mdadm -D /dev/md1   -   the two drives are resyncing.
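>
> Resync progress can also be watched from /proc/mdstat, e.g.
>
>      cat /proc/mdstat
>      watch -n 60 cat /proc/mdstat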
> 
> Okay, that is odd.  The RAID1 array was created at the start of the install
> process, before any software was installed.  Surely it should be in sync
> already?  Googled a bit and found a post where someone else had seen the same
> thing happen.  The advice was to just wait until the drives sync so the
> 'blocks match exactly', but I'm not really happy with that explanation.  At
> this rate it's going to take a whole day to do a single minimal install, and
> I'm sure I would have heard others complaining about the process.
> 
> Anyway, I leave the system to sync for the rest of the day.  When I get back
> to it I see the same (similar) I/O errors on the console, and mdadm shows the
> RAID array is degraded with /dev/sdb2 marked as faulty.  This time I notice
> that the I/O errors all refer to /dev/sda.  I have to reboot because the fs
> is now read-only.  When the system comes back up, it's trying to resync the
> drive again.  Eh?
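>
> (The kernel log is probably the clearest place to see which ATA link the
> errors are coming from, e.g.
>
>      dmesg | grep -i 'ata[0-9]'
>
> since the libata messages name the port rather than just the block device.)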
> 
> Any ideas what is going on here?  If it's bad drives, I really need some
> confirmation independent of the software RAID failing; I thought SMART or
> badblocks would give me that.  Perhaps it has nothing to do with the drives.
> Could a problem with the mainboard or the memory cause this issue?  Is it a
> SATA3 issue?  Should I try the drives on the 3Gb/s channels, since there's
> probably little speed difference with non-SSDs?
> 
> Cheers,
> 
> Kal
> 
> 
> _______________________________________________
> CentOS mailing list
> CentOS at centos.org
> http://lists.centos.org/mailman/listinfo/centos
> 
> 
I just had a very similar problem with a RAID10 array of four new 1TB drives.  It turned out to be the SATA cable.

I first tried a new drive and even replaced the five-disk hot-plug carrier; it was always the same logical drive (/dev/sdb) that failed.  I then tried an additional SATA adapter card.  That cinched it, as the only thing common to all of the above was the SATA cable.
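
One thing worth checking in a case like this: a flaky cable or connector usually shows up in SMART as a climbing UDMA CRC error count (attribute 199) rather than as bad sectors, so something along the lines of

     smartctl -A /dev/sdb | grep -i crc

(assuming smartmontools is installed, and substituting whichever drive is suspect) can help confirm a link problem before swapping hardware around.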

All has been well for a week now.

I should have tried replacing the cable first :-)

Emmett