On 02/28/2012 04:27 PM, Kahlil Hodgson wrote:
> Hello,
>
> Having a problem with software RAID that is driving me crazy.
>
> Here are the details:
>
> 1. CentOS 6.2 x86_64 install from the minimal iso (via pxeboot).
> 2. Reasonably good PC hardware (i.e. not budget, but not server grade
>    either) with a pair of 1TB Western Digital SATA3 drives.
> 3. Drives are plugged into the SATA3 ports on the mainboard (both
>    drives and cables say they can do 6Gb/s).
> 4. During the install I set up software RAID1 for the two drives with
>    two RAID partitions:
>       md0 - 500M for /boot
>       md1 - "the rest" for a physical volume
> 5. Set up LVM on md1 in the standard slash, swap, home layout.
>
> The install goes fine (actually really fast) and I reboot into CentOS
> 6.2. Next I run yum update, add a few minor packages, and perform some
> basic configuration.
>
> Now I start to get I/O errors printed on the console. I run
> 'mdadm -D /dev/md1' and see the array is degraded and /dev/sdb2 has
> been marked as faulty.
>
> Okay, fair enough, I've got at least one bad drive. I boot the system
> from a live USB and run the short and long SMART tests on both drives.
> No problems are reported, but I know that can be misleading, so I'm
> going to have to gather some evidence before I try to return these
> drives. I run badblocks in destructive mode on both drives as follows:
>
>    badblocks -w -b 4096 -c 98304 -s /dev/sda
>    badblocks -w -b 4096 -c 98304 -s /dev/sdb
>
> I come back the next day and see that no errors are reported. Er,
> that's odd. I check the SMART data in case the badblocks activity has
> triggered something. Nope. Maybe I screwed up the install somehow?
>
> So I start again and repeat the install process very carefully. This
> time I check the RAID arrays straight after boot:
>
>    mdadm -D /dev/md0 - all is fine.
>    mdadm -D /dev/md1 - the two drives are resyncing.
>
> Okay, that is odd. The RAID1 array was created at the start of the
> install process, before any software was installed. Surely it should
> be in sync already? I googled a bit and found a post where someone
> else had seen the same thing happen. The advice was to just wait until
> the drives sync so the 'blocks match exactly', but I'm not really
> happy with that explanation. At this rate it's going to take a whole
> day to do a single minimal install, and I'm sure I would have heard
> others complaining about the process.
>
> Anyway, I leave the system to sync for the rest of the day. When I get
> back to it I see the same (similar) I/O errors on the console, and
> mdadm shows the RAID array is degraded and /dev/sdb2 has been marked
> as faulty. This time I notice that the I/O errors all refer to
> /dev/sda. I have to reboot because the fs is now read-only. When the
> system comes back up, it's trying to resync the drive again. Eh?
>
> Any ideas what is going on here? If it's bad drives, I really need
> some confirmation independent of the software RAID failing. I thought
> SMART or badblocks would give me that. Perhaps it has nothing to do
> with the drives. Could a problem with the mainboard or the memory
> cause this issue? Is it a SATA3 issue? Should I try it on the 3Gb/s
> channels, since there's probably little speed difference with
> non-SSDs?
>
> Cheers,
>
> Kal
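One thing worth clearing up first: the resync you saw straight after the
install is normal -- md always does an initial sync on a freshly created
RAID1 so the two members end up bit-identical, and the array is usable
while it runs. The real clue is what the kernel says when those I/O
errors appear. A rough sketch of what I'd capture before blaming the
drives (device names are just assumed to match the layout above, and
smartctl comes from the smartmontools package, which may need to be
installed):

   # watch the resync progress and overall array state
   watch -n 5 cat /proc/mdstat

   # when the I/O errors show up, see what the kernel actually reports:
   # link resets/timeouts tend to point at cabling or the controller,
   # while media/read errors point at the drive itself
   dmesg | grep -iE 'ata[0-9]|sd[ab]|md[01]' | tail -50

   # each drive's own error log
   smartctl -l error /dev/sda
   smartctl -l error /dev/sdb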
I just had a very similar problem with a RAID 10 array with four new 1TB
drives. It turned out to be the SATA cable. I first tried a new drive and
even replaced the five-disk hot-plug carrier. It was always the same
logical drive (/dev/sdb). I then tried using an additional SATA adapter
card. That cinched it, as the only thing common to all of the above was
the SATA cable. All has been well for a week now. I should have tried
replacing the cable first :-)

Emmett
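P.S. For what it's worth, a marginal SATA cable usually shows up in SMART
as a climbing UDMA_CRC_Error_Count (attribute 199), even when the short
and long self-tests pass. A quick way to check (again assuming
smartmontools is installed; swap in your own device names):

   smartctl -A /dev/sda | grep -i crc
   smartctl -A /dev/sdb | grep -i crc

If that counter keeps rising across resync attempts, suspect the cable or
the port before the drive.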