[CentOS] puzzling md error ?

Mon Mar 1 01:04:17 UTC 2010

On Feb 28, 2010, at 7:15 PM, Stephen Harris <lists at spuddy.org> wrote:

> On Sun, Feb 28, 2010 at 02:37:13PM -0800, John R Pierce wrote:
>> And how do I know all these mirror data mismatches are Swap?  does not
>> each mismatch mean the mirrors disagree, which means one of them is
>> wrong.  Which one?  since they aren't timestamped or checksummed (like
>
> This thread is very timely.  I updated my C5.3 to 5.4 last week (not
> sure why it took me so long) and this morning noticed my raid5 was
> resyncing.  5*1Tbyte disks.  The resync took...
>  Feb 28 04:22:02 mercury kernel: md: syncing RAID array md3
>  Feb 28 16:27:06 mercury kernel: md: md3: sync done.
>
> Performance was bad during this time.  Not terrible from an interactive
> point of view, but a job that normally runs from 4am to 10am didn't
> finish until 3pm.
>
> I like the concept of checking the disks are good, but it really
> sounds like there are practical problems (false positives, performance
> degradation).
>
> So I think /etc/sysconfig/raid-check is going to read
>  ENABLED=no

It would be nice if mismatch_cnt could be compared to a count of
aborted writes, with a resync triggered only if the two differ, but
mismatch_cnt persists while the aborted-write count is only maintained
since the last reboot.
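
For anyone who wants to look at the number being discussed, here is a
rough sketch that reads mismatch_cnt for each md array from sysfs. It
assumes the usual /sys/block/mdX/md/mismatch_cnt layout; the function
name and the sys_block parameter are just made up for illustration. It
only reports the counter. It does not attempt the comparison against
aborted writes, since I don't know of an interface that exposes that.

```python
from pathlib import Path


def read_mismatch_counts(sys_block="/sys/block"):
    """Return {array_name: mismatch_cnt} for each md array under sys_block.

    Assumes the sysfs layout <sys_block>/mdX/md/mismatch_cnt.
    """
    counts = {}
    for entry in sorted(Path(sys_block).glob("md*")):
        cnt_file = entry / "md" / "mismatch_cnt"
        try:
            counts[entry.name] = int(cnt_file.read_text().strip())
        except (OSError, ValueError):
            # Missing or unreadable attribute: skip this entry.
            continue
    return counts


if __name__ == "__main__":
    for name, cnt in read_mismatch_counts().items():
        print(f"{name}: mismatch_cnt={cnt}")
```

A nonzero value here only says the last check found differences; it
does not say which member holds the good copy.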

Ideally the md RAID code would make writes completely atomic, so that
they complete either on all members or on none, and would not allow an
abort task to preempt a write in progress.

-Ross