[CentOS] /etc/cron.weekly/99-raid-check

Tue Dec 1 13:05:56 UTC 2009
Paul Bijnens <Paul.Bijnens at xplanation.com>

On 2009-12-01 13:51, Farkas Levente wrote:
> On Tue, Dec 1, 2009 at 11:06, RedShift <redshift at pandora.be> wrote:
>> Jancio Wodnik wrote:
>>> On 30.11.2009 14:08, Farkas Levente wrote:
>>>> hi,
>>>> it's been a few weeks since rhel/centos 5.4 was released and there has been
>>>> much discussion about this new "feature", the weekly raid partition check.
>>>> we've got lots of servers with raid1 systems and i already tried to
>>>> configure them not to send these messages, but i'm not able to. ie. i
>>>> already added all of my swap partitions to SKIP_DEVS (since i read
>>>> on the linux-kernel list that swap can show a nonzero mismatch_cnt, even
>>>> though i still don't understand why). but even the data partitions (ie. all
>>>> raid1 partitions on all of my servers) produce this error (ie. their
>>>> mismatch_cnt is never 0 at the weekend). and this causes all of my raid1
>>>> partitions to be rebuilt during the weekend. and i don't like it:-(
>>>> so my questions:
>>>> - is it a real bug in the raid1 system?
>>>> - is it a real fault in the disks that run raid (i don't really believe
>>>> so, since it's dozens of servers)?
>>>> - is the /etc/cron.weekly/99-raid-check wrong in rhel/centos-5.4?
>>>> or what's the problem?
>>>> can someone enlighten me?
>>>> thanks in advance.
>>>> regards.
>>>>
>>>>
>>> Hi. I have this problem on my 2 servers (both CentOS 5.4): every
>>> weekend my raid1 set is rebuilt, because mismatch_cnt is never 0 at
>>> the weekend. What is really going on? My 1TB disks with raid1 are
>>> rebuilt every weekend.
>>>
>>>
>> They aren't being rebuilt; they are being checked to see whether the data
>> on the RAID disks is consistent. There are various reasons why mismatch_cnt
>> can be higher than 0, for example aborted writes. Generally it's not
>> something to worry about if, for example, you have a swap partition in
>> your RAID array. If you do a repair and then a check, the mismatch_cnt
>> should reset to 0.
> 
> the mismatch_cnt is not 0, so the arrays are automatically checked and
> repaired all weekend. and these are not swap partitions; as i wrote, they
> are the /srv partitions with only data.
> 
> 

I have the problem on 2 servers, and both of those servers are also running
a VMware image (very small, but constantly used) under VMware Server 2.
Could it be that the .vmem file, or even the virtual disk is constantly
written to, and the raid is constantly out of sync because of that?
(All my other VMware servers either have hardware RAID cards or are still
on CentOS 4.)

The mismatch count is never large: 128 is usual; 512 is the maximum I've seen.
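
To watch the counter between the weekly cron runs, a minimal Python sketch
along these lines could read it straight from sysfs. It assumes the md
driver's usual /sys/block/mdX/md/mismatch_cnt layout and a reasonably recent
Python 3; the function name read_mismatch_counts and its sys_block parameter
are made up for illustration.

```python
import glob
import os


def read_mismatch_counts(sys_block="/sys/block"):
    """Return {md_device_name: mismatch_cnt} for every md array found.

    sys_block is a hypothetical parameter so the function can be pointed
    at a fake directory tree instead of the real sysfs.
    """
    counts = {}
    pattern = os.path.join(sys_block, "md*", "md", "mismatch_cnt")
    for path in sorted(glob.glob(pattern)):
        # Path looks like <sys_block>/md0/md/mismatch_cnt; pick out "md0".
        device = path.split(os.sep)[-3]
        with open(path) as handle:
            counts[device] = int(handle.read().strip())
    return counts


if __name__ == "__main__":
    for device, count in read_mismatch_counts().items():
        status = "ok" if count == 0 else "MISMATCH"
        print(f"{device}: mismatch_cnt={count} ({status})")
```

Note that the kernel only updates mismatch_cnt during a check or repair
pass, so the value read here reflects the most recent such pass.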

Actually, one of the servers passed the test this weekend, but it shows
a nonzero mismatch count again at this moment.

Two weeks ago, I had time to shut down one server completely and had
the mismatch count repaired twice; soon after startup, the mismatch count
was again != 0.
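
For reference, a repair followed by a check can be requested by hand by
writing to the array's sync_action file in sysfs. The following is only a
rough Python sketch of that step, assuming the standard
/sys/block/mdX/md/sync_action interface and root privileges; the helper
names request_sync_action and current_sync_action, and the sys_block
parameter, are hypothetical.

```python
import os

# Subset of the values the md driver accepts in sync_action.
VALID_ACTIONS = ("check", "repair", "idle")


def request_sync_action(device, action, sys_block="/sys/block"):
    """Ask the md driver to start `action` on `device` (e.g. "md0")."""
    if action not in VALID_ACTIONS:
        raise ValueError("unsupported sync action: %r" % (action,))
    path = os.path.join(sys_block, device, "md", "sync_action")
    with open(path, "w") as handle:
        handle.write(action + "\n")
    return path


def current_sync_action(device, sys_block="/sys/block"):
    """Return what the array is doing right now, e.g. "idle" or "check"."""
    path = os.path.join(sys_block, device, "md", "sync_action")
    with open(path) as handle:
        return handle.read().strip()


if __name__ == "__main__":
    # Example: start a repair pass on md0. Once current_sync_action("md0")
    # reports "idle" again, request "check" and then read mismatch_cnt.
    request_sync_action("md0", "repair")
    print(current_sync_action("md0"))
```

If the count is nonzero again after a repair and a clean check, something
is still writing inconsistent blocks to the mirror halves in the meantime.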


-- 
Paul Bijnens, Xplanation Technology Services        Tel  +32 16 397.525
Interleuvenlaan 86, B-3001 Leuven, BELGIUM          Fax  +32 16 397.552
***********************************************************************
* I think I've got the hang of it now:  exit, ^D, ^C, ^\, ^Z, ^Q, ^^, *
* quit, ZZ, :q, :q!, M-Z, ^X^C, logoff, logout, close, bye, /bye, ~., *
* stop, end, ^]c, +++ ATH, disconnect,  halt,  abort,  hangup,  KJOB, *
* ^X^X,  :D::D,  kill -9 1,  kill -1 $$,  shutdown,  init 0,  Alt-F4, *
* Alt-f-e, Ctrl-Alt-Del, Alt-SysRq-reisub, Stop-A, AltGr-NumLock, ... *
* ...  "Are you sure?"  ...   YES   ...   Phew ...   I'm out          *
***********************************************************************