[CentOS] mdraid strange surprises...

Wed Oct 9 16:17:01 UTC 2013
Markus Falb <wnefal at gmail.com>

On 09 Oct 2013, at 16:55, John Doe wrote:

> Hey,
> 
> I installed 2 new data servers with a big (12TB) RAID6 mdraid.

...

> Since my desktop is a RAID1 mdraid on 2 disks, I decided to have a look for
> fun...  Apart from some low count mismatches, I did not have many problems...
> Did the whole check+repair+check on 3 mds and had a look at mdstat...

I think there should not be any mismatches with RAID 6, but md RAID 1 is
another beast. There, a nonzero mismatch count can happen fairly easily.

A page in virtual memory is modified, and eventually the kernel sends it to both disks. If one disk is a little slower than the other, you have your potential mismatch. As I understand it, the RAID check does not care about virtual memory but operates on physical disk sectors. If the check reads a block at the very moment when one disk has written it but the other has *not yet*, then... well, you get it?
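
To see what the check actually counted, the kernel exposes a per-array counter in sysfs. A minimal sketch follows; the function name report_mismatches and its optional root argument are my own invention for illustration (the argument only exists so the function can be pointed at a test directory). Only the sysfs paths are real.

```shell
# Print "<device> <mismatch_cnt>" for every md array under a sysfs root.
# The root defaults to /sys; the parameter is a hypothetical convenience.
report_mismatches() {
    root="${1:-/sys}"
    for f in "$root"/block/md*/md/mismatch_cnt; do
        [ -r "$f" ] || continue      # glob matched nothing, or file unreadable
        dev=${f#"$root"/block/}      # strip the leading path
        dev=${dev%%/*}               # keep only the device name, e.g. md0
        printf '%s %s\n' "$dev" "$(cat "$f")"
    done
}

# A fresh check can be started (as root) by writing to sync_action:
#   echo check > /sys/block/md0/md/sync_action
# and the counter inspected afterwards with:
#   report_mismatches
```

The counter is only meaningful after a check or repair pass has finished, and on RAID 1 a small nonzero value may be exactly the kind of transient mismatch described above.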

I do not know exactly how md's RAID 6 behaves, but I always thought that this false-positive mismatch count was RAID 1 specific.
Because of all this, I would also tend to turn off the weekly RAID check cron job for md RAID 1.
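
A sketch of how that could look, assuming the stock CentOS 6 mdadm package, where (as far as I know) the weekly job in /etc/cron.d/raid-check reads its settings from /etc/sysconfig/raid-check. The array name md0 is only a placeholder for a RAID 1 array; please verify the variable names against the file on your own system.

```shell
# /etc/sysconfig/raid-check (fragment, assumed CentOS 6 layout)
# Keep the weekly check enabled in general ...
ENABLED=yes
# ... but skip the RAID 1 array; "md0" is a placeholder name
SKIP_DEVS="md0"
```

Setting ENABLED=no instead would turn the check off for all arrays, which is probably not what you want on the RAID 6 boxes.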

> And mdraid seems not alarmed about it...
> 1. Is there something to activate to get some kind of mdraid warnings?
>    In /var/log/messages I cannot find any alarming message.

$ chkconfig --list mdmonitor
mdmonitor      	0:off	1:off	2:on	3:on	4:on	5:on	6:off

Make sure mdmonitor is enabled as shown, and configure it with a working email address.
And there *are* entries in /var/log/messages. Could it be that this happened a long time ago, you did not notice, and the log files have since been rotated out?
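
For the email address, mdadm in monitor mode takes it from a MAILADDR line in mdadm.conf (/etc/mdadm.conf on CentOS). A small sketch to check for it; the helper name has_mailaddr and the address admin@example.com are made up for illustration, and the pattern is deliberately simple (it only matches the keyword written out in upper case).

```shell
# Succeed if the given mdadm.conf (default /etc/mdadm.conf) has a MAILADDR line.
# has_mailaddr is a hypothetical helper, not part of mdadm.
has_mailaddr() {
    grep -Eq '^[[:space:]]*MAILADDR[[:space:]]+[^[:space:]]' "${1:-/etc/mdadm.conf}"
}

# Expected line in mdadm.conf (address is a placeholder):
#   MAILADDR admin@example.com
#
# To send a test alert for each array (as root):
#   has_mailaddr && mdadm --monitor --scan --oneshot --test
```

If the test mail does not arrive, the problem is more likely the local mail setup than md itself.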

-- 
Markus