Saturday I did an upgrade from 5.3 (original install) to 5.4. Saturday night, /etc/cron.weekly reported the following:
/etc/cron.weekly/99-raid-check:
WARNING: mismatch_cnt is not 0 on /dev/md0
md0 holds /boot and resides, mirrored, on sda1 and sdb1. md1 holds an LVM volume containing the remaining filesystems, including swap.
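(For reference, the value the cron script is complaining about can be read straight out of sysfs; with the layout above that would be something like:)

# cat /proc/mdstat
# cat /sys/block/md0/md/mismatch_cnt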
The underlying hardware is just a few months old, has passed the usual memtest stuff, and has been running 5.3 well for a few months.
I'm *guessing* that due to the timing, this is related to the upgrade. I have to admit that I forgot myself and instead of doing the glibc updates as recommended, I only did:
yum clean all
yum update yum
rpm -e --nodeps perl-5.8.8-18.el5_3.1.i386   (see today's perl thread)
yum update perl.x86_64
yum update
shutdown -r now
I've taken a backup of /boot (via dump) after the upgrade, but have not yet re-enabled normal backups.
My hunch is that something in the upgrade process touched sda1 but not sdb1, and that removing sdb1 from the mirror and reattaching it for a resync would be sufficient; however, I was looking for comments on this from anyone with experience or an opinion on the matter. Googling the issue doesn't seem to turn up any recent related results.
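(Roughly, the remove-and-reattach I have in mind would look like the following, with sdb1 as the member to resync; a sketch only, not yet run on this box:)

# mdadm /dev/md0 --fail /dev/sdb1
# mdadm /dev/md0 --remove /dev/sdb1
# mdadm /dev/md0 --add /dev/sdb1
# cat /proc/mdstat    (watch the resync run to completion)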
Also, could the upgrade have touched the boot block on sda1 but not sdb1, and thus triggered this problem?
Devin
Devin Reade wrote:
Saturday I did an upgrade from 5.3 (original install) to 5.4. Saturday night, /etc/cron.weekly reported the following:
/etc/cron.weekly/99-raid-check: WARNING: mismatch_cnt is not 0 on /dev/md0
[...]
What exactly is the mismatch_cnt value? If it's not too much, it is most likely coming from your swap partition.
Run a check; if that doesn't fail, I wouldn't worry about it.
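(If it helps, kicking off the check by hand and reading the result back looks roughly like this, assuming the array is md0:)

# echo check > /sys/block/md0/md/sync_action
# cat /proc/mdstat    (wait for the check to finish)
# cat /sys/block/md0/md/mismatch_cnt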
Glenn
RedShift redshift@pandora.be wrote:
What exactly is the mismatch_cnt value? If it's not too much, it is most likely coming from your swap partition.
128. md0 is /boot only; swap is on md1 which didn't have a problem
Devin
On Sun, 2009-10-25 at 12:33 -0600, Devin Reade wrote:
Saturday I did an upgrade from 5.3 (original install) to 5.4. Saturday night, /etc/cron.weekly reported the following:
/etc/cron.weekly/99-raid-check: WARNING: mismatch_cnt is not 0 on /dev/md0
I had this happen on a box that I upgraded Friday. I went ahead and tested each partition in the affected mirror with badblocks (found no errors), and after multiple resyncs there was no change. After similar experiences with Google, I did run across a note saying that this went away after a reboot. I broke down and applied the Micro$lop solution (reboot), and the error has gone away.
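(For anyone wanting to repeat that, the read-only scan would be along the lines of the following; substitute your own member devices:)

# badblocks -sv /dev/sda1
# badblocks -sv /dev/sdb1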
Like you, I'm interested in a better understanding of this issue, so if anyone else has more info, I'm all ears. ;>
On Sun, 2009-10-25 at 14:52 -0400, Ron Loftin wrote:
On Sun, 2009-10-25 at 12:33 -0600, Devin Reade wrote:
Saturday I did an upgrade from 5.3 (original install) to 5.4. Saturday night, /etc/cron.weekly reported the following:
/etc/cron.weekly/99-raid-check: WARNING: mismatch_cnt is not 0 on /dev/md0
[...]
Like you, I'm interested in a better understanding of this issue, so if anyone else has more info, I'm all ears. ;>
mismatch_cnt (/sys/block/md*/md/mismatch_cnt) is the number of unsynchronized blocks in the raid.
The repair is to rebuild the raid:
# echo repair >/sys/block/md<#>/md/sync_action
...which does not reset the count, but if you force a check after the rebuild is complete:
# echo check >/sys/block/md<#>/md/sync_action
...then the count should return to zero.
Or at least that worked for me on three systems today.
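(One detail worth making explicit: the 'check' should only be written once the repair pass has finished. A rough way to tell, using md0 as the example:)

# cat /sys/block/md0/md/sync_action    (reports "repair" while running, "idle" when done)
# cat /proc/mdstat                     (shows rebuild progress)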
Steve
S.Tindall tindall.satwth@brandxmail.com wrote:
mismatch_cnt (/sys/block/md*/md/mismatch_cnt) is the number of unsynchronized blocks in the raid.
Understood.
I did the repair/check on sync_action and it got rid of the problem. (Thanks)
What I _don't_ understand is why they were unsynchronized to begin with (`cat /proc/mdstat` showed the array to be clean). Nor do I understand how the 'repair' action actually works, or why I should believe that it's using the correct data when it resyncs. Although I've looked around, I've not seen anything that describes how repair works and (specifically for raid1) how it can tell which slice has the good data and which has the bad data.
"Fixing" things without understanding what is going on under the covers (at least conceptually) does not give me a warm fuzzy feeling :/
Devin
On 10/25/2009 07:33 PM, Devin Reade wrote: ...
WARNING: mismatch_cnt is not 0 on /dev/md0
I have two machines with software RAID 1 running CentOS; they both gave this message this weekend.
Mogens
The /etc/cron.weekly/99-raid-check script is new in 5.4. Read through the mdadm list and you will find that small mismatch counts on RAID 1 are normal. I don't remember the exact reason, but it has to do with aborted writes where the queue has already committed the write to one drive but not the other. Since the data is in an unused area of the filesystem and mdadm can't tell when the aborted write happened, it is just left alone. This is why it is common on swap partitions.
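(Easy enough to confirm by listing the count for every array and seeing which one actually carries it; something like:)

# for f in /sys/block/md*/md/mismatch_cnt; do echo "$f: $(cat $f)"; done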
Ryan