[CentOS] Strange problem with filesystem changes reverting on reboot

Wed May 6 12:47:18 UTC 2009
Les Mikesell <lesmikesell at gmail.com>

Bart Schaefer wrote:
> Our sysadmin was doing midnight work on moving some hardware to new
> power outlets.  We'd recently done a CentOS 5.3 install on one of
> those machines and then "yum install" with the centosplus kernel and
> some rpmforge packages.  It had been up and running fine for at least
> two weeks in that configuration.  He sent this message:
> 
> On reboot the root file system seems to have reverted to the previous
> startup -- no CentOS plus, no Dag repository info in /etc/yum.repos.d,
> no xfs, and therefore no /var/lib/mysql.  This is at least the second
> time we've experienced this phenomenon ... suffice to say I am really
> really suspicious about ext3 now.
> 
> The previous time this occurred was quite some time ago, probably soon
> after the CentOS 5.1 release -- we'd written it off as pilot error of
> some kind.  The root is not an LVM, but it is on a software RAID -- my
> suspicion leans more toward a RAID issue than ext3.
> 
> Does any of this sound familiar to anyone?

The only way I can even imagine that happening would be on a RAID1 where
the mirrors were not in sync when you made the changes, so the changes
only landed on one drive.  Mirrors drop out of sync under reasonably
common circumstances, so you should always run 'cat /proc/mdstat' to
be sure both mirrors are active.  The more unlikely part is that when
you rebooted, the previously active mirror was not recognized and the
previously idle (stale) mirror became active instead.  Again,
'cat /proc/mdstat' would have shown the problem.  If that was it, then
unless the drives had already started to resync in the wrong direction,
the quick fix would have been to simply remove the drive with the old
contents, forcing the other one to be used.  One possible cause is that
the partition type on the drive that didn't join the RAID at reboot was
not set to 'FD' (Linux raid autodetect).  The drive with the old
contents might have had some error that got it kicked out of the set
earlier; md does seem to be very sensitive about that, whereas with only
one drive the system will do many more retries.
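If you want that check automated rather than remembering to eyeball
/proc/mdstat, a minimal sketch follows.  It assumes the usual md status
line format, where an array's member count appears as "[2/2] [UU]" and a
missing mirror shows up as "[2/1] [U_]".  The function names
degraded_arrays() and report() are hypothetical, and the script is
written for Python 3, so it would not run on the stock Python 2.4 that
CentOS 5 ships.

```python
import re

# An array header line, e.g. "md0 : active raid1 sdb1[1] sda1[0]"
ARRAY_RE = re.compile(r"^(md\d+)\s*:")
# The member status field, e.g. "[2/1] [U_]" (total/active, then per-slot flags)
STATUS_RE = re.compile(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]")


def degraded_arrays(mdstat_text):
    """Return {array: flags} for arrays with fewer active members than slots."""
    result = {}
    current = None
    for line in mdstat_text.splitlines():
        header = ARRAY_RE.match(line)
        if header:
            current = header.group(1)
            continue
        status = STATUS_RE.search(line)
        if current and status:
            total, active = int(status.group(1)), int(status.group(2))
            flags = status.group(3)
            # '_' marks a missing or failed slot, e.g. one half of a RAID1
            if active < total or "_" in flags:
                result[current] = flags
            current = None  # only the first status line per array matters
    return result


def report(path="/proc/mdstat"):
    """Read an mdstat-format file and summarize any degraded arrays."""
    with open(path) as f:
        bad = degraded_arrays(f.read())
    if not bad:
        return "all md arrays have their full set of members"
    return "\n".join("%s: degraded [%s]" % (name, flags)
                     for name, flags in sorted(bad.items()))
```

Calling print(report()) from cron or a login script would have flagged
a half-missing mirror well before the reboot.  It only reads the
kernel's summary, so it cannot tell you which half of a mirror holds the
newer data; that still takes a look at the drives themselves.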

-- 
    Les Mikesell
      lesmikesell at gmail.com