[CentOS] Strange issue with device-mapper lib in CentOS

Mon Dec 4 22:07:40 UTC 2006
Alexey Loukianov <aloukianov at lavtech.ru>

Hello all,

A couple of weeks ago I installed CentOS 4.2 on a very old server
machine with an MSI server board based on the Intel 440GX chipset, two
500 MHz Xeons and 1 GB of RAM. An AMI MegaRAID 467 is installed as the
storage controller, which causes some trouble during installation,
because the stock CentOS 4 install and production kernels don't have
the older megaraid.ko module compiled, but there are plenty of not very
difficult ways to work around that. After a clean and relatively fast
install I first up2date-d it to CentOS 4.4, did some basic initial
reconfiguration, turned it off and left it lying around doing
nothing.

Today its time came, and I turned it on. While booting I saw a
"Segmentation fault" message just after the line about "Starting up
LVM2". That confused me a bit. I logged in as root, typed vgdisplay,
and got normal output followed by a "Segmentation fault" message.
lvdisplay behaved just the same: a normal display of all LVM logical
volumes and a "Segmentation fault" at the bottom.

Next step was obvious:
# rpm -Va

Huh, here we are. There's a bunch of RPMs whose binary files have
changed since they were installed. Looks just like the work of a
virus, I thought. But wait! That's very strange! This server has been
lying around turned off and doing nothing since the moment I finished
installing the system. There was NO possible time for a virus to
infect the system. Well, in any case, I took my special LiveCD with
ClamAV on it and a RAM disk for freshclam to store updated virus
databases, booted it, mounted the possibly infected system and checked
it with clamscan. NO viruses were found.

Well, I thought this might be caused by a faulty SCSI disk in the
array that corrupts the data being written to it instead of telling
the host that there's a bad block. OK, that's easy to check: go to
single-user mode, reinstall the damaged RPMs using
rpm -Uvh --replacepkgs, do a couple of 'sync's, remount all
filesystems read-only with -o remount,ro,sync, and check the installed
RPMs with rpm -Va. I went ahead and did all of the above, and found
nothing wrong. After the reinstall all files verified correctly, and
the LVM tools went back to working without "Segmentation fault".
Hmm... that's strange, I thought. Well, at least ATM I've got a
correctly functioning system without viruses.
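
The check described above can be sketched as a small script. This is
only a sketch under assumptions: PKG is a hypothetical placeholder for
whichever RPM failed verification, the remount is written in the
"-o remount,..." form and only for the root filesystem, and DRY_RUN=1
(the default here) just prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch of the reinstall-and-verify check. Nothing runs for real unless
# DRY_RUN is set to 0 (and the script is run as root in single-user mode).

# Hypothetical placeholder: path to the RPM that failed verification.
PKG="${PKG:-/path/to/package.rpm}"
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

reinstall_and_verify() {
    run rpm -Uvh --replacepkgs "$PKG"   # reinstall over the same version
    run sync                            # flush dirty buffers to disk
    run sync
    run mount -o remount,ro,sync /      # repeat for other mount points
    run rpm -Va                         # no output means all files verify
}
```

With the defaults, calling reinstall_and_verify only lists the five
commands in order, which makes it easy to review them before running
anything on the real system.
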
Huh, well, now it's time to reboot and check how it performs. I'm
going to do unattended reboots in the future, so it should reboot
cleanly without asking any questions.
# shutdown -r now
The reboot went smoothly, but just as LVM2 was initializing I got the
"Segmentation fault" message again! Damn! What's wrong?! Logged in,
ran rpm -Va, gotcha! Again the device-mapper RPM was broken.
Well, let's reinstall it again, sync, remount root read-only, and check
with rpm -V device-mapper. Did that, and all seemed OK: no output from
rpm -V means the files are intact. Rebooted again and ran:
[root@omega MegaMgr5.20]# rpm -V device-mapper
..5.....    /lib/libdevmapper.so.1.02
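
For anyone reading the output above: each character in the leading
flag string stands for one verification test, "." means the test
passed, and "5" means the file's MD5 digest differs from the one
recorded in the RPM database. A minimal sketch of a filter that pulls
only those files out of rpm -V or rpm -Va output follows; the function
name md5_mismatches is made up for illustration, and it assumes paths
without spaces.

```shell
#!/bin/sh
# Read `rpm -V` / `rpm -Va` style output on stdin and print the paths
# whose MD5 digest check failed ("5" in the flag string).
# Assumption: file paths contain no whitespace, so the path is the
# last field on the line.
md5_mismatches() {
    awk '$1 ~ /^[SM5DLUGTP.?]+$/ && $1 ~ /5/ { print $NF }'
}
```

Typical use would be "rpm -Va | md5_mismatches". Lines starting with
"missing" and files that differ only in timestamp or ownership are
skipped, which makes real content changes easier to spot.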

That's it. After each and every reboot this file ends up corrupted.
It looks like this is neither a faulty HDD nor a faulty RAID
controller. Most likely something corrupts the file during the
shutdown or boot process. I haven't got enough time today to
investigate more deeply, so I'll continue tomorrow and post the
results here, if any.
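
One possible way to narrow this down, sketched here as an idea rather
than something already tried: keep a copy and an MD5 sum of the file
before shutdown, then compare after boot and list the byte offsets that
differ. STATE_DIR and the function names are hypothetical; ideally the
saved copy would live on a different disk or filesystem than the file
being watched.

```shell
#!/bin/sh
# Sketch: detect whether a file changes across a reboot and where.
# STATE_DIR is a hypothetical location for the saved copy and checksum.
STATE_DIR="${STATE_DIR:-/root/dm-check}"

# Run before shutdown: save a copy of the file and its MD5 sum.
snapshot_file() {
    file=$1
    mkdir -p "$STATE_DIR" || return 1
    cp -p "$file" "$STATE_DIR/good.copy" || return 1
    md5sum < "$file" > "$STATE_DIR/good.md5"
}

# Run after boot: print "unchanged", or "changed" followed by up to 20
# differing bytes (offset, old value, new value; values are in octal).
check_file() {
    file=$1
    if [ "$(md5sum < "$file")" = "$(cat "$STATE_DIR/good.md5")" ]; then
        echo "unchanged"
    else
        echo "changed"
        cmp -l "$STATE_DIR/good.copy" "$file" | head -n 20
        return 1
    fi
}
```

For example, snapshot_file /lib/libdevmapper.so.1.02 right before
"shutdown -r now" and check_file /lib/libdevmapper.so.1.02 after the
next login. The pattern of differing offsets might hint at whether the
file was rewritten as a whole or damaged only in a few places.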

-- 
Best regards,
 Alexey Loukianov                          mailto:aloukianov at lavtech.ru
 System Engineer,
 IT Department,
 Lavtech Corp