[CentOS] Strange issue with device-mapper lib in CentOS

Tue Dec 5 21:01:41 UTC 2006
Nick Webb <webbn at acm.org>

Hi Alexey,

On 12/4/06, Alexey Loukianov <aloukianov at lavtech.ru> wrote:
>
> Hello all,
>
> a couple of weeks ago I installed CentOS 4.2 on a very old server
> machine with an MSI server board based on the Intel 440GX chipset, two
> 500 MHz Xeons and 1 GB of RAM. There is an AMI MegaRAID 467 installed
> as the storage controller, which caused some trouble during
> installation, as the stock CentOS 4 install and production kernels
> don't include the older megaraid.ko module, but there are plenty of
> not-too-difficult ways to work around that. After a clean and
> relatively fast install, the first thing I did was up2date it to
> CentOS 4.4; then I did some basic initial reconfiguration, turned it
> off and left it lying around doing nothing.
>
> Today its time came, and I turned it on. While it was booting I saw a
> "Segmentation fault" message just after the "Starting up LVM2" line.
> That confused me a bit. I logged in as root, typed vgdisplay, and got
> normal output followed by "Segmentation fault". lvdisplay behaved the
> same way: a normal listing of all LVM logical volumes and a
> "Segmentation fault" at the bottom.
>
> Next step was obvious:
> # rpm -Va
>
> Huh, here we are. There's a bunch of RPMs whose binary files have
> changed since they were installed. Looks like a virus's job, I thought.
> But wait! That's very strange! This server had been lying around,
> turned off and doing nothing, since the moment I installed the system.
> There was NO opportunity for a virus to infect it. Well, in any case, I
> took my special LiveCD with ClamAV on it and a RAM disk where freshclam
> stores updated virus databases, booted it, mounted the possibly
> infected system and checked it with clamscan. NO viruses were found.
>
> Well, I thought this might be caused by a faulty SCSI disk in the
> array that corrupts the data written to it instead of telling the host
> there's a bad block. OK, that's easy to check: go to single-user mode,
> reinstall the damaged RPMs using rpm -Uvh --replacepkgs, run sync a
> couple of times, remount all filesystems with -o ro,sync, and check
> the installed RPMs with rpm -Va. I went ahead, did all of the above,
> and rpm -Va reported nothing. After the reinstall all files were
> correct, and the LVM tools went back to behaving normally, without
> "Segmentation fault". Hmm... that's strange, I thought. Well, at least
> for now I've got a correctly functioning system without viruses.
> Now it's time to reboot and check how it performs. I'm going to do
> unattended reboots in the future, so it should reboot seamlessly
> without any extra prompts.
> # shutdown -r now
> The reboot went smoothly, but just as LVM2 was initializing, I got the
> "Segmentation fault" message again! Damn! What's wrong?! Logged in,
> ran rpm -Va, and gotcha! Again, the device-mapper RPM was broken.
> Well, let's reinstall it again, sync, remount root read-only, and check
> with rpm -V device-mapper. Did that, and all seemed to be OK: no output
> from rpm -V, so the files were intact. Rebooted again and ran:
> [root at omega MegaMgr5.20]# rpm -V device-mapper
> ..5.....    /lib/libdevmapper.so.1.02
>
> That's it. After each and every reboot this file gets corrupted. It
> doesn't look like a faulty HDD or a faulty RAID controller. Most likely
> something corrupts this file during the shutdown or boot process. I
> haven't had enough time today to investigate more deeply; I'll
> continue tomorrow and post the results here, if any.
>
> --
> Best regards,
> Alexey Loukianov                          mailto:aloukianov at lavtech.ru
> System Engineer,
> IT Department,
> Lavtech Corp


>[root at omega MegaMgr5.20]# rpm -V device-mapper
>..5.....    /lib/libdevmapper.so.1.02
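For readers decoding that line: with RPM 4.x, as shipped in CentOS 4, each of the eight columns before the path is one verification test, a dot means the test passed, and "?" means it could not be performed. The Python 3 sketch below is a hypothetical helper (not part of the RPM tools) that maps the columns to their meanings; it assumes the 8-column layout, and if a newer RPM prints a ninth "P" (capabilities) column, the sketch simply ignores it.

```python
# Column order of the rpm -V attribute string in RPM 4.x (assumed 8 columns;
# zip() below silently drops any extra column printed by newer releases).
RPM_VERIFY_FLAGS = [
    ("S", "file size differs"),
    ("M", "mode differs (permissions or file type)"),
    ("5", "digest (MD5 checksum) differs"),
    ("D", "device major/minor number mismatch"),
    ("L", "symlink target differs"),
    ("U", "user ownership differs"),
    ("G", "group ownership differs"),
    ("T", "modification time differs"),
]


def parse_verify_line(line):
    """Return (path, problems) for one line of rpm -V output."""
    flags, rest = line.split(None, 1)
    rest = rest.strip()
    # Config/doc files carry a one-letter marker ("c", "d", ...) before the path.
    if len(rest) > 2 and rest[0] in "cdglr" and rest[1] == " ":
        rest = rest[2:].strip()
    problems = []
    for (code, meaning), ch in zip(RPM_VERIFY_FLAGS, flags):
        if ch == code:
            problems.append(meaning)
        elif ch == "?":
            problems.append(f"{code}: test could not be performed")
    return rest, problems


if __name__ == "__main__":
    print(parse_verify_line("..5.....    /lib/libdevmapper.so.1.02"))
```

Applied to the line above, only the digest test fails: size, mode, ownership and mtime still match the package database, so only the contents changed. An unchanged mtime may hint that the bytes were not altered by an ordinary write through the filesystem, though it does not prove it.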

Is that the only file consistently corrupted, or are there others? I've seen
similar "mysteries" before that turned out to be memory issues: once system
memory, once CPU cache (that was really weird). I'm not sure this is a
memory problem, but it wouldn't hurt to run a memory test.
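One cheap way to follow up on the memory theory, alongside a proper memory test, is to compare a corrupted copy of the library with a known-good one (for example a copy saved right after a reinstall, while rpm -V was still clean) and look at how the bytes differ. Scattered single-bit flips tend to point at RAM or cache, while long runs of changed bytes look more like something overwriting the file. The Python 3 sketch below is a hypothetical helper: the script name bitdiff.py and any file paths are placeholders, and the classification is a heuristic, not a diagnosis.

```python
import sys


def diff_bytes(good, bad):
    """Yield (offset, good_byte, bad_byte) for each position where the copies differ."""
    for offset, (g, b) in enumerate(zip(good, bad)):
        if g != b:
            yield offset, g, b


def summarize(good, bad):
    """Count differing bytes and how many of them differ in exactly one bit."""
    if len(good) != len(bad):
        return {"same_length": False}
    diffs = list(diff_bytes(good, bad))
    # g ^ b has exactly one set bit when the two bytes differ by a single flipped bit.
    single_bit = sum(1 for _, g, b in diffs if bin(g ^ b).count("1") == 1)
    return {
        "same_length": True,
        "differing_bytes": len(diffs),
        "single_bit_flips": single_bit,
        "offsets": [off for off, _, _ in diffs],
    }


def main(good_path, suspect_path):
    with open(good_path, "rb") as f:
        good = f.read()
    with open(suspect_path, "rb") as f:
        bad = f.read()
    report = summarize(good, bad)
    if not report["same_length"]:
        print("sizes differ: %d vs %d bytes" % (len(good), len(bad)))
        return
    print("differing bytes:", report["differing_bytes"])
    print("single-bit flips:", report["single_bit_flips"])
    # Show the first few differences; indexing bytes yields ints in Python 3.
    for off in report["offsets"][:20]:
        print("  0x%08x: %02x -> %02x" % (off, good[off], bad[off]))


if __name__ == "__main__":
    if len(sys.argv) == 3:
        main(sys.argv[1], sys.argv[2])
    else:
        print("usage: bitdiff.py GOOD_COPY SUSPECT_COPY")
```

A run would look like `python3 bitdiff.py /root/libdevmapper.good /lib/libdevmapper.so.1.02`, where the first path is a hypothetical saved copy.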

Nick