[CentOS] filesystem corruption?

Mon Apr 6 21:37:04 UTC 2015
m.roth at 5-cent.us <m.roth at 5-cent.us>

Got an older server here, running CentOS 6.6 (64-bit). Suddenly, at
0-dark-30 yesterday morning, we had failures to connect.

After several attempts to reboot and get it working, I tried yum update,
and that failed, complaining of a python krb5 error. On further
investigation, I discovered that logins were failing because of a problem
with pam: it couldn't open /lib64/security/pam_permit.so. The reason was
that pam_permit.so was a broken symlink, pointing to a file in the same
directory; that file actually existed in /lib64, not in /lib64/security.
Checking other systems, I found it should, in fact, be a regular file, not
a symlink.
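For anyone wanting to check a box for the same symptom, here's a minimal
Python sketch that lists dangling symlinks in one directory. It's only an
illustration: the function name find_broken_symlinks and the default path
/lib64/security are my own choices, not anything from CentOS or pam. It
relies on os.path.islink() being true for any symlink while
os.path.exists() follows the link and is false when the target is missing.
A relative target is resolved against the link's own directory, which is
how a link in /lib64/security can dangle when the real file sits in /lib64.

```python
import os
import sys


def find_broken_symlinks(directory):
    """Return a sorted list of symlinks in `directory` whose target is missing.

    os.path.islink() does not follow the link; os.path.exists() does, so a
    symlink for which exists() is False points at something that isn't there.
    Relative targets are resolved against the directory holding the link.
    """
    broken = []
    for name in os.listdir(directory):
        path = os.path.join(directory, name)
        if os.path.islink(path) and not os.path.exists(path):
            broken.append(path)
    return sorted(broken)


if __name__ == "__main__":
    # Assumed default; pass a different directory as the first argument.
    target = sys.argv[1] if len(sys.argv) > 1 else "/lib64/security"
    if os.path.isdir(target):
        for path in find_broken_symlinks(target):
            print(f"{path} -> {os.readlink(path)} (target missing)")
```

Run against a known-good machine and the suspect one, this at least shows
whether other modules were damaged the same way; it says nothing about
*why* the link appeared.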

At this point, the system was considered suspect. I brought the system
down, replaced the root drive, and rebuilt. I was not able to build it as
CentOS 7, as something in the older hardware broke the install. CentOS 6
built successfully, and the server was returned to service.

I then put the drive in another server and examined it. fsck reported that
both / and /boot were clean, but when I reran it with fsck -c, to check
for bad blocks, it found many multiply-claimed blocks.

First question: anyone have an idea why it showed as clean until I
checked for bad blocks? Would that just be because I'd gracefully shut
down the original server, and it mounted OK on the other server?

With it mounted on /mnt, I found no driver errors reported in the logs,
nor any activity, including logins, before an automated connection from
another server, which failed. AND I checked our loghost, and nothing odd
shows up there, in neither messages nor secure.

At this point, I *think* it's filesystem corruption, rather than a
compromised system, but I'd really like to hear anyone's thoughts on this.

      mark