On Wed, Jul 11, 2007 at 06:20:50PM -0300, Eduardo Grosclaude alleged:
Out of the blue, dmesg on my HP ProLiant with a SCSI disk shows loads of messages like this one:
EXT3-fs error (device dm-0) in start_transaction: Journal has aborted
Then the root fs goes read-only, so little else can be done on the machine, and LVM locks up. On restart, the fs has to be fsck'ed and the machine rebooted again before it recovers. The host then comes up fine, and I get a few more minutes before the problem reappears. This is stock CentOS 4.4; I have never managed to update it because of this very problem.
The system logs report SCSI I/O errors, but SMART reports no problems, and neither does badblocks (run from a rescue CD boot). The SCSI cabling, terminator, etc. have been checked.
What should I investigate next? Is the disk condemned?
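Quite likely the drive is dying. If you want proof from SMART, an extended self-test such as 'smartctl -t long /dev/sda' will likely fail. The test runs in the background, so read its outcome afterwards with 'smartctl -l selftest /dev/sda'.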
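As a rough aid before condemning the disk, the Python sketch below (not part of the original thread) tallies the ext3 errors per device and collects the sectors named in block-layer I/O errors from a dmesg dump. The ext3 pattern follows the message quoted above; the `end_request: I/O error, dev sda, sector N` format is an assumption, since the kernel's wording for I/O errors varies between versions.

```python
import re
import sys
from collections import Counter

# Matches lines like the one quoted in the report:
#   EXT3-fs error (device dm-0) in start_transaction: Journal has aborted
EXT3_ERROR = re.compile(
    r"EXT3-fs error \(device (?P<dev>[^)]+)\) in (?P<func>\w+): (?P<msg>.*)"
)

# ASSUMPTION: block-layer I/O errors are logged as
#   end_request: I/O error, dev sda, sector 123456
# Adjust this pattern to whatever your kernel actually prints.
IO_ERROR = re.compile(
    r"end_request: I/O error, dev (?P<dev>\w+), sector (?P<sector>\d+)"
)


def summarize_log(text):
    """Count ext3 errors per (device, message) and collect failing sectors per disk."""
    ext3_errors = Counter()
    bad_sectors = {}
    for line in text.splitlines():
        m = EXT3_ERROR.search(line)
        if m:
            ext3_errors[(m.group("dev"), m.group("msg"))] += 1
            continue
        m = IO_ERROR.search(line)
        if m:
            bad_sectors.setdefault(m.group("dev"), set()).add(int(m.group("sector")))
    return ext3_errors, bad_sectors


if __name__ == "__main__":
    # Reads a log from stdin, e.g. the output of dmesg or a saved /var/log/messages.
    errors, sectors = summarize_log(sys.stdin.read())
    for (dev, msg), count in errors.most_common():
        print(f"{dev}: {msg} (x{count})")
    for dev, secs in sorted(sectors.items()):
        print(f"{dev}: {len(secs)} distinct failing sectors, lowest {min(secs)}")
```

Usage would be something like `dmesg | python3 scan_log.py`. If the same sectors keep showing up across boots, that would be consistent with a media problem on the disk rather than a cabling fault.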