[CentOS] Emergency rescue help needed

Thu Jan 29 17:02:33 UTC 2009
Anne Wilson <cannewilson at googlemail.com>

2009/1/29 Scott Silva <ssilva at sgvwater.com>:
> on 1-29-2009 8:30 AM Anne Wilson spake the following:
>> 2009/1/29 Alex H. Vandenham <alex-qMVNeVs1MAKw5LPnMra/2Q at public.gmane.org>:
>>> On Thursday 29 January 2009 10:15:38 am Anne Wilson wrote:
>>>> I assume that the hdd is failing - but I haven't seen any messages
>>>> from smartmontools.  Is there any way I can check that?  If it is I
>>>> don't want to waste time trying to repair it.
>>> Try smartctl to see what the monitors have been finding for you.
>>>
>>> man smartctl
>>>
>> Thanks.  I'd been trying to remember what command I needed for that :-)
>>
>> The short test has completed without errors.  I'll run the long test
>> during dinner.  Assuming that that also runs without errors, I guess
>> that the next thing is memtest?
>>
>> More suggestions?
>>
>> Thanks
>>
>> Anne
> If you had many power failures, the filesystem might just be severely trashed.
> Journals and files out of sync, etc. If a good fsck didn't fix it, you might
> just be in for a wipe-reinstall, or many hours of finding and fixing corrupted
> files. I would install to a new drive, and then you can take some time
> recovering from the old drive as you find things missing. That way you will
> still have the old system for whatever might come up. I always seem to find
> something that didn't get backed up properly.
>
Two days ago I discovered that the failures had indeed totally trashed
the system.  I did reinstall, formatting only / and /boot, but I've
had a couple of these spontaneous shutdowns since then, which is why I
suspected hardware failure.

I've got copies of just about everything, I think, on an external
drive, and I could try another drive as you suggest, mounting the old
one in an external case, which I have.  I can cope with this, but I'm
deeply unhappy about not knowing what happened, and whether it is
likely to happen again.

Anne