Yesterday I was copying a few dump files (backups) about 1GB in size from a Samba
share on my CentOS 3.4 box to my Windows machine, when the CentOS box stopped responding.
The HDD LED was on and after I connected a monitor all I could see was this error
message over and over again:
"usb-uhci.c: host controller halted, trying to restart"
I wasn't able to log in or do anything else, so I saw no other solution than pressing the
reset button.
When it rebooted it forced a disk check, but was unable to mount /dev/hde2 (which
is usually mounted on /var!), and pretty much nothing works without /var.
The error message mount gave was something like "Unable to mount /dev/hde2: invalid
argument".
The files I was copying are located on /dev/md0 (RAID 0 over 4 disks) and that still
worked fine. / is on /dev/hde1 and that also worked as expected.
I removed the /var entry from /etc/fstab and restored a backup of /var into the /var
directory on the root filesystem, and then I could boot normally again.
Trying to save what was on /dev/hde2, I ran e2fsck -p /dev/hde2, which corrected
a ton of errors (and deleted a lot of data). I then mounted /dev/hde2 on another directory
and restored my backups, so I only lost a few hours of data. After I added /var to
/etc/fstab again, everything worked as normal.
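
For reference, the repair steps looked roughly like the script below. This is
only a sketch: /dev/hde2 and the -p flag are what I actually used, but the mount
point /mnt/oldvar and the dry-run wrapper "run" are made up for illustration.

```shell
#!/bin/sh
# Sketch of the repair steps described above; must be run as root.
# DEV is the partition from this post. MNT is a hypothetical mount point.
DEV=/dev/hde2
MNT=/mnt/oldvar

# Hypothetical helper: print each command instead of executing it
# unless RUN=1 is set, so the sequence can be reviewed first.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "would run: $*"
    fi
}

run e2fsck -p "$DEV"      # automatic repair of the unmounted filesystem
run mkdir -p "$MNT"
run mount "$DEV" "$MNT"   # mount outside /var to see what survived
```

Without RUN=1 it only prints the three commands. The partition has to be
unmounted while e2fsck runs.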
My question is: what happened, and what can I do to avoid this again?
If /dev/hde2 had been a RAID 1, would it then have rebuilt? Should I move /var to a
RAID 1?
I have copied large files like this before without problems. Was it just bad luck, or
should I expect it to do this again?
/dev/hde is attached to a cheap IDE Ultra ATA/133 PCI controller card (Silicon
Image) that has worked flawlessly for about a year. Could it be faulty? Right now it
seems OK again. I have a replacement for it, but I would rather not replace it if it
isn't necessary.
Any suggestions are appreciated.
Best regards
Ulrik
PS. Try backups! You won't regret it :)