Yesterday I was copying a few dump files (backups), about 1GB in size, from a Samba share on my CentOS 3.4 box to my Windows machine when the CentOS box stopped responding.
The HDD LED was on and after I connected a monitor all I could see was this error message over and over again:
"usb-uhci.c: host controller halted, trying to restart"
I wasn't able to log in or do anything else, so I saw no other solution than pressing the reset button.
When it rebooted it forced a disk check, but it was unable to mount /dev/hde2 (which usually mounts on /var!), and pretty much nothing works without /var. The error message mount gave was something like "Unable to mount /dev/hde2: invalid argument".
The files I was copying are located on /dev/md0 (RAID 0 over 4 disks) and that still worked fine. / is mounted from /dev/hde1 and that also worked as expected.
I removed /var from /etc/fstab and restored a backup of /var to the dir /var and then I could boot normally again.
Trying to save what was on /dev/hde2, I ran e2fsck -p /dev/hde2, which corrected a ton of errors (and deleted a lot of data). I then mounted /dev/hde2 on another folder and restored my backups onto it, so I only lost a few hours of data. After I added /var to /etc/fstab again, everything worked as normal.
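For anyone who ends up in the same spot, the two /etc/fstab edits (taking the /var entry out, then putting it back) can be sketched roughly as below. This is only an illustration: the function name set_mount_enabled and the sample fstab lines (including the ext3 type and the options) are made up for the example and are not copied from my system. It comments the entry out rather than deleting it, which makes it easy to put back.

```python
def set_mount_enabled(fstab_text, mount_point, enabled):
    """Comment out (enabled=False) or restore (enabled=True) one fstab entry.

    Hypothetical helper for illustration; it works on the text of an fstab
    file and never touches the real /etc/fstab.
    """
    out = []
    for line in fstab_text.splitlines():
        stripped = line.lstrip()
        is_commented = stripped.startswith("#")
        # fstab fields: device, mount point, fs type, options, dump, pass
        fields = stripped.lstrip("#").split()
        # Require at least four fields so ordinary comments are left alone.
        if len(fields) >= 4 and fields[1] == mount_point:
            if enabled and is_commented:
                line = stripped.lstrip("#").lstrip()
            elif not enabled and not is_commented:
                line = "#" + line
        out.append(line)
    return "\n".join(out) + "\n"


# Sample entries (assumed values, not my real fstab).
sample = (
    "/dev/hde1  /     ext3  defaults  1 1\n"
    "/dev/hde2  /var  ext3  defaults  1 2\n"
)
disabled = set_mount_enabled(sample, "/var", enabled=False)
restored = set_mount_enabled(disabled, "/var", enabled=True)
```

Here disabled has the /var line commented out so the box can boot without /dev/hde2, and restored is the same text as sample again.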
My question is: what happened, and what can I do to avoid it happening again? If /dev/hde2 had been a RAID 1, would it have rebuilt itself? Should I move /var to a RAID 1?
I have copied large files like this before without problems. Was it just bad luck, or should I expect it to happen again?
/dev/hde is attached to a cheap IDE Ultra ATA/133 PCI controller card (Silicon Image) that has worked flawlessly for about a year. Could it be broken? Right now it seems OK again. I have a replacement for it, but I would rather not replace it if it isn't necessary.
Any suggestions are appreciated. Best regards Ulrik
PS. Try backups! You won't regret it :)