[Centos] crashing app
Jonathan Dill
jfdill_2 at jfdill.com
Wed Jan 26 04:23:04 UTC 2005
Michael Best wrote:
> A reformat reinstall is nothing more than erasing the disk and
> starting over, give or take some log files.
>
> *) Erase all files associated with the application and reinstall, test
> *) swap space is being corrupted? turn off swap
> make swap with the check for bad blocks option, turn swap back on
> *) Run harddrive diagnostics perhaps the harddrives are going bad
> *) Run memtest+, if you can do this as an overnight burn in
> *) Perhaps Steam is the problem, who knows?
First off, I'd set up a way to recover the system to a point before it
stopped working correctly. Mirroring the files to another disk is
probably the fastest way; then you could just boot off KNOPPIX and rsync
the backup files over what's there. If you were running within VMware,
you could just schedule snapshots and revert to a snapshot taken when
the system was working properly. Restoring from tape is slow and a pain,
not something you want to get stuck doing every couple of weeks. It's
much easier and faster to just rsync off an extra disk or another
computer.
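A minimal sketch of the mirror-and-restore idea, assuming the extra disk is mounted at a hypothetical /mnt/backup. The function names, the DRY_RUN switch and the mount points are made up for illustration; by default it only prints the rsync command so you can look it over before running anything.

```shell
#!/bin/sh
# Sketch only: mirror one directory tree onto another with rsync.
# DRY_RUN defaults to 1, so commands are printed rather than executed.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$@"
    else
        "$@"
    fi
}

# mirror_tree SRC DST
#   -a             preserve permissions, owners, times, symlinks, devices
#   -H             preserve hard links
#   -x             stay on one filesystem (does not descend into other mounts)
#   --delete       remove files from DST that no longer exist in SRC
#   --numeric-ids  keep raw uid/gid, useful when booted from a live CD
mirror_tree() {
    src=${1%/}
    dst=${2%/}
    run rsync -aHx --delete --numeric-ids "$src/" "$dst/"
}

# Daily backup (hypothetical mount point):
#   mirror_tree / /mnt/backup
# Restore from KNOPPIX, with the damaged root mounted at /mnt/root:
#   mirror_tree /mnt/backup /mnt/root
```

Set DRY_RUN=0 once the printed command looks right. Note that --delete makes the destination an exact copy, so anything that exists only on the destination is removed.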
If possible, I would repartition the system with two "root" partitions
mounted at "/" and "/r2", where "/r2" is just an rsync backup of "/".
You would set up LILO or GRUB entries to be able to boot off /r2, and
you need to modify /r2/etc/fstab so the system can "run" on that copy,
since its root partition is different. Say for example you sync
/ -> /r2 daily. On the day the app stops working, you reboot the system
into /r2, where the system is probably working again, then sync
/r2 -> /. I have used this approach a lot to install a newer OS on /r2
while leaving the old OS on /, so that if there was a problem getting
any custom applications to work I could just reboot to the old OS. Some
people have complained that it "wastes disk space", so lately I split
the OS into "/" and "/usr/share". Then when I want to install a newer
OS, I move /usr/share to a loop filesystem on /home, for example, and
the /usr/share partition becomes /r2. It's a bit more complicated but it
works.
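The fstab change for the /r2 copy could be sketched roughly as below. This assumes a plain whitespace-separated fstab that has one entry mounted at "/" and one at "/r2"; the function name is hypothetical, and it simply swaps the two mount points (and their fsck pass numbers) so that the copy treats its own partition as root.

```shell
#!/bin/sh
# Sketch only: print a copy of an fstab with the "/" and "/r2" entries
# swapped, for use as /r2/etc/fstab. Output goes to stdout; the input file
# is not modified. Rewritten lines come out single-space separated.
swap_root_entries() {
    awk '
        /^[ \t]*#/  { print; next }    # keep comment lines untouched
        $2 == "/"   { $2 = "/r2"; if (NF >= 6) $6 = 2; print; next }
        $2 == "/r2" { $2 = "/";   if (NF >= 6) $6 = 1; print; next }
        { print }                      # every other entry as-is
    ' "$1"
}

# Example (run after each sync, since the sync copies /etc/fstab too):
#   swap_root_entries /etc/fstab > /r2/etc/fstab
```

The boot loader entry for /r2 still has to point the kernel at the right root device; that part depends on whether you use LILO or GRUB and is not covered here.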
Offhand, I'd burn a stresslinux CD and try memtest86 and the appropriate
versions of cpuburn to see if anything weird happens:
http://www.stresslinux.org/
Then, still running off the CD, I'd try "mkswap -c" on the swap
partition and "e2fsck -c -f" on all of the filesystems, assuming you're
using ext2 or ext3, with the usual disclaimer to BACK UP your system
first. Maybe Steam is getting its data out of the "wrong place" w.r.t.
the Linux kernel, like reading from fs buffers that may be out of sync,
as happened with "dump", but that is just a wild guess because I deleted
the original message and have no idea what Steam is. What follows is
just some general advice and brainstorming to help come up with a plan
of attack.
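Backing up a step to the mkswap and e2fsck checks, here is a rough sketch of how they might be scripted from the rescue CD. The device names in the usage comments are placeholders, the function names and DRY_RUN switch are made up, and it assumes the partitions are unmounted and swap is off. It prints the commands by default rather than running them.

```shell
#!/bin/sh
# Sketch only: bad-block checks for a swap partition and ext2/ext3
# filesystems. DRY_RUN defaults to 1, so commands are only printed.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$@"
    else
        "$@"
    fi
}

# recheck_swap DEVICE: recreate the swap area, scanning the device for
# bad blocks first (-c).
recheck_swap() {
    run mkswap -c "$1"
}

# check_ext_filesystems DEVICE...: forced check (-f) with a read-only
# bad-block scan (-c) on each unmounted ext2/ext3 device.
check_ext_filesystems() {
    for dev in "$@"; do
        run e2fsck -c -f "$dev"
    done
}

# Placeholder devices, substitute your own:
#   recheck_swap /dev/hda3
#   check_ext_filesystems /dev/hda1 /dev/hda2
```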
I'd avoid having any part of Steam rely on files on an NFS filesystem.
If that is your setup, that would be my #1 suspect. If Steam uses any
databases, you might want to periodically run whatever tools they
provide to optimize / fix the tables.
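To see quickly whether NFS is in the picture at all, something like the following could list the NFS mount points so you can compare them against wherever Steam keeps its files. The function name is hypothetical, and it assumes the usual /proc/mounts layout (device, mount point, type, options).

```shell
#!/bin/sh
# Sketch only: list mount points whose filesystem type is nfs or nfs4.
# Reads /proc/mounts by default; another file in the same format can be
# passed as the first argument.
list_nfs_mounts() {
    awk '$3 == "nfs" || $3 == "nfs4" { print $2 }' "${1:-/proc/mounts}"
}

# Example:
#   list_nfs_mounts
```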
Try to break the problem down into "layers", analogous to the OSI
layers, according to what is processing the data where, and then check
for the problem at each "layer". Roughly something like:
application
libraries
database
kernel
filesystem
hardware
network
Those might not be in the "right" order but could give you some ideas.
The goal is to isolate and identify the "layer" at which the problem is
occurring. I have solved problems where the cause was unexpected, like
ypbind losing its connection to the NIS server. That seemed like it
shouldn't have affected the application, but it did, apparently because
the application used certain system calls that ended up tying into NIS.
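One way to keep the layer-by-layer approach honest is to write it down as a checklist that stops at the first layer that fails. The sketch below is only a skeleton: the bottom-up ordering is my assumption, and every check_* function is a placeholder to be replaced with a real test for your setup.

```shell
#!/bin/sh
# Sketch only: walk the layers from the bottom up and report the first one
# whose check fails. The order and every check_* body are placeholders.
LAYERS="hardware filesystem kernel network database libraries application"

# Placeholder checks: each returns 0 (looks fine) until replaced with a
# real test, e.g. check_network could run "ypwhich" to confirm that the
# NIS binding is still alive.
check_hardware()    { return 0; }
check_filesystem()  { return 0; }
check_kernel()      { return 0; }
check_network()     { return 0; }
check_database()    { return 0; }
check_libraries()   { return 0; }
check_application() { return 0; }

# Prints the name of the first failing layer and returns 1,
# or prints "none" and returns 0 if every check passes.
first_failing_layer() {
    for layer in $LAYERS; do
        if ! "check_$layer"; then
            echo "$layer"
            return 1
        fi
    done
    echo "none"
    return 0
}
```

Run first_failing_layer while the app is misbehaving and again while it is healthy; a layer that only fails in the bad state is where I'd start digging.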
--jonathan