[CentOS] storage servers crashing, hair being pulled out!

Sun Dec 20 18:34:19 UTC 2009
Matty <matty91 at gmail.com>

On Sat, Dec 19, 2009 at 10:55 PM, Gordon McLellan <gordonthree at gmail.com> wrote:

> I have a trio of servers that like to reboot during high disk /
> network IO operations.  They don't appear to panic, as I have
> kernel.panic = 0 in sysctl.conf.  The syslog just shows normal
> messages, like samba complaining about browse master and then just
> syslogd starting up.

If the box is panicking under high load, you should definitely check
the memory, CPUs and power supplies. You may also find it beneficial
to enable kdump, netdump and sysrq. If the box hangs, you can issue a
magic sysrq key sequence to force it to panic. During the panic, kdump
or netdump should write out a core file that you can analyze to see
what is going on (crash has some useful options to dump thread stacks,
and you can search the LKML archives for the functions that show up
in them).
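
It is worth confirming that sysrq is actually enabled before the next
hang. Here is a rough sketch in Python that reads
/proc/sys/kernel/sysrq and reports the setting: 0 means disabled, 1
means all functions are enabled, and any other value is a bitmask of
allowed functions. The function names are made up for illustration,
and it assumes a Linux box with /proc mounted.

```python
def describe_sysrq(raw):
    """Interpret the text read from /proc/sys/kernel/sysrq."""
    value = int(raw.strip())
    if value == 0:
        return "disabled"
    if value == 1:
        return "all functions enabled"
    # Any other value is a bitmask that allows only some functions.
    return "partially enabled (bitmask %d)" % value


def read_sysrq(path="/proc/sys/kernel/sysrq"):
    """Read the current setting from the running kernel."""
    with open(path) as f:
        return describe_sysrq(f.read())


if __name__ == "__main__":
    try:
        print("sysrq: " + read_sysrq())
    except OSError as err:
        # Hypothetical fallback for hosts without /proc (not Linux).
        print("could not read sysrq setting: %s" % err)
```

If it reports "disabled", setting kernel.sysrq = 1 in sysctl.conf
turns it on. Note that the setting only governs the keyboard sequence;
root can always write to /proc/sysrq-trigger.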

- Ryan
--
http://prefetch.net