[CentOS] centos 5 reboot diagnosis

nate centos at linuxpowered.net
Fri Sep 25 12:24:48 UTC 2009


Agnello George wrote:
> That is the last option, I guess, as the server is in an IDC. With
> dmidecode and biosdecode I just get server/BIOS info. In the messages
> and dmesg output I can't really pinpoint the issue. Is there anything I am
> not doing, or anything else I should do, to figure out the reason for this
> reboot?

Does the server have a system event log? Many of the better servers
will log things like memory errors or power outages, but not all of
them do. This kind of issue can be hard to diagnose, but it is often
hardware related, assuming you can rule out a power failure. If only
one box rebooted and it's on a shared PDU, then the power likely
didn't go out.
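
On boxes with an IPMI-capable management controller, the system event log (SEL) can usually be read with `ipmitool sel elist`. The Python sketch below is only an illustration, assuming ipmitool is installed, the BMC is reachable, and each event is printed on one line with '|'-separated fields. The keyword list is a hypothetical choice for flagging memory and power events, not something the hardware defines.

```python
import subprocess

# Hypothetical keyword list: words that often show up in SEL entries for
# the hardware causes discussed here (memory errors, power problems).
SUSPECT_KEYWORDS = ("memory", "ecc", "power", "watchdog")


def suspicious_sel_entries(sel_text):
    """Return SEL lines whose fields mention any suspect keyword."""
    hits = []
    for line in sel_text.splitlines():
        # Assumption: one event per line, fields separated by '|'.
        fields = [f.strip() for f in line.split("|")]
        if len(fields) < 3:
            continue  # skip blank or non-event lines
        text = " ".join(fields).lower()
        if any(k in text for k in SUSPECT_KEYWORDS):
            hits.append(line.strip())
    return hits


def read_sel():
    """Read the SEL via ipmitool (requires ipmitool and IPMI access)."""
    result = subprocess.run(["ipmitool", "sel", "elist"],
                            capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    for entry in suspicious_sel_entries(read_sel()):
        print(entry)
```

Keeping the parsing separate from the ipmitool call makes it easy to run the filter against a SEL dump saved from another machine.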

I had a server in a remote data center, a fairly new Dell R610,
which is their latest and greatest "enterprise" box, that would
spontaneously reboot a couple of times a week. We couldn't find the
cause and convinced them to replace the system board, RAM, and CPU,
but that didn't help either. So we shipped the box back to our main
facility, and it has been running without fault for the past several
weeks doing the same work it was doing at the other site.

Maybe it just needed to be banged up a bit more.

I have another Dell R610 at a different data center that just likes
to flat out die. It ran fine for a few months, then started frequently
dropping off the network, taking the management card with it. It's
probably another system board issue, though there's nothing in the logs.

Not long ago we had some HP boxes with bad RAM and/or bad boards.
They would generate a bunch of memory errors, and whenever the
board encountered that kind of uncorrectable error, it rebooted
the box. We couldn't even run the HP diagnostics on them without the
box hanging or rebooting while testing the memory. But the system
event log recorded each memory problem, so it was easy to pinpoint
the issue. It was unusual, though, that it affected 3 different
systems (all brand new). When we first got them, one of the things
we did was upgrade the BIOS, and that bricked all 3 the first time
around. Then they swapped the boards, we opted not to upgrade the
BIOS, and that's when we noticed the memory errors. So we had HP
replace all of them, and the new ones work much better.

nate