Hi, we use CentOS 5.2 on all our production servers. Just today one of our mail servers rebooted and we don't understand why. Usually we get an alert if a system is under heavy load, but this system just rebooted. It may have been a power failure (is there any way to check this?), but that is the last option I would guess, as the server is in an IDC. With dmidecode and biosdecode I just get server/BIOS info. In the messages log and dmesg I can't really pinpoint the issue. Is there anything I am not doing, or anything I should be doing, to figure out the reason for this reboot?
Agnello George wrote:
that is the last option I would guess, as the server is in an IDC. With dmidecode and biosdecode I just get server/BIOS info. In the messages log and dmesg I can't really pinpoint the issue. Is there anything I am not doing, or anything I should be doing, to figure out the reason for this reboot?
Does the server have a system event log? Many of the better servers will log things like memory errors or power outages, but not all do. This kind of issue can be hard to diagnose, but it is often hardware related, assuming you can rule out a power failure. If it's only one box and it's on a shared PDU, then the power likely didn't go out.
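If the box has a BMC, something along these lines could pull the power- and memory-related entries out of the system event log. Treat it as a rough sketch under assumptions, not a tested procedure for your hardware: it assumes ipmitool is installed, that you run it as root, that `ipmitool sel elist` prints pipe-delimited lines, and that Python 3 is available. The keyword list and the helper names (`parse_sel_line`, `interesting_events`, `read_sel`) are made up for illustration, and the wording of SEL entries varies by vendor.

```python
import subprocess

# Hypothetical keyword list; adjust to whatever wording your vendor's SEL uses.
KEYWORDS = ("power", "memory", "ecc", "watchdog")


def parse_sel_line(line):
    """Split one pipe-delimited log line into whitespace-stripped fields."""
    return [field.strip() for field in line.split("|")]


def interesting_events(lines, keywords=KEYWORDS):
    """Return parsed entries whose text mentions any keyword (case-insensitive)."""
    hits = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        lowered = line.lower()
        if any(word in lowered for word in keywords):
            hits.append(parse_sel_line(line))
    return hits


def read_sel():
    """Run `ipmitool sel elist` and return its output as a list of lines.

    Assumes ipmitool is on the PATH and the caller may talk to the BMC.
    """
    out = subprocess.check_output(["ipmitool", "sel", "elist"])
    return out.decode("utf-8", "replace").splitlines()


# Example use on the affected server (not run automatically here):
#   for fields in interesting_events(read_sel()):
#       print(" | ".join(fields))
```

If nothing shows up around the time of the reboot, that doesn't prove much either way, since not every board logs these events.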
I had a server in a remote data center, a fairly new Dell R610, which is their latest and greatest "enterprise" box, that would spontaneously reboot a couple of times a week. I couldn't find the cause and convinced them to replace the system board, RAM, and CPU, but that didn't help either. So we shipped the box back to our main facility, and it has been running without fault for the past several weeks, doing the same thing it was doing at the other site.
Maybe it just needed to be banged up a bit more.
I have another Dell R610 at another data center that just likes to flat out die. It ran fine for a few months, then started dropping off the network, and the management card frequently goes with it. It's probably another system board issue, though there is nothing in the logs.
I had some HP boxes not long ago with bad RAM and/or bad boards. They would generate a bunch of memory errors, and whenever the board hit that kind of uncorrectable error it rebooted the box. I couldn't even run the HP diagnostics on them without the box hanging or rebooting during the memory test. But the system event log recorded every memory problem, so it was easy to pinpoint what the issue was. It was unusual, though, that it affected 3 different systems (all brand new). When we first got them, one of the things we did was upgrade the BIOS, and that bricked all 3 the first time around. Then they swapped the boards and we opted not to upgrade the BIOS, and that's when we noticed the memory errors. So we had HP replace all of them, and the new ones work much better.
nate