On Wed, Jul 15, 2009 at 1:16 PM, Marko Vojinovic <vvmarko@gmail.com> wrote:
I have a machine here that resets itself every hour (without my intention, of course):
OK, there has been some development in the situation. I asked a colleague of mine (who happens to have physical access to the machine) to shut it down and go into the BIOS to check the "wake on LAN" status. He did (it was off), and he left the machine sitting in the BIOS for an hour, waiting to see if it would restart. It didn't, for two hours. He was also monitoring temperature and voltage and comparing them with two other machines in the room (identical hardware), and nothing seemed out of the ordinary.
Then he rebooted into CentOS, and there have been no restarts since, for almost a whole day now:
# uptime
 16:26:25 up 22:11, 1 user, load average: 0.00, 0.00, 0.00
That means that (a) the problem has gone away and (b) I am no longer able to reproduce it. From the logs I read that the resets were happening every 3625 +/- 10 seconds.
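For anyone who wants to check for a similar pattern on their own machine, the interval estimate above can be sketched roughly as follows. This is only a minimal Python sketch under stated assumptions: the boot timestamps are made-up placeholders (not the values from my logs), and `reset_intervals` is a hypothetical helper name. You would have to extract the real boot times from your own logs first.

```python
from datetime import datetime
from statistics import mean


def reset_intervals(boot_times):
    """Return the gaps, in seconds, between consecutive boot timestamps."""
    ordered = sorted(boot_times)
    return [(b - a).total_seconds() for a, b in zip(ordered, ordered[1:])]


# Placeholder boot timestamps (illustrative only, NOT from the real logs).
boots = [
    datetime(2009, 7, 14, 10, 0, 0),
    datetime(2009, 7, 14, 11, 0, 20),
    datetime(2009, 7, 14, 12, 0, 50),
    datetime(2009, 7, 14, 13, 1, 15),
]

gaps = reset_intervals(boots)
centre = mean(gaps)                               # average gap between resets
spread = max(abs(g - centre) for g in gaps)       # largest deviation from it
print(f"resets every {centre:.0f} +/- {spread:.0f} seconds")
```

A tight spread around a fixed period, as in my case, suggests something timer-driven rather than a random hardware fault, although it does not prove it.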
My next step was going to be to intentionally skew the system clock and wait for a reaction, but since the machine has stopped restarting I cannot verify anything now.
Since the problem went away, I am moderately happy, so we can drop the thread, and I'll make sure to reopen it if I see it restarting again.
Thanks to all for suggestions and help!
Best, :-) Marko