On Thu, 2009-06-25 at 11:42 +0200, Kris Buytaert wrote:
Use a serial console and attach it to some "monitoring" host (you can use USB-to-Serial adapters; they are cheap and work), and log on that one. You'll get the last messages from there.
I had indeed hoped to see some output on the serial console when the reboots happened, but the best I got so far was a partial timestamp with no further explanation before the boot output started again.
Any other ideas ?
Update:
The problem is indeed ocfs2 fencing off the systems. The logging, however, does not show up on a serial console; it DOES show up when using netconsole:
[base-root@CCMT-A ~]# nc -l -u -p 6666
(8,0):o2hb_write_timeout:166 ERROR: Heartbeat write timeout to device drbd0 after 478000 milliseconds
(8,0):o2hb_stop_all_regions:1873 ERROR: stopping heartbeat on all active regions.
ocfs2 is very sorry to be fencing this system by restarting
One would think it would output over the serial console before it logs over the network :) It doesn't.
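For anyone wanting to reproduce this setup, a minimal netconsole sketch follows. It assumes the fenced node has interface eth0; the IP addresses 192.0.2.10 (sender) and 192.0.2.20 (monitoring host) are hypothetical placeholders. The parameter format is src-port@src-ip/dev,tgt-port@tgt-ip/tgt-mac; leaving the MAC empty after the final slash makes netconsole broadcast, and 6666 matches the port the nc listener above uses.

```shell
# On the node that gets fenced (hypothetical IPs, adjust to your network):
modprobe netconsole netconsole=6665@192.0.2.10/eth0,6666@192.0.2.20/
# Raise the console log level so all kernel messages are sent, not just errors
dmesg -n 8

# On the monitoring host, listen for the UDP stream (traditional netcat syntax):
nc -l -u -p 6666
```

Because netconsole sends each message as a UDP packet straight from the kernel, it has a good chance of getting the last lines out even when the node is restarted immediately afterwards.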
Next step: I'll start fiddling some more with the timeout values :)
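When tuning, it may help to translate the "478000 milliseconds" in the log back into the O2CB_HEARTBEAT_THRESHOLD setting. The sketch below assumes the ocfs2 kernel code computes the write timeout as (threshold - 1) * 2000 ms, i.e. one heartbeat iteration every 2 seconds; the helper names are hypothetical.

```shell
#!/bin/sh
# Assumption: write timeout (ms) = (dead threshold - 1) * 2000 ms per heartbeat iteration.

threshold_to_ms() {
    # $1: O2CB heartbeat dead threshold (number of 2 s iterations)
    echo $(( ($1 - 1) * 2000 ))
}

ms_to_threshold() {
    # $1: write timeout in milliseconds, as printed by o2hb_write_timeout
    echo $(( $1 / 2000 + 1 ))
}

ms_to_threshold 478000
threshold_to_ms 240
```

Under that assumption, the 478000 ms in the log corresponds to a threshold of 240, so the node waited almost 8 minutes for a heartbeat write to drbd0 to complete before fencing itself.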