[CentOS] [DRBD-user] Unexplained reboots in DRBD82 + OCFS2 setup

Tue Jun 30 08:11:40 UTC 2009
Kris Buytaert <mlkb at inuits.be>

On Thu, 2009-06-25 at 11:42 +0200, Kris Buytaert wrote:

> > Use a serial console, attach that to some "monitoring" host.
> > (you can use USB-to-Serial, they are cheap and work), and log
> > on that one. You'll get the last messages from there.
> > 
> I indeed had hoped to see some output on the serial console when the
> reboots happened, but the best I got so far was a partial timestamp
> with no further explanation before the reboot output started again.
> 
> Any other ideas?
> 

Update:

The problem is indeed ocfs2 fencing off the systems. The logging,
however, does not show up on a serial console; it DOES show up when using
netconsole.


[base-root@CCMT-A ~]# nc -l -u -p 6666
(8,0):o2hb_write_timeout:166 ERROR: Heartbeat write timeout to device
drbd0 after 478000 milliseconds
(8,0):o2hb_stop_all_regions:1873 ERROR: stopping heartbeat on all active
regions.
ocfs2 is very sorry to be fencing this system by restarting
,

One would think that it would write to the serial console before it logs
over the network :)  It doesn't.
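
For anyone wanting to reproduce this, here is a rough sketch of the sending
side, i.e. how netconsole could be pointed at the host running the nc
listener above. The IP addresses, interface name and MAC address are
made-up placeholders, not the values from my setup, and the helper function
name is just for illustration. The script only prints the modprobe command
so you can check it before running it as root.

```shell
#!/bin/sh
# Build a netconsole= module parameter in the kernel's documented form:
#   src-port@src-ip/dev,tgt-port@tgt-ip/tgt-mac
# netconsole_param is a hypothetical helper, not part of any package.
netconsole_param() {
    printf '%s@%s/%s,%s@%s/%s\n' "$1" "$2" "$3" "$4" "$5" "$6"
}

# All addresses below are placeholders (assumptions): replace them with the
# fenced node's IP and interface, and the listener's IP and MAC address.
# Port 6666 matches the "nc -l -u -p 6666" listener.
PARAM=$(netconsole_param 6665 192.168.1.10 eth0 6666 192.168.1.20 00:11:22:33:44:55)

# Print the command instead of running it, so it can be reviewed first.
echo "modprobe netconsole netconsole=$PARAM"
```

Run the printed command on the node that gets fenced, and keep the nc
listener running on the monitoring host.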




Next step is that I'll start fiddling some more with the timeout
values :)
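
As a starting point for that fiddling, here is a small sketch of the
arithmetic behind the number in the log. It assumes the formula from the
OCFS2 FAQ, timeout = (O2CB_HEARTBEAT_THRESHOLD - 1) * 2 seconds, with a
fixed 2 second heartbeat interval. The function names are made up for
illustration. On that assumption the 478000 milliseconds in the log would
correspond to a threshold of 240, normally set in /etc/sysconfig/o2cb.

```shell
#!/bin/sh
# Heartbeat interval in milliseconds (assumed to be 2000, per the OCFS2 FAQ).
HB_INTERVAL_MS=2000

# Print the heartbeat write timeout in milliseconds for a given
# O2CB_HEARTBEAT_THRESHOLD value.
o2hb_timeout_ms() {
    echo $(( ($1 - 1) * HB_INTERVAL_MS ))
}

# Print the O2CB_HEARTBEAT_THRESHOLD needed for a desired timeout in seconds.
o2hb_threshold_for() {
    echo $(( $1 * 1000 / HB_INTERVAL_MS + 1 ))
}

o2hb_timeout_ms 240      # prints 478000, the value seen in the log
o2hb_threshold_for 60    # prints 31
```

The threshold should be the same on all nodes, and as far as I know the
o2cb cluster has to be restarted for a new value to take effect.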