Hello, I have 2 servers which share 2 partitions with DRBD. Each machine runs one VM on a DRBD device, so I have primary/secondary and secondary/primary DRBD devices. There are also some more Xen VMs that only do MySQL replication, and one that is standalone. In the last 4 weeks I had 2 incidents where both machines suddenly rebooted, first one machine and 2 minutes later the other one. I can't find anything in the logfiles, except that cron.daily started a few seconds before the secondary DRBD device got a timeout (Aug 18 04:02:14 xen-A1 kernel: drbd1: PingAck did not arrive in time.):
Aug 18 04:02:01 xen-B1 crond[25033]: (root) CMD (run-parts /etc/cron.daily)
Aug 18 04:02:01 xen-B1 anacron[25037]: Updated timestamp for job `cron.daily' to 2008-08-18
Aug 18 04:06:06 xen-B1 crond[2963]: (CRON) STARTUP (V5.0)
The last line seems to be the reboot. Here is the content of my cron.daily folder:
[root@xen-B1.blab:/etc/cron.daily]# ls
0anacron 0logwatch cups logrotate makewhatis.cron mlocate.cron prelink rpm tmpwatch
I also had Heartbeat running on the machines. After the last crash I thought it was Heartbeat that rebooted the machines, so I disabled it, but now I see that it isn't Heartbeat that causes these reboots. I use CentOS 5.1 with no self-compiled packages.
Why are my machines doing this, and how can I fix it?
Greetings and thanks,
Rupert