[CentOS] RHCS on CentOS4 - 2 node cluster problem

Mon Jan 15 13:30:44 UTC 2007
Alexander Dalloz <ad+lists at uni-x.org>

Hello fellows,

I have a problem with a 2 node RHCS cluster (CentOS 4) where node 1
failed and node 2 became active. This already happened last year, and
because of the holidays the customer didn't notice it. The cluster is
just a failover for Apache and has no shared storage.

The customer has now noticed the situation and tried to fix it by
rebooting node 1, which then failed to come back up. The ccsd service
starts but cannot get the full cluster information, so the cman service
that follows it hangs forever, and the boot process stalls in this
state. Skipping the cluster services selectively at boot time (boot
parameter confirm) brings the box up.

ccsd starts up (as a service or by hand, and with the -n parameter), but
logs to syslog that it fails to get cluster infrastructure information.
So the cluster is in an inquorate state. Does anyone experienced with
RHCS know whether I can avoid shutting down node 2 and the Apache
service the cluster exists for? The documentation (manual and FAQ) is
silent about this. I have verified that there is no network / NIC
problem. How do I get the 2 node cluster back into a quorate state?

Thanks for your help.

Cheers

Alexander