[CentOS] Cluster fail over database getting stopped

Sun Nov 23 15:01:37 UTC 2008
linux-crazy <hicheerup at gmail.com>

Hi,

  I am running a two-node cluster on RHEL 3 Update 8 that hosts an
Oracle 9i database. Rebooting the second node causes the Oracle
database to stop on node 1, the active node where the database is
running. I tested the scenarios below to find out when the database
gets stopped.

Version
clumanager-1.2.31-1.x86_64.rpm

Test 1:

I stopped both nodes, then started node 1 first.

When clumanager started during the boot cycle on node 2, the database
running on node 1 stopped (my Oracle alert log shows the database
stopping at exactly the time clumanager started on node 2).

After that, when I run clustat on node 1, it still reports the
service (database) as running.

I am using /etc/init.d/script_db.sh as the service script in my
cluster config file; it handles start, stop and the status check.
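
One way to see which action the cluster passes to the service script
when the other node joins is to log every call. The following is only a
minimal sketch, not the real script_db.sh: the LOGFILE path and the
run_db_* helpers are hypothetical placeholders, and it assumes the
script is invoked with start, stop or status as its first argument.

```shell
#!/bin/sh
# Hypothetical tracing skeleton for a cluster service script.
# LOGFILE and the run_db_* helpers are assumptions for illustration only.
LOGFILE="${LOGFILE:-/tmp/script_db_calls.log}"

# Record every invocation with a timestamp, the host name and the action.
log_call() {
    echo "$(date '+%Y-%m-%d %H:%M:%S') $(hostname) action=$1" >> "$LOGFILE"
}

# Placeholders: the real Oracle start/stop/status commands would go here.
run_db_start()  { return 0; }
run_db_stop()   { return 0; }
run_db_status() { return 0; }

# Log the requested action, then run the matching helper.
dispatch() {
    log_call "$1"
    case "$1" in
        start)  run_db_start ;;
        stop)   run_db_stop ;;
        status) run_db_status ;;
        *)      echo "Usage: $0 {start|stop|status}" >&2; return 1 ;;
    esac
}

# Only dispatch when an action was given on the command line.
if [ "$#" -gt 0 ]; then
    dispatch "$@"
fi
```

Comparing the timestamps in this log with the Oracle alert log would
show whether a stop call reaches the script on the active node at the
moment clumanager starts on the other node.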

Test 2:

I stopped both nodes, started node 2 first and waited for 30 minutes.

Oracle was up and running by default on node 2 (clumanager started
the Oracle service).

I started node 1 after 20 minutes.

When clumanager started during the boot cycle on node 1, the database
running on node 2 stopped (my Oracle alert log shows the database
stopping at exactly the time clumanager started on node 1).

After that, when I run clustat on node 2, it still reports the
service (database) as running.

Test 3:

If the cluster relocates the service automatically, from node 1 to
node 2 or from node 2 to node 1, for some reason during critical
daytime hours, my database does not come up after the failover on
either node.

Test 4:

If I manually relocate the service from node 1 to node 2 and vice
versa, my database is not stopped and it works fine.

Could someone please help me fix this issue? It is my critical
production database.

Below is my cluster config file

cluster.xml

<?xml version="1.0"?>
<cluconfig version="3.0">
  <clumembd broadcast="yes" interval="1000000" loglevel="5"
multicast="no" multicast_ipaddress="" thread="yes" tko_count="25"/>
  <cluquorumd loglevel="7" pinginterval="5" tiebreaker_ip=""/>
  <clurmtabd loglevel="7" pollinterval="4"/>
  <clusvcmgrd loglevel="7"/>
  <clulockd loglevel="7"/>
  <cluster config_viewnumber="4"
key="6672bc0a71be2ec9486f6a2f5846c172" name="DBCLUSTER"/>
  <sharedstate driver="libsharedraw.so" rawprimary="/dev/raw/raw1"
rawshadow="/dev/raw/raw2" type="raw"/>
  <members>
    <member id="0" name="cluster1" watchdog="yes"/>
    <member id="1" name="cluster2" watchdog="yes"/>
  </members>
  <services>
    <service checkinterval="10" failoverdomain="oracle_db" id="0"
maxfalsestarts="0" maxrestarts="0" name="database"
userscript="/etc/init.d/script_db.sh">
      <service_ipaddresses>
        <service_ipaddress broadcast="None" id="0"
ipaddress="20.2.135.35" monitor_link="1" netmask="255.255.0.0"/>
      </service_ipaddresses>
      <device id="0" name="/dev/cciss/c0d0p1" sharename="">
        <mount forceunmount="yes" fstype="ext3" mountpoint="/vol1"
options="rw"/>
      </device>
      <device id="1" name="/dev/cciss/c0d0p2" sharename="">
        <mount forceunmount="yes" fstype="ext3" mountpoint="/vol2"
options="rw"/>
      </device>
      <device id="2" name="/dev/cciss/c0d0p5" sharename="">
        <mount forceunmount="yes" fstype="ext3" mountpoint="/vol3"
options="rw"/>
      </device>
      <device id="3" name="/dev/cciss/c0d0p6" sharename="">
        <mount forceunmount="yes" fstype="ext3" mountpoint="/vol4"
options="rw"/>
      </device>
      <device id="4" name="/dev/cciss/c0d0p7" sharename="">
        <mount forceunmount="yes" fstype="ext3" mountpoint="/vol5"
options="rw"/>
      </device>
      <device id="5" name="/dev/cciss/c0d0p8" sharename="">
        <mount forceunmount="yes" fstype="ext3" mountpoint="/vol6"
options="rw"/>
      </device>
      <device id="6" name="/dev/cciss/c0d0p9" sharename="">
        <mount forceunmount="yes" fstype="ext3" mountpoint="/vol7"
options="rw"/>
      </device>
      <device id="7" name="/dev/cciss/c0d0p10" sharename="">
        <mount forceunmount="yes" fstype="ext3" mountpoint="/vol8"
options="rw"/>
      </device>
    </service>
  </services>
  <failoverdomains>
    <failoverdomain id="0" name="oracle_db" ordered="no" restricted="yes">
      <failoverdomainnode id="0" name="cluster1"/>
      <failoverdomainnode id="1" name="cluster2"/>
    </failoverdomain>
  </failoverdomains>
</cluconfig>



Regards,
crazy pap