Hi,
I am running a two-node RHEL3u8 cluster that hosts an Oracle 9i database. Whenever I reboot the second node, the Oracle database running on the active node 1 gets stopped. I ran the tests below to find out exactly when the database stops.
Version: clumanager-1.2.31-1.x86_64.rpm
Test 1:
I stopped both nodes and started node 1 first.
When clumanager started during the boot cycle on node 2, the database running on node 1 got stopped (the Oracle alert log shows the database stopped at exactly the same time clumanager started on node 2).
After that, when I ran clustat on node 1, it still reported the service (database) as running.
I am using /etc/init.d/scriptdb.sh in my cluster config file; the script has start, stop, and status actions.
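For reference, here is a minimal sketch of the status branch such a userscript needs. The SID and the check method are assumptions, not taken from the actual scriptdb.sh (which is not shown in this post); clumanager runs the script with the status argument every checkinterval seconds and treats a non-zero exit as a service failure:

```shell
#!/bin/sh
# Hypothetical sketch of the status action of a clumanager userscript.
# ORACLE_SID is an assumed value; the real script's SID is not shown here.
ORACLE_SID=${ORACLE_SID:-ORCL}

db_status() {
    # Report healthy only if the instance's pmon background process exists.
    # The [o] bracket trick stops grep from matching its own command line.
    if ps -ef | grep "[o]ra_pmon_${ORACLE_SID}" > /dev/null 2>&1; then
        echo "Oracle instance ${ORACLE_SID} is running"
        return 0
    else
        echo "Oracle instance ${ORACLE_SID} is not running"
        return 1
    fi
}
```

If the status branch of the real script returns 0 while the database is actually down (or vice versa), clumanager's view in clustat will disagree with the alert log, which matches the symptom described above.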
Test 2:
I stopped both nodes, started node 2 first, and waited for 30 minutes.
Oracle came up on node 2 by default (clumanager started the Oracle service).
I started node 1 after 20 minutes.
When clumanager started during the boot cycle on node 1, the database running on node 2 got stopped (the Oracle alert log shows the database stopped at exactly the same time clumanager started on node 1).
After that, when I ran clustat on node 2, it still reported the service (database) as running.
Test 3:
If the cluster relocates the service by itself, from node 1 to node 2 or node 2 to node 1, for some reason during critical daytime hours, the database does not come up during failover on either node.
Test 4:
If I manually relocate the service from node 1 to node 2 or vice versa, the database is not stopped and everything works fine.
Please, someone help me fix this issue; this is my critical production database.
Below is my cluster config file, cluster.xml:
<?xml version="1.0"?>
<cluconfig version="3.0">
  <clumembd broadcast="yes" interval="1000000" loglevel="5" multicast="no" multicast_ipaddress="" thread="yes" tko_count="25"/>
  <cluquorumd loglevel="7" pinginterval="5" tiebreaker_ip=""/>
  <clurmtabd loglevel="7" pollinterval="4"/>
  <clusvcmgrd loglevel="7"/>
  <clulockd loglevel="7"/>
  <cluster config_viewnumber="4" key="6672bc0a71be2ec9486f6a2f5846c172" name="DBCLUSTER"/>
  <sharedstate driver="libsharedraw.so" rawprimary="/dev/raw/raw1" rawshadow="/dev/raw/raw2" type="raw"/>
  <members>
    <member id="0" name="cluster1" watchdog="yes"/>
    <member id="1" name="cluster2" watchdog="yes"/>
  </members>
  <services>
    <service checkinterval="10" failoverdomain="oracle_db" id="0" maxfalsestarts="0" maxrestarts="0" name="database" userscript="/etc/init.d/script_db.sh">
      <service_ipaddresses>
        <service_ipaddress broadcast="None" id="0" ipaddress="20.2.135.35" monitor_link="1" netmask="255.255.0.0"/>
      </service_ipaddresses>
      <device id="0" name="/dev/cciss/c0d0p1" sharename="">
        <mount forceunmount="yes" fstype="ext3" mountpoint="/vol1" options="rw"/>
      </device>
      <device id="1" name="/dev/cciss/c0d0p2" sharename="">
        <mount forceunmount="yes" fstype="ext3" mountpoint="/vol2" options="rw"/>
      </device>
      <device id="2" name="/dev/cciss/c0d0p5" sharename="">
        <mount forceunmount="yes" fstype="ext3" mountpoint="/vol3" options="rw"/>
      </device>
      <device id="3" name="/dev/cciss/c0d0p6" sharename="">
        <mount forceunmount="yes" fstype="ext3" mountpoint="/vol4" options="rw"/>
      </device>
      <device id="4" name="/dev/cciss/c0d0p7" sharename="">
        <mount forceunmount="yes" fstype="ext3" mountpoint="/vol5" options="rw"/>
      </device>
      <device id="5" name="/dev/cciss/c0d0p8" sharename="">
        <mount forceunmount="yes" fstype="ext3" mountpoint="/vol6" options="rw"/>
      </device>
      <device id="6" name="/dev/cciss/c0d0p9" sharename="">
        <mount forceunmount="yes" fstype="ext3" mountpoint="/vol7" options="rw"/>
      </device>
      <device id="7" name="/dev/cciss/c0d0p10" sharename="">
        <mount forceunmount="yes" fstype="ext3" mountpoint="/vol8" options="rw"/>
      </device>
    </service>
  </services>
  <failoverdomains>
    <failoverdomain id="0" name="oracle_db" ordered="no" restricted="yes">
      <failoverdomainnode id="0" name="cluster1"/>
      <failoverdomainnode id="1" name="cluster2"/>
    </failoverdomain>
  </failoverdomains>
</cluconfig>
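With the loglevel settings above, the clumanager daemons (clumembd, cluquorumd, clusvcmgrd, clulockd) log to syslog, so their messages can be correlated with the timestamps in the Oracle alert log. A small sketch of such a helper (the function name is hypothetical, and /var/log/messages is the assumed syslog destination on RHEL3):

```shell
#!/bin/sh
# Hypothetical helper: list clumanager daemon syslog entries for the minute
# in which the Oracle alert log records the shutdown, so cluster activity
# and the database stop can be matched up.
clu_msgs() {
    logfile=$1   # syslog file, usually /var/log/messages on RHEL3
    stamp=$2     # timestamp prefix to match, e.g. "Nov 23 20:31"
    grep "$stamp" "$logfile" | grep -E 'clumembd|cluquorumd|clusvcmgrd|clulockd'
}
```

Usage would look like `clu_msgs /var/log/messages "Nov 23 20:31"` on each node; whichever daemon logs a service start or stop at that minute points to why the already-running instance was taken down.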
Regards, crazy pap
Linux-crazy wrote on Sun, 23 Nov 2008 20:31:37 +0530:
I am running RHEL3u8 two node cluster,
Bored of naming yourself whoami or lingu? Or just realized that you posted a few off-topic questions too many with these names? Please stop abusing this list for your RHEL3 cluster problems. Thanks.
Kai