Hi all,
I am running a two-node RHEL3 U8 cluster (versions below) on HP servers connected via a SCSI channel to HP storage (SAN), hosting an Oracle database server.
Kernel and cluster versions:
Kernel: 2.4.21-47.EL #1 SMP
redhat-config-cluster-1.0.7-1-noarch
clumanager-1.2.26.1-1-x86_64
My active node suddenly rebooted. When I analysed the logs, I found the errors below in syslog. I would like to know what might cause this type of error. The sar output shows there was no load on the server, either at the time the system rebooted or at the times I get the I/O Hang error.
Nov 3 14:23:00 cluster1 clulockd[1996]: <warning> Denied 20.1.2.162: Broken pipe
Nov 3 14:23:00 cluster1 clulockd[1996]: <err> select error: Broken pipe
Nov 3 14:23:06 cluster1 clulockd[1996]: <warning> Denied 20.1.2.162: Broken pipe
Nov 3 14:23:06 cluster1 clulockd[1996]: <err> select error: Broken pipe
Nov 3 14:23:13 cluster1 cluquorumd[1921]: <warning> Disk-TB: Detected I/O Hang!
Nov 3 14:23:15 cluster1 clulockd[1996]: <warning> Denied 20.1.2.161: Broken pipe
Nov 3 14:23:15 cluster1 clulockd[1996]: <err> select error: Broken pipe
Nov 3 14:23:12 cluster1 clusvcmgrd[2011]: <err> Unable to obtain cluster lock: Connection timed out
Nov 5 17:18:00 cluster1 cluquorumd[1921]: <warning> Disk-TB: Detected I/O Hang!
Nov 5 17:18:00 cluster1 clulockd[1996]: <warning> Denied 20.1.2.162: Broken pipe
Nov 5 17:18:00 cluster1 clulockd[1996]: <err> select error: Broken pipe
Nov 5 17:18:17 cluster1 clulockd[1996]: <warning> Denied 20.1.2.162: Broken pipe
Nov 5 17:18:17 cluster1 clulockd[1996]: <err> select error: Broken pipe
Nov 5 17:18:17 cluster1 clulockd[1996]: <warning> Potential recursive lock #0 grant to member #1, PID1962
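In case it helps anyone correlate the messages, here is a minimal Python sketch of how entries in the format above could be split apart and tallied per daemon and severity. The function names (split_entries, tally) and the regular expression are my own illustration, not part of clumanager, and the sample text is simply the Nov 5 excerpt.

```python
import re
from collections import Counter

# Matches the start of a syslog entry in the format shown above, e.g.
# "Nov 5 17:18:00 cluster1 cluquorumd[1921]: <warning> ..."
ENTRY = re.compile(
    r"(?P<stamp>[A-Z][a-z]{2} +\d+ \d\d:\d\d:\d\d) (?P<host>\S+) "
    r"(?P<daemon>\w+)\[(?P<pid>\d+)\]: <(?P<level>\w+)> "
)


def split_entries(text):
    """Split syslog text into (stamp, daemon, level, message) tuples.

    Works whether the entries are on separate lines or run together,
    because each entry is located by its timestamp/daemon prefix.
    """
    matches = list(ENTRY.finditer(text))
    entries = []
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        message = text[m.end():end].strip()
        entries.append((m.group("stamp"), m.group("daemon"), m.group("level"), message))
    return entries


def tally(entries):
    """Count entries per (daemon, level) pair."""
    return Counter((daemon, level) for _, daemon, level, _ in entries)


# Sample input: the Nov 5 excerpt from this post.
SAMPLE = (
    "Nov 5 17:18:00 cluster1 cluquorumd[1921]: <warning> Disk-TB: Detected I/O Hang! "
    "Nov 5 17:18:00 cluster1 clulockd[1996]: <warning> Denied 20.1.2.162: Broken pipe "
    "Nov 5 17:18:00 cluster1 clulockd[1996]: <err> select error: Broken pipe "
    "Nov 5 17:18:17 cluster1 clulockd[1996]: <warning> Denied 20.1.2.162: Broken pipe "
    "Nov 5 17:18:17 cluster1 clulockd[1996]: <err> select error: Broken pipe "
    "Nov 5 17:18:17 cluster1 clulockd[1996]: <warning> Potential recursive lock #0 "
    "grant to member #1, PID1962"
)

if __name__ == "__main__":
    for (daemon, level), count in sorted(tally(split_entries(SAMPLE)).items()):
        print(daemon, level, count)
```

Running the same tally over the full /var/log/messages from both nodes would show whether the cluquorumd "I/O Hang" warnings always come before the clulockd errors or only sometimes.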
I need help understanding the real cause of these errors and how to fix them.
My cluster.xml file is attached below.
<?xml version="1.0"?>
<cluconfig version="3.0">
  <clumembd broadcast="yes" interval="1000000" loglevel="5" multicast="no" multicast_ipaddress="" thread="yes" tko_count="25"/>
  <cluquorumd loglevel="7" pinginterval="5" tiebreaker_ip=""/>
  <clurmtabd loglevel="7" pollinterval="4"/>
  <clusvcmgrd loglevel="7"/>
  <clulockd loglevel="7"/>
  <cluster config_viewnumber="4" key="6672bc0a71be2ec9486f6a2f5846c172" name="ORACLECLUSTER"/>
  <sharedstate driver="libsharedraw.so" rawprimary="/dev/raw/raw1" rawshadow="/dev/raw/raw2" type="raw"/>
  <members>
    <member id="0" name="cluster1" watchdog="yes"/>
    <member id="1" name="cluster2" watchdog="yes"/>
  </members>
  <services>
    <service checkinterval="10" failoverdomain="oracle_db" id="0" maxfalsestarts="0" maxrestarts="0" name="database" userscript="/etc/init.d/script_db.sh">
      <service_ipaddresses>
        <service_ipaddress broadcast="None" id="0" ipaddress="20.1.2.35" monitor_link="1" netmask="255.255.0.0"/>
      </service_ipaddresses>
      <device id="0" name="/dev/cciss/c0d0p1" sharename="">
        <mount forceunmount="yes" fstype="ext3" mountpoint="/vol1" options="rw"/>
      </device>
      <device id="1" name="/dev/cciss/c0d0p2" sharename="">
        <mount forceunmount="yes" fstype="ext3" mountpoint="/vol2" options="rw"/>
      </device>
      <device id="2" name="/dev/cciss/c0d0p5" sharename="">
        <mount forceunmount="yes" fstype="ext3" mountpoint="/vol3" options="rw"/>
      </device>
    </service>
  </services>
  <failoverdomains>
    <failoverdomain id="0" name="oracle_db" ordered="no" restricted="yes">
      <failoverdomainnode id="0" name="cluster1"/>
      <failoverdomainnode id="1" name="cluster2"/>
    </failoverdomain>
  </failoverdomains>
</cluconfig>
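For reference, here is a rough Python sketch that pulls the timing-related settings out of the configuration above, so they can be compared against the gaps between the log timestamps. It rests on an assumption I have not verified: that the clumembd interval is expressed in microseconds and that tko_count is the number of missed heartbeats tolerated, so their product approximates the window before a member is declared down. The summarize function and the trimmed CONFIG string are only illustrative.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the cluster.xml above (only the elements read below).
CONFIG = """<?xml version="1.0"?>
<cluconfig version="3.0">
  <clumembd broadcast="yes" interval="1000000" loglevel="5" multicast="no" multicast_ipaddress="" thread="yes" tko_count="25"/>
  <cluquorumd loglevel="7" pinginterval="5" tiebreaker_ip=""/>
  <sharedstate driver="libsharedraw.so" rawprimary="/dev/raw/raw1" rawshadow="/dev/raw/raw2" type="raw"/>
</cluconfig>
"""


def summarize(xml_text):
    """Return the timing and shared-state settings from a cluster.xml string."""
    root = ET.fromstring(xml_text)
    membd = root.find("clumembd")
    quorumd = root.find("cluquorumd")
    shared = root.find("sharedstate")

    interval = int(membd.get("interval"))
    tko_count = int(membd.get("tko_count"))

    return {
        "interval": interval,
        "tko_count": tko_count,
        # ASSUMPTION: interval is in microseconds, so this is in seconds.
        "heartbeat_window_s": interval * tko_count / 1_000_000,
        "pinginterval": int(quorumd.get("pinginterval")),
        "rawprimary": shared.get("rawprimary"),
        "rawshadow": shared.get("rawshadow"),
    }


if __name__ == "__main__":
    for key, value in summarize(CONFIG).items():
        print(f"{key}: {value}")
```

If the unit assumption holds, this configuration gives a heartbeat window of about 25 seconds. The rawprimary and rawshadow entries are the shared-state partitions, which is presumably the disk access the "Disk-TB" warning refers to.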
Regards, Lingu