[CentOS] Cluster Broken pipe Error & node Reboot

Fri Nov 7 10:46:07 UTC 2008
lingu <hicheerup at gmail.com>

Hi all,

I am running a two-node RHEL3U8 cluster, with the kernel and cluster
versions listed below, on HP servers connected via a SCSI channel to HP
storage (SAN). The cluster hosts an Oracle database server.

Kernel & Cluster Version

Kernel-2.4.21-47.EL #1 SMP
redhat-config-cluster-1.0.7-1-noarch
clumanager-1.2.26.1-1-x86_64


My active node suddenly rebooted. When I analysed the logs, I found the
errors below in syslog. I want to know what might cause this type of
error. The sar output shows there was no load on the server, either at
the time the system rebooted or at the times of the I/O Hang errors.

Nov  3 14:23:00 cluster1 clulockd[1996]: <warning> Denied 20.1.2.162: Broken pipe
Nov  3 14:23:00 cluster1 clulockd[1996]: <err> select error: Broken pipe
Nov  3 14:23:06 cluster1 clulockd[1996]: <warning> Denied 20.1.2.162: Broken pipe
Nov  3 14:23:06 cluster1 clulockd[1996]: <err> select error: Broken pipe
Nov  3 14:23:13 cluster1 cluquorumd[1921]: <warning> Disk-TB: Detected I/O Hang!
Nov  3 14:23:15 cluster1 clulockd[1996]: <warning> Denied 20.1.2.161: Broken pipe
Nov  3 14:23:15 cluster1 clulockd[1996]: <err> select error: Broken pipe
Nov  3 14:23:12 cluster1 clusvcmgrd[2011]: <err> Unable to obtain cluster lock: Connection timed out

Nov  5 17:18:00 cluster1 cluquorumd[1921]: <warning> Disk-TB: Detected I/O Hang!
Nov  5 17:18:00 cluster1 clulockd[1996]: <warning> Denied 20.1.2.162: Broken pipe
Nov  5 17:18:00 cluster1 clulockd[1996]: <err> select error: Broken pipe
Nov  5 17:18:17 cluster1 clulockd[1996]: <warning> Denied 20.1.2.162: Broken pipe
Nov  5 17:18:17 cluster1 clulockd[1996]: <err> select error: Broken pipe
Nov  5 17:18:17 cluster1 clulockd[1996]: <warning> Potential recursive lock #0 grant to member #1, PID1962
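
To make the sequence of events easier to compare against sar data, the
syslog excerpt above can be summarised per daemon. The following is only a
minimal sketch: the sample lines are copied from the excerpt (with wrapped
lines rejoined), and the function names parse and summarize are
hypothetical helpers, not part of clumanager.

```python
import re
from collections import Counter

# Sample lines copied from the syslog excerpt, wrapped lines rejoined.
SAMPLE = """\
Nov  3 14:23:00 cluster1 clulockd[1996]: <warning> Denied 20.1.2.162: Broken pipe
Nov  3 14:23:00 cluster1 clulockd[1996]: <err> select error: Broken pipe
Nov  3 14:23:13 cluster1 cluquorumd[1921]: <warning> Disk-TB: Detected I/O Hang!
Nov  3 14:23:12 cluster1 clusvcmgrd[2011]: <err> Unable to obtain cluster lock: Connection timed out
"""

# timestamp, host, daemon[pid]:, <level>, message
LINE_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+(?P<host>\S+)\s+"
    r"(?P<daemon>\w+)\[(?P<pid>\d+)\]:\s+<(?P<level>\w+)>\s+(?P<msg>.*)$"
)


def parse(text):
    """Return one dict per line that matches the format shown above."""
    matches = (LINE_RE.match(line) for line in text.splitlines())
    return [m.groupdict() for m in matches if m]


def summarize(entries):
    """Count messages per (daemon, level) and collect I/O hang timestamps."""
    counts = Counter((e["daemon"], e["level"]) for e in entries)
    hangs = [e["stamp"] for e in entries if "I/O Hang" in e["msg"]]
    return counts, hangs


if __name__ == "__main__":
    counts, hangs = summarize(parse(SAMPLE))
    for (daemon, level), n in sorted(counts.items()):
        print(f"{daemon:12s} {level:8s} {n}")
    print("I/O hang reported at:", ", ".join(hangs))
```

Running the same functions over the full /var/log/messages would give the
times of each "I/O Hang" warning, which can then be lined up with the sar
samples for the same minutes.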


I need someone's help in working out how to fix this error and what the
real cause of the errors above is.

My cluster.xml file is attached below.



<?xml version="1.0"?>
<cluconfig version="3.0">
 <clumembd broadcast="yes" interval="1000000" loglevel="5"
multicast="no" multicast_ipaddress="" thread="yes" tko_count="25"/>
 <cluquorumd loglevel="7" pinginterval="5" tiebreaker_ip=""/>
 <clurmtabd loglevel="7" pollinterval="4"/>
 <clusvcmgrd loglevel="7"/>
 <clulockd loglevel="7"/>
 <cluster config_viewnumber="4"
key="6672bc0a71be2ec9486f6a2f5846c172" name="ORACLECLUSTER"/>
 <sharedstate driver="libsharedraw.so" rawprimary="/dev/raw/raw1"
rawshadow="/dev/raw/raw2" type="raw"/>
 <members>
  <member id="0" name="cluster1" watchdog="yes"/>
  <member id="1" name="cluster2" watchdog="yes"/>
 </members>
 <services>
  <service checkinterval="10" failoverdomain="oracle_db" id="0"
           maxfalsestarts="0" maxrestarts="0" name="database"
           userscript="/etc/init.d/script_db.sh">
    <service_ipaddresses>
      <service_ipaddress broadcast="None" id="0" ipaddress="20.1.2.35"
                         monitor_link="1" netmask="255.255.0.0"/>
    </service_ipaddresses>
    <device id="0" name="/dev/cciss/c0d0p1" sharename="">
      <mount forceunmount="yes" fstype="ext3" mountpoint="/vol1" options="rw"/>
    </device>
    <device id="1" name="/dev/cciss/c0d0p2" sharename="">
      <mount forceunmount="yes" fstype="ext3" mountpoint="/vol2" options="rw"/>
    </device>
    <device id="2" name="/dev/cciss/c0d0p5" sharename="">
      <mount forceunmount="yes" fstype="ext3" mountpoint="/vol3" options="rw"/>
    </device>
  </service>
 </services>
 <failoverdomains>
  <failoverdomain id="0" name="oracle_db" ordered="no" restricted="yes">
    <failoverdomainnode id="0" name="cluster1"/>
    <failoverdomainnode id="1" name="cluster2"/>
  </failoverdomain>
 </failoverdomains>
</cluconfig>
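
The timing and shared-state settings in the configuration above can be read
out with a short script. This is a minimal sketch under stated assumptions:
it assumes the clumembd interval value is in microseconds (not confirmed
anywhere in this post), so the computed heartbeat window is an estimate
only. The helper name summarize_config is hypothetical, and the embedded
XML is a trimmed copy of the attached file.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the attached cluster.xml; only elements read below.
CLUSTER_XML = """<?xml version="1.0"?>
<cluconfig version="3.0">
 <clumembd broadcast="yes" interval="1000000" loglevel="5"
  multicast="no" multicast_ipaddress="" thread="yes" tko_count="25"/>
 <cluquorumd loglevel="7" pinginterval="5" tiebreaker_ip=""/>
 <sharedstate driver="libsharedraw.so" rawprimary="/dev/raw/raw1"
  rawshadow="/dev/raw/raw2" type="raw"/>
 <members>
  <member id="0" name="cluster1" watchdog="yes"/>
  <member id="1" name="cluster2" watchdog="yes"/>
 </members>
</cluconfig>
"""


def summarize_config(xml_text):
    """Extract membership timing, quorum partitions, and member names."""
    root = ET.fromstring(xml_text)
    membd = root.find("clumembd")
    shared = root.find("sharedstate")
    interval = int(membd.get("interval"))
    tko_count = int(membd.get("tko_count"))
    return {
        # ASSUMPTION: interval is in microseconds.
        "heartbeat_window_s": interval * tko_count / 1_000_000,
        "quorum_partitions": (shared.get("rawprimary"),
                              shared.get("rawshadow")),
        "tiebreaker_ip": root.find("cluquorumd").get("tiebreaker_ip"),
        "members": [m.get("name") for m in root.iter("member")],
    }


if __name__ == "__main__":
    for key, value in summarize_config(CLUSTER_XML).items():
        print(f"{key}: {value!r}")
```

The quorum partitions listed here (/dev/raw/raw1 and /dev/raw/raw2) are the
shared devices that cluquorumd reads and writes, so they are the first
place to check when it reports "Disk-TB: Detected I/O Hang!".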

Regards,
Lingu