Cluster Broken Pipe error and Heartbeat configuration - Discuss

12 Nov 2008


Hi,
I am running a two-node active/passive cluster on RHEL3U8 (64-bit)
for my Oracle database. Both nodes are connected to HP MSA-500
storage (SCSI, not Fibre Channel). My hardware and clumanager version
details are below. The cluster ran fine and was stable for the last
two years, but for the past month I have been getting the errors
below in syslog, and the cluster keeps restarting locally.
Server Hardware: HP ProLiant DL580 G4
OS: RHEL3U8 64-bit (Intel EM64T)
Kernel : 2.4.21-47.EL
Storage : HP MSA-500 storage (SCSI channel)
Cluster Version:
clumanager-1.2.26.1-1
redhat-config-cluster-1.0.7-1
NODE1 ip: 20.2.135.161 (network bonding configured)
NODE2 ip: 20.2.135.162 (network bonding configured)
VIP : 20.2.135.35
Syslog errors:
cluquorumd[1921]: <warning> Disk-TB: Detected I/O Hang!
clulockd[1996]: <warning> Potential recursive lock #0 grant to member #1, PID1962
clulockd[1996]: <warning> Denied 20.1.135.162: Broken pipe
clulockd[1996]: <err> select error: Broken pipe
clulockd[1996]: <warning> Denied 20.1.135.162: Broken pipe
clulockd[1996]: <err> select error: Broken pipe
cluquorumd[1921]: <warning> Disk-TB: Detected I/O Hang!
clulockd[1996]: <warning> Denied 20.1.135.161: Broken pipe
clulockd[1996]: <err> select error: Broken pipe
clusvcmgrd[2011]: <err> Unable to obtain cluster lock: Connection timed out
cluquorumd[2100]: <err> VF: Abort: Invalid header in reply from member #0
cluquorumd[1934]: <err> __msg_send: Incomplete write to 13. Error: Connection reset by peer
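To make an excerpt like the one above easier to triage, here is a
minimal sketch that tallies messages per daemon and severity and
collects the addresses named in the "Denied" lines. It is not part of
clumanager. It assumes Python 3 and assumes the lines look like the
excerpt in this post (daemon[pid]: <level> message). The names
summarize, LINE_RE, DENIED_RE and SAMPLE are hypothetical, chosen only
for this illustration.

```python
import re
from collections import Counter

# Assumed line shape, taken from the excerpt in this post:
#   daemon[pid]: <level> message
# search() is used so a leading timestamp/hostname does not matter.
LINE_RE = re.compile(r"(?P<daemon>\w+)\[(?P<pid>\d+)\]: <(?P<level>\w+)> (?P<msg>.*)$")
DENIED_RE = re.compile(r"Denied (\d+\.\d+\.\d+\.\d+):")


def summarize(lines):
    """Return (counts per (daemon, level), counts per denied address)."""
    counts = Counter()
    denied = Counter()
    for line in lines:
        m = LINE_RE.search(line.strip())
        if not m:
            continue  # skip lines that do not match the assumed shape
        counts[(m.group("daemon"), m.group("level"))] += 1
        d = DENIED_RE.search(m.group("msg"))
        if d:
            denied[d.group(1)] += 1
    return counts, denied


# A few lines copied from the excerpt above, used as sample input.
SAMPLE = [
    "cluquorumd[1921]: <warning> Disk-TB: Detected I/O Hang!",
    "clulockd[1996]: <warning> Denied 20.1.135.162: Broken pipe",
    "clulockd[1996]: <err> select error: Broken pipe",
    "clulockd[1996]: <warning> Denied 20.1.135.161: Broken pipe",
]

if __name__ == "__main__":
    counts, denied = summarize(SAMPLE)
    for (daemon, level), n in sorted(counts.items()):
        print(daemon, level, n)
    for addr, n in sorted(denied.items()):
        print("denied", addr, n)
```

Comparing the addresses it collects against the node addresses listed
earlier is a quick way to check that the "Denied" peers are the ones
expected.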
Can anyone explain what the above errors indicate and how to
troubleshoot them? After a long Google search I found the link below
from Red Hat, which seems to match my scenario. Can I follow it? I am
asking because this is a very critical production server.
https://bugzilla.redhat.com/show_bug.cgi?id=185484
Also, can anyone help me configure a dedicated LAN (for example eth3)
as the heartbeat link, that is, a private point-to-point
crossover-cable network for cluster communications? I don't want the
heartbeat on the public LAN because of heavy network saturation.
I did not find any suitable documentation for this heartbeat
configuration on RHEL. Can anyone provide a suitable link, or tell me
what changes I have to make in my existing cluster.xml file for this
private heartbeat configuration to work?
I am waiting for a reply; this is urgent for me.
Regards,
Lingu