[CentOS] cluster suite & gfs problem since update

Thu Jan 3 17:25:20 UTC 2008
Doug Tucker <tuckerd at engr.smu.edu>

I have a cluster that has been operational for some time and functioning
flawlessly until a recent yum update.  The last kernel that worked
without problems was 2.6.9-55.0.12.ELsmp.  The current kernel is
2.6.9-67ELsmp.  The problem appears to be an infinite recovery loop of
some kind.  The service runs fine for a few minutes, then restarts
itself.  What I am seeing in /var/log/messages is:

Jan  3 11:17:47 engrfs1 clurgmgrd: [5614]: <err> nfsclient:skynet_disted is missing!
Jan  3 11:17:47 engrfs1 clurgmgrd[5614]: <notice> status on nfsclient:skynet_disted returned 1 (generic error)
Jan  3 11:17:47 engrfs1 bash: [27695]: <info> Removing export: 129.119.113.108:/mnt/disted
Jan  3 11:17:47 engrfs1 bash: [27695]: <info> Adding export: 129.119.113.108:/mnt/disted (rw)


It does this for every client definition in the service.  After it gets
to the last one, it restarts the service:

Jan  3 11:16:25 engrfs1 clurgmgrd[5614]: <notice> Stopping service disted_export
Jan  3 11:16:26 engrfs1 clurgmgrd: [5614]: <info> Removing IPv4 address 129.119.113.180 from eth0
Jan  3 11:16:36 engrfs1 clurgmgrd[5614]: <notice> Service disted_export is recovering
Jan  3 11:16:36 engrfs1 clurgmgrd[5614]: <notice> Recovering failed service disted_export

Then it re-adds the exports and starts the service again:

Jan  3 11:16:36 engrfs1 clurgmgrd: [5614]: <info> Adding export: 129.119.113.108:/mnt/disted (rw)
Jan  3 11:16:36 engrfs1 clurgmgrd: [5614]: <info> Adding IPv4 address 129.119.113.180 to eth0
Jan  3 11:16:37 engrfs1 clurgmgrd[5614]: <notice> Service disted_export started

And then the whole cycle starts over again, continuously.  This is a
production system, and this behaviour is causing the clients to hang (of
course) during each restart.  Thanks much for your help!

Sincerely,

Doug Tucker