[CentOS] cluster suite & gfs problem since update

Doug Tucker

tuckerd at engr.smu.edu
Thu Jan 3 17:25:20 UTC 2008


I have a cluster that has been operational for some time and was
functioning flawlessly until a recent yum update.  The last known-good
kernel was 2.6.9-55.0.12.ELsmp.  The current kernel is 2.6.9-67ELsmp.
The problem appears to be an infinite recovery loop.  The service runs
fine for a few minutes, then restarts itself.  What I am seeing in
/var/log/messages is:

Jan  3 11:17:47 engrfs1 clurgmgrd: [5614]: <err> nfsclient:skynet_disted is missing!
Jan  3 11:17:47 engrfs1 clurgmgrd[5614]: <notice> status on nfsclient:skynet_disted returned 1 (generic error)
Jan  3 11:17:47 engrfs1 bash: [27695]: <info> Removing export: 129.119.113.108:/mnt/disted
Jan  3 11:17:47 engrfs1 bash: [27695]: <info> Adding export: 129.119.113.108:/mnt/disted (rw)


It does this for every client definition on the service.  After it gets
to the last one, it restarts the service:

Jan  3 11:16:25 engrfs1 clurgmgrd[5614]: <notice> Stopping service disted_export
Jan  3 11:16:26 engrfs1 clurgmgrd: [5614]: <info> Removing IPv4 address 129.119.113.180 from eth0
Jan  3 11:16:36 engrfs1 clurgmgrd[5614]: <notice> Service disted_export is recovering
Jan  3 11:16:36 engrfs1 clurgmgrd[5614]: <notice> Recovering failed service disted_export

It then adds the exports and starts the service again:

Jan  3 11:16:36 engrfs1 clurgmgrd: [5614]: <info> Adding export: 129.119.113.108:/mnt/disted (rw)
Jan  3 11:16:36 engrfs1 clurgmgrd: [5614]: <info> Adding IPv4 address 129.119.113.180 to eth0
Jan  3 11:16:37 engrfs1 clurgmgrd[5614]: <notice> Service disted_export started

The whole cycle then starts over from the beginning and repeats
continuously.  This is a production system, and the clients hang during
each restart.  Thanks much for your help!
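
In case it helps to quantify the loop, here is a rough sketch that
counts how many times each service is reported as started.  The pattern
is based only on the clurgmgrd "<notice> Service ... started" lines
quoted in this message; the function name is made up for illustration,
and other rgmanager versions may format these messages differently.

```python
import re
from collections import Counter

# Assumption: matches lines shaped like the one quoted in this message, e.g.
#   "... clurgmgrd[5614]: <notice> Service disted_export started"
START_RE = re.compile(r"clurgmgrd\[\d+\]: <notice> Service (\S+) started")


def count_service_starts(lines):
    """Return a Counter mapping service name -> number of 'started' notices."""
    counts = Counter()
    for line in lines:
        match = START_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Hypothetical usage against the live log (one log entry per line):
#   with open("/var/log/messages", errors="replace") as log:
#       for service, n in count_service_starts(log).most_common():
#           print(f"{service}: started {n} time(s)")
```

A count that keeps climbing every few minutes for the same service would
be consistent with the restart loop described here.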

Sincerely,

Doug Tucker






