Just FYI, I figured out the problem. I had set up all of the clients with their IP addresses in the "target" field, but the updated rgmanager nfsclient.sh script apparently now checks /var/lib/nfs/etab and compares its entries against the target. etab always lists the *hostname* rather than the IP, so when the two didn't match, the script marked the client as bad. Kind of a silly way of monitoring, if you ask me; why they felt this was necessary I have no idea. I just wanted to let anyone who set up their clients by IP address know that the new update is going to break them.
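To illustrate the mismatch, here is a minimal Python sketch of an etab-based status check. It is a hypothetical model, not the actual nfsclient.sh code (which is a shell script). The etab line layout, the hostname skynet.example.edu, the fake DNS mapping, and all function names are assumptions made up for illustration.

```python
# Hypothetical model of an etab-based NFS client status check.
# This is NOT the real rgmanager nfsclient.sh (a shell script); it only
# illustrates why comparing an IP "target" against etab's hostname fails.
from typing import Callable, Dict, List

# Assumed etab layout: "<export path>\t<client>(<options>)", one per line.
# The hostname skynet.example.edu is made up for this example.
SAMPLE_ETAB = "/mnt/disted\tskynet.example.edu(rw,sync,wdelay,root_squash)\n"


def etab_clients(etab_text: str, path: str) -> List[str]:
    """Return the client names listed for `path` in etab-formatted text."""
    clients = []
    for line in etab_text.splitlines():
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        export_path, client_spec = parts
        if export_path == path:
            # Strip the "(options)" suffix, leaving just the client name.
            clients.append(client_spec.split("(", 1)[0])
    return clients


def naive_status(etab_text: str, path: str, target: str) -> bool:
    """Literal string match: an IP target never equals an etab hostname."""
    return target in etab_clients(etab_text, path)


def resolving_status(etab_text: str, path: str, target: str,
                     resolve: Callable[[str], str]) -> bool:
    """Compare after mapping both sides to a canonical name via `resolve`."""
    wanted = resolve(target)
    return any(resolve(c) == wanted for c in etab_clients(etab_text, path))


# Stand-in for real DNS lookups (e.g. socket.gethostbyaddr); hypothetical mapping.
_FAKE_DNS: Dict[str, str] = {"129.119.113.108": "skynet.example.edu"}


def fake_resolve(name: str) -> str:
    return _FAKE_DNS.get(name, name)


if __name__ == "__main__":
    ip = "129.119.113.108"
    print(naive_status(SAMPLE_ETAB, "/mnt/disted", ip))  # False: flagged as missing
    print(resolving_status(SAMPLE_ETAB, "/mnt/disted", ip, fake_resolve))  # True
```

If the real check behaves like naive_status, putting hostnames (matching what etab records) in the target field would presumably avoid the false "missing" status.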
On Thu, 2008-01-03 at 11:25 -0600, Doug Tucker wrote:
I have a cluster that had been operational for some time and functioning flawlessly until a recent yum update. The last kernel that worked without problems was 2.6.9-55.0.12.ELsmp; the current kernel is 2.6.9-67ELsmp. The problem appears to be an infinite recovery loop: the service runs fine for a few minutes, then restarts itself. What I am seeing in /var/log/messages is:
Jan 3 11:17:47 engrfs1 clurgmgrd: [5614]: <err> nfsclient:skynet_disted is missing!
Jan 3 11:17:47 engrfs1 clurgmgrd[5614]: <notice> status on nfsclient:skynet_disted returned 1 (generic error)
Jan 3 11:17:47 engrfs1 bash: [27695]: <info> Removing export: 129.119.113.108:/mnt/disted
Jan 3 11:17:47 engrfs1 bash: [27695]: <info> Adding export: 129.119.113.108:/mnt/disted (rw)
It does this for every client definition in the service. After it gets to the last one, it restarts the service:
Jan 3 11:16:25 engrfs1 clurgmgrd[5614]: <notice> Stopping service disted_export
Jan 3 11:16:26 engrfs1 clurgmgrd: [5614]: <info> Removing IPv4 address 129.119.113.180 from eth0
Jan 3 11:16:36 engrfs1 clurgmgrd[5614]: <notice> Service disted_export is recovering
Jan 3 11:16:36 engrfs1 clurgmgrd[5614]: <notice> Recovering failed service disted_export
Then it re-adds the exports and starts the service again:
Jan 3 11:16:36 engrfs1 clurgmgrd: [5614]: <info> Adding export: 129.119.113.108:/mnt/disted (rw)
Jan 3 11:16:36 engrfs1 clurgmgrd: [5614]: <info> Adding IPv4 address 129.119.113.180 to eth0
Jan 3 11:16:37 engrfs1 clurgmgrd[5614]: <notice> Service disted_export started
And then it starts over from the beginning, continuously. This is a production system, and this behaviour is causing the clients to hang (of course) during each restart. Thanks much for your help!
Sincerely,
Doug Tucker
CentOS mailing list CentOS@centos.org http://lists.centos.org/mailman/listinfo/centos