On Tue, 2008-08-12 at 14:27 +0200, Johan Swensson wrote:
So I'm running NFS to get content to my web servers. I've now had this problem twice (it has been about 2 weeks since the last occurrence). I use DRBD on the NFS server for redundancy. Now to my problem:
All my web sites stopped responding, so I started by checking dmesg, and there I found a bunch of these errors:
Aug 11 16:00:39 web03 kernel: lockd: server 192.168.20.22 not responding, timed out
Aug 11 16:02:39 web03 kernel: lockd: server 192.168.20.22 not responding, timed out
But when I checked the NFS server, lockd was running, and I could access all the files from the web server with ls, cd etc.
The logs on the NFS server don't say anything of interest, and Apache's error_log just says "not found or unable to stat".
As I mentioned, this has happened 2 times, and both times I've "solved" it by rebooting the NFS server and the web servers. Having to reboot my servers every couple of weeks isn't a good solution, so I really could use some help. :)
Also I get this from time to time on the web servers, dunno if it's related:
do_vfs_lock: VFS is out of sync with lock manager!
----
I too have been having the same issues with my NFS server - they seem to have started when I updated on July 27th (5.2).
It seems to happen after logrotate on Sunday morning, but I don't find out about it until users show up on Monday morning.
/var/log/messages has...
Aug 4 09:32:59 cube kernel: lockd: server HOSTNAME not responding, still trying
and like you, I've rebooted the main server each time (Monday mornings)...there's something wrong that I can't figure out
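To check whether the timeouts really cluster around a particular time (for instance the Sunday logrotate run), something like the following could tally the lockd messages per hour. This is only a rough sketch: the regular expression assumes the classic syslog line layout seen in the messages quoted in this thread, and the names parse_lockd_line and count_by_hour are made up for illustration.

```python
import re
from collections import Counter

# Assumed layout (classic syslog), taken from the lines quoted in this thread:
#   Aug 11 16:00:39 web03 kernel: lockd: server 192.168.20.22 not responding, timed out
LOCKD_RE = re.compile(
    r"^(?P<month>\w{3})\s+(?P<day>\d{1,2})\s+(?P<time>\d{2}:\d{2}:\d{2})\s+"
    r"(?P<host>\S+)\s+kernel:\s+lockd:\s+server\s+(?P<server>\S+)\s+"
    r"not responding,\s+(?P<state>.+)$"
)


def parse_lockd_line(line):
    """Return the fields of a lockd 'not responding' line, or None if it does not match."""
    match = LOCKD_RE.match(line.strip())
    return match.groupdict() if match else None


def count_by_hour(lines):
    """Count lockd 'not responding' messages per (month, day, hour)."""
    counts = Counter()
    for line in lines:
        event = parse_lockd_line(line)
        if event is not None:
            key = (event["month"], int(event["day"]), event["time"][:2])
            counts[key] += 1
    return counts


# Sample input: the log lines quoted in this thread, plus one unrelated line.
SAMPLE_LINES = [
    "Aug 11 16:00:39 web03 kernel: lockd: server 192.168.20.22 not responding, timed out",
    "Aug 11 16:02:39 web03 kernel: lockd: server 192.168.20.22 not responding, timed out",
    "Aug 4 09:32:59 cube kernel: lockd: server HOSTNAME not responding, still trying",
    "do_vfs_lock: VFS is out of sync with lock manager!",
]

for (month, day, hour), n in count_by_hour(SAMPLE_LINES).most_common():
    print(f"{month} {day:2d} {hour}:00  {n} lockd timeout message(s)")
```

To run it against a real log, pass an open file instead of SAMPLE_LINES, e.g. count_by_hour(open("/var/log/messages")), since iterating over a file yields its lines.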
Craig