On Thu, 19 Apr 2012, Giovanni Tirloni wrote:
Did you run this command during "the hang" or is it constantly returning you that?
It returns the timeout only during the hang; the rest of the time it works normally.
If the latter, are you blocking UDP on either the server or the client?
No blocking.
If you don't specify a transport protocol, rpcinfo will use whatever is defined in the /etc/netconfig database, and that's usually UDP.
Using UDP or TCP makes no difference. "rpcinfo -{u,t} host nfs" both give a timeout during the hang, and work normally during other times.
- Is it happening at the exact same minute (e.g. 2:15, 2:45, 3:15, 3:45)?
This might help you to identify a script/program that follows that schedule.
It is not related to any script that I can find. It does not happen at _exactly_ the same time every time, although the times are within a few minutes of each other.
- Is there any configuration different between this server and the others?
/etc/system, root crontab, etc.
No differences that I can find.
- When you say everything else BUT NFS is working fine, are pings answered
properly without increased latency during "the hang"?
Yes. I can even run an iperf server on the host during the hang, and from a client I run iperf -c and get normal performance.
- What about other services? Can you set up a monitoring script connecting
to some other service (e.g. ftp, ls, exit or ssh) and reporting the total run time?
No other service appears to be impacted at all.
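
For the record, a rough sketch of how the "total run time" check against another service could be scripted. It only times a plain TCP connect, which is an assumption on my part about what is good enough; it does not log in or run any command. The function name and the choice of port are hypothetical.

```python
import socket
import time


def timed_connect(host, port, timeout=5.0):
    """Open a TCP connection to host:port and return (status, elapsed_seconds)."""
    start = time.monotonic()
    try:
        # The connection is closed again as soon as the handshake completes.
        with socket.create_connection((host, port), timeout=timeout):
            status = "ok"
    except socket.timeout:
        status = "timeout"
    except OSError as exc:
        # Connection refused, host unreachable, name lookup failure, etc.
        status = "error: " + exc.__class__.__name__
    return status, time.monotonic() - start
```

Calling something like timed_connect("nfsserver", 22) from cron every minute and logging the result would show whether connect times to ssh change during the hang; "nfsserver" is a placeholder host name.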
- Can you set up a monitoring script running "rpcinfo" on localhost to make
sure both local and remote communications hang?
Yes, can do.
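
Something along these lines is what I have in mind; treat it as a sketch rather than a finished tool. It runs "rpcinfo -t host nfs" and "rpcinfo -u host nfs" against each host and logs how long each call took. The function names, the 10-second timeout and the 30-second interval are my own assumptions.

```python
import subprocess
import time


def timed_probe(cmd, timeout=10.0):
    """Run cmd and return (status, elapsed_seconds).

    status is "ok", "fail" (non-zero exit), "timeout" or "missing".
    """
    start = time.monotonic()
    try:
        result = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
        status = "ok" if result.returncode == 0 else "fail"
    except subprocess.TimeoutExpired:
        # The child is killed once the timeout expires.
        status = "timeout"
    except FileNotFoundError:
        # The command is not installed or not on PATH.
        status = "missing"
    return status, time.monotonic() - start


def probe_nfs(hosts, timeout=10.0):
    """Probe the nfs RPC program over TCP (-t) and UDP (-u) on each host."""
    rows = []
    for host in hosts:
        for flag in ("-t", "-u"):
            status, elapsed = timed_probe(["rpcinfo", flag, host, "nfs"], timeout)
            rows.append((host, flag, status, elapsed))
    return rows


def monitor(hosts, rounds, interval=30.0):
    """Probe all hosts `rounds` times, printing one timestamped line per probe."""
    for i in range(rounds):
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        for host, flag, status, elapsed in probe_nfs(hosts):
            print(f"{stamp} {host} rpcinfo {flag} {status} {elapsed:.2f}s", flush=True)
        if i < rounds - 1:
            time.sleep(interval)
```

Running monitor(["localhost", "nfsserver"], rounds=120) on the server itself would cover both the local and the remote case side by side; "nfsserver" is a placeholder for the real host name.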
-Steve