Hi,
I have a fairly heavily loaded CentOS 5 server (64-bit, dual core) and a mixed
bag of Fedora 6 and CentOS 4 clients, both 32- and 64-bit. NFSv3 works without
problems. Accesses over NFSv4 hang occasionally, in particular when opening
files. Is there anybody here who can help trace this, or who can suggest a
more appropriate forum?
The one hang that I have been able to trace involved only
CentOS5/2.6.18-8.1.4.el5 machines. The server is 64-bit, the client 32-bit.
The hang happened when a program was about to be executed from the NFSv4
share. LD_LIBRARY_PATH included a directory on this share. A gdb backtrace
showed that the process was still being loaded into memory, and that an
attempt to open a (nonexistent) library file never completed.
/Pawel
strace -p 19289
Process 19289 attached - interrupt to quit
open("/pkg/pgi/5.2-4//linux86/5.2/lib/libg2c.so.0", O_RDONLY <unfinished ...>
gdb program.x 19289
0x0063cb04 in open () from /lib/ld-linux.so.2
(gdb) where
#0  0x0063cb04 in open () from /lib/ld-linux.so.2
#1  0x0062d6c5 in open_verify () from /lib/ld-linux.so.2
#2  0x0062dc6a in open_path () from /lib/ld-linux.so.2
#3  0x0063055f in _dl_map_object () from /lib/ld-linux.so.2
#4  0x006340d6 in openaux () from /lib/ld-linux.so.2
#5  0x00635b46 in _dl_catch_error () from /lib/ld-linux.so.2
#6  0x0063469a in _dl_map_object_deps () from /lib/ld-linux.so.2
#7  0x0062b40e in dl_main () from /lib/ld-linux.so.2
#8  0x0063b8bb in _dl_sysdep_start () from /lib/ld-linux.so.2
#9  0x006292b8 in _dl_start () from /lib/ld-linux.so.2
#10 0x00628817 in _start () from /lib/ld-linux.so.2
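
The strace and backtrace above show the dynamic linker stuck in a plain open() of a library path that does not exist. A rough way to check whether the same open() hangs outside the loader is to attempt it in a child process with a timeout. This is only a sketch: the function name probe_open and the 10-second timeout are made up for illustration, and a child blocked inside the kernel may not die when killed.

```python
import subprocess
import sys

# Code run in the child: open the path read-only and report the outcome
# through the exit status (0 = opened, 2 = no such file, 3 = other error).
_CHILD_CODE = (
    "import os, sys\n"
    "try:\n"
    "    os.close(os.open(sys.argv[1], os.O_RDONLY))\n"
    "except FileNotFoundError:\n"
    "    sys.exit(2)\n"
    "except OSError:\n"
    "    sys.exit(3)\n"
)


def probe_open(path, timeout=10.0):
    """Try open(path, O_RDONLY) in a child process (hypothetical helper).

    Returns "ok", "missing", "error", or "hung" if the child has not
    finished within `timeout` seconds.
    """
    proc = subprocess.Popen([sys.executable, "-c", _CHILD_CODE, path])
    try:
        rc = proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        # SIGKILL may have no effect while the child is blocked in the
        # kernel, so we do not wait for it again here.
        proc.kill()
        return "hung"
    return {0: "ok", 2: "missing", 3: "error"}.get(rc, "error")


if __name__ == "__main__":
    # Pass the path(s) to probe on the command line, e.g. the library
    # path from the strace output.
    for p in sys.argv[1:]:
        print(p, probe_open(p))
```

On a healthy mount a nonexistent library should come back as "missing" almost immediately; a "hung" result would match the behaviour seen in the trace.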
The NFSv4 share is automounted (direct mount):
cat /etc/auto.pkg
/pkg -fstype=nfs4 server:/i32
For what it's worth, I tried NFSv4 with a CentOS 4 server, but it was hopeless
(the server would stop responding or panic). Older Fedora 6 kernel releases
(2.6.19?) were hopeless too, with similar symptoms.
Does anybody know who might be interested in a detailed bug report, or who
could help debug the problem?
Pawel
PS.
The server logs plenty of messages like this:
NFSD: setclientid: string in use by client(clientid 46604ac4/00000016)
but my rpc.idmapd configuration is correct as far as I can tell.
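
To see whether these messages come from one client ID or many, the log lines can be tallied per clientid. The sketch below assumes the message format is exactly as quoted above; the function name count_clientid_conflicts is made up for illustration, and where the kernel messages end up on disk depends on the syslog setup.

```python
import re
from collections import Counter

# Matches the message quoted above and captures the two hex fields of the
# clientid. The pattern is an assumption based on that single sample line.
CLIENTID_RE = re.compile(
    r"NFSD: setclientid: string in use by client\s*"
    r"\(clientid ([0-9a-fA-F]+)/([0-9a-fA-F]+)\)"
)


def count_clientid_conflicts(lines):
    """Count 'string in use' messages per (first, second) clientid pair."""
    counts = Counter()
    for line in lines:
        match = CLIENTID_RE.search(line)
        if match:
            key = (match.group(1).lower(), match.group(2).lower())
            counts[key] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Read log lines from standard input and print the most frequent IDs.
    for (first, second), n in count_clientid_conflicts(sys.stdin).most_common():
        print(f"{first}/{second}: {n}")
```

Piping the relevant log file into the script gives one line per clientid with the number of times it was reported.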