Hi Everybody!
I've got a nasty problem with a fresh install of CentOS 4 (actually a re-install, to rule out any configuration problems accumulated over time).
We have a c4-based machine (hereafter Server), which is supposed to serve a lot of files over NFS. While setting it up I was checking how the backups (using rsync) would work and got a lot of errors from rsync about non-existent (sub)directories. When I went to check a reported directory on the Client, it was indeed not accessible, although its name was visible in the parent directory. Later I found that rsync is irrelevant: the problem can be reproduced by just running find. To illustrate (run on the Client):
    cd /mnt/Server/export
    find . > /dev/null
    ./a/b/c: No such file or directory
    ls -l ./a/b
    ls: /mnt/server/a/b/c: No such file or directory
    (then follows the listing of remaining files in ./a/b)

If you do just 'ls ./a/b', the output includes 'c' among the others.
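In case it helps anyone reproduce this, here is a rough sketch of how the check above could be scripted on the Client. The function name list_broken_entries is made up for illustration, and it assumes that a plain 'ls -A' only reads the directory without stat'ing each entry (which matches what I see: plain 'ls' still shows 'c'). File names containing newlines are not handled.

```shell
#!/bin/sh
# list_broken_entries ROOT   (hypothetical helper, not part of any package)
# Print every entry that its parent directory lists but that cannot be stat'ed.
list_broken_entries() {
    # Walk only the directories that are still reachable.
    find "$1" -type d 2>/dev/null | while IFS= read -r dir; do
        # Assumption: 'ls -A' just reads the directory and does not stat entries.
        ls -A "$dir" 2>/dev/null | while IFS= read -r name; do
            path=$dir/$name
            # -e follows symlinks, so also test -L to avoid flagging dangling links.
            if [ ! -e "$path" ] && [ ! -L "$path" ]; then
                printf '%s\n' "$path"
            fi
        done
    done
}
```

Something like 'list_broken_entries /mnt/Server/export' should then print one line per affected entry, and nothing at all on a healthy tree.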
NOW THE TRICK:
On the Server, do
    ls /export/a/b/c
    (expected output follows)
Now on the client everything is OK!
I've found a similar bug here: https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=144556 It is filed against fc3, but the symptoms seem to be the same, although no one there mentioned the server-side trick.
I've tried several client machines (rhl7.3, c3, c4), tcp/udp, v2/v3, sync/async and also set actimeo=0 - nothing seems to affect the behavior. And nothing is printed in the logs - neither on the client nor on the server (granted, I don't have debug enabled).
At some point I suspected hardware (like lost or corrupted ethernet packets), but a directory that is missing on the client side stays missing until the trick is applied on the server side, pretty consistently, which does not look like random packet loss.
umount/mount may help with that particular dir, but the bug then shows up in a different place. And as per the bug referenced above, I can confirm that all the dirs reported missing have the size 12288 (I haven't spotted bigger ones, but maybe because I haven't checked every one of them).
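To cross-check the size observation more systematically, something along these lines could be run on the Server. This is only a sketch: the function name big_dirs is made up, and it assumes GNU find (for -printf) and awk are available.

```shell
#!/bin/sh
# big_dirs ROOT [MIN_BYTES]   (hypothetical helper)
# List directories under ROOT whose own size is at least MIN_BYTES
# (default 12288, the size seen on the directories reported missing).
big_dirs() {
    root=$1
    min=${2:-12288}
    # GNU find: %s is the size in bytes, %p is the path.
    find "$root" -type d -printf '%s %p\n' |
        awk -v min="$min" '$1 + 0 >= min + 0'
}
```

Running 'big_dirs /export' and comparing the result with the directories the Client reports missing would show whether every large directory is affected or only some of them.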
Just in case: the Server is a Dell PowerEdge 2850 connected through a Gigabit interface (e1000), with the latest BIOS updates applied in the course of troubleshooting.
I did not try to re-install the server with c3, but even if that works, it does not seem to be an option, because we plan to use selinux, which is missing in c3 (and yes, I tried disabling selinux with 'setenforce 0').
Any suggestion would be greatly appreciated.
Have a great day,
Alex