Nataraj wrote:
On 03/22/2012 08:24 AM, m.roth@5-cent.us wrote:
mark wrote:
On 03/21/12 19:50, Adam Wead wrote:
On Wed, Mar 21, 2012 at 4:40 PM, m.roth@5-cent.us wrote:
I just updated one of our servers to 5.8, and rebooted. In the logs, I saw a bunch of
Mar 21 16:29:02 <server> rpc.statd[9783]: recv_rply: can't decode RPC message!
Mar 21 16:29:33 <server> last message repeated 442 times
<snip>
I tried restarting nfslock, and that *appears* to have fixed it. Googling, I found a thread about that at http://nerdbynature.de/s9y/archives/2009/08.html, which suggests that nfslock is starting too early, possibly before portmap is running.
Anyone else see this? Has an old bug snuck back in?
<snip>
This is a backup server, and it only had a home directory mounted when I ssh'd in. It does appear to have been the case suggested in the thread I mentioned: there's no entry in the logfile after I restarted nfslock.
I run into these startup timing issues all the time on many Linux distributions. Upstart was supposed to address these issues in Red Hat/CentOS 6, but the hybrid startup process that has resulted from a partial transition to Upstart is confusing and sometimes makes the problem worse. I suspect the timing issues are also related to the speed and number of processors on your system.
I've solved these problems in several different ways:
For CentOS 5, if you don't mind changing the start priority number on the init script, you can make the service start later in the boot sequence. Sometimes this isn't enough. In some cases I've solved the problem by creating my own init script that starts or restarts the selected component after a fixed time delay. Note that the init script must not sleep in the foreground, or it will hold up the rest of the boot: it has to fire up a shell in the background that sleeps and then runs the restart command after the specified time. Maybe not so elegant, but it works.
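A rough sketch of that delayed-restart trick, in case it helps. The function name delayed_run and the 30-second delay in the usage comment are made up for illustration and are not taken from any shipped init script:

```shell
#!/bin/sh
# Hypothetical helper: run a command after a fixed delay without
# blocking the caller (for example, the rest of the boot sequence).
delayed_run() {
    delay="$1"
    shift
    # The subshell is put in the background, so this function returns
    # immediately; the command itself runs once the sleep finishes.
    ( sleep "$delay"; "$@" ) >/dev/null 2>&1 &
}

# Possible use from the start) branch of a custom init script
# (assumed delay, tune it for your hardware):
#   delayed_run 30 /sbin/service nfslock restart
```

The delay is a guess by nature, so this only papers over the ordering problem; it does not check that the dependency is actually up.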
In this case, a more elegant solution would be one that the authors of the init script should have thought of: they're already checking to see if something's running, so why not loop with a sleep until portmap's running?
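Something along these lines is what I have in mind. This is only a sketch, not what the stock nfslock script does: the wait_for name and the retry count are invented here, and using "rpcinfo -p localhost" as the test for "portmap is answering" is an assumption.

```shell
#!/bin/sh
# Hypothetical helper: retry a check command once per second until it
# succeeds or the given number of attempts is used up.
# Returns 0 if the check succeeded, 1 if it never did.
wait_for() {
    tries="$1"
    shift
    while [ "$tries" -gt 0 ]; do
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        tries=$((tries - 1))
        sleep 1
    done
    return 1
}

# Possible use in an init script (assumed check and retry count):
#   wait_for 30 rpcinfo -p localhost && /sbin/service nfslock start
```

Unlike a fixed sleep, this proceeds as soon as the check passes and gives up after a bounded time instead of hanging the boot.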
mark