[CentOS] nfslock

Thu Mar 22 17:12:04 UTC 2012
Nataraj <incoming-centos at rjl.com>

On 03/22/2012 08:24 AM, m.roth at 5-cent.us wrote:
> mark wrote:
>> On 03/21/12 19:50, Adam Wead wrote:
>>> On Wed, Mar 21, 2012 at 4:40 PM, <m.roth at 5-cent.us> wrote:
>>>> I just updated one of our servers to 5.8 and rebooted. In the logs, I
>>>> saw a bunch of
>>>> Mar 21 16:29:02 <server> rpc.statd[9783]: recv_rply: can't decode RPC message!
>>>> Mar 21 16:29:33 <server> last message repeated 442 times
>>>> Mar 21 16:30:34 <server> last message repeated 835 times
>>>> Mar 21 16:31:36 <server> last message repeated 884 times
>>>> Mar 21 16:32:38 <server> last message repeated 856 times
>>>> Mar 21 16:32:44 <server> last message repeated 111 times
>>>> I tried restarting nfslock, and that *appears* to have fixed it.
>>>> Googling, I found a thread about that at
>>>> <http://nerdbynature.de/s9y/archives/2009/08.html>, which suggests that
>>>> it's starting too early, possibly before portmap is running.
>>>> Anyone else see this? Has an old bug snuck back in?
>>> There's an NFS bug with the latest kernel:
>>>
>>> https://bugzilla.redhat.com/show_bug.cgi?id=798809
>>>
>>> Reboot into your previous kernel and that should fix it.
>> Great, but this is a server I'd missed, one that has been "we're too
>> busy to let you do it" until now, and going back to the previous kernel
>> would take it back to 5.7, at least. I suppose I could yum downgrade....
> Following up on myself: I didn't look at the bugzilla link earlier (I
> updated t-bird at home the other day, and clicking a link to open it in
> the browser doesn't work), but I've looked at it here, and it doesn't
> seem to be related. This is a backup server, and it only had a home
> directory mounted when I ssh'd in. It does appear to be the case
> suggested in the thread I mentioned: there have been no more of those
> messages in the logfile since I restarted nfslock.
>          mark
> _______________________________________________
> CentOS mailing list
> CentOS at centos.org
> http://lists.centos.org/mailman/listinfo/centos

I run into these startup timing issues all the time on many Linux
distributions.  Upstart was meant to address them in Red Hat/CentOS 6,
but the hybrid startup process that resulted from the partial transition
to Upstart is confusing and sometimes makes the problem worse.  I
suspect the timing issues also depend on the speed and number of
processors in your system.

I've solved these problems in several different ways:

For CentOS 5, if you don't mind changing the init script's start-order
number (the NN in its SNN symlink under /etc/rc.d/rcN.d), you can make
it start later in the boot process.  Sometimes this isn't enough.  In
some cases I've solved the problem by writing my own init script that
sleeps and then starts or restarts the affected service after a fixed
delay.  Note that the init script must launch a shell in the background
that sleeps and then runs the restart command; otherwise the sleep would
hold up the rest of the boot.  Not elegant, perhaps, but it works.

In CentOS 6 you can just create an upstart job with the correct