nfslock

m.roth＠5-cent.us

21 Mar 2012

8:40 p.m.

I just updated one of our servers to 5.8, and rebooted. In the logs, I saw a bunch of

Mar 21 16:29:02 <server> rpc.statd[9783]: recv_rply: can't decode RPC message!
Mar 21 16:29:33 <server> last message repeated 442 times
Mar 21 16:30:34 <server> last message repeated 835 times
Mar 21 16:31:36 <server> last message repeated 884 times
Mar 21 16:32:38 <server> last message repeated 856 times
Mar 21 16:32:44 <server> last message repeated 111 times

I tried restarting nfslock, and that *appears* to have fixed it. Googling, I found a thread about that at http://nerdbynature.de/s9y/archives/2009/08.html, which suggests that it's starting too early, possibly before portmap is running.
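One way to test the "starting too early" theory is to compare the SysV start priorities of the two services. The sketch below is only illustrative: the helper names `start_prio` and `starts_before` are hypothetical, and it assumes the usual `SNNname` link naming in a runlevel directory. Note that a correct order only says portmap is launched first, not that it is ready to answer by the time nfslock starts.

```shell
#!/bin/sh
# start_prio RCDIR SERVICE
# Print the two-digit start priority of SERVICE, taken from its S-link
# name in a SysV runlevel directory (for example S14nfslock -> 14).
start_prio() {
    for link in "$1"/S[0-9][0-9]"$2"; do
        # An unmatched glob stays literal, so make sure the entry exists.
        if [ -e "$link" ] || [ -L "$link" ]; then
            name=${link##*/}      # strip the directory part
            prio=${name#S}        # drop the leading "S"
            echo "${prio%"$2"}"   # drop the service name, keep the digits
            return 0
        fi
    done
    return 1
}

# starts_before RCDIR FIRST SECOND
# Succeed if FIRST has a strictly lower start priority than SECOND.
# Returns 2 if either service has no S-link in RCDIR.
starts_before() {
    a=$(start_prio "$1" "$2") || return 2
    b=$(start_prio "$1" "$3") || return 2
    [ "$a" -lt "$b" ]
}
```

Assuming runlevel 3 and the Red Hat style directory layout, a check could look like `starts_before /etc/rc.d/rc3.d portmap nfslock && echo "portmap is ordered first"`.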

Anyone else see this? Has an old bug snuck back in?

mark

Adam Wead

21 Mar

11:50 p.m.

Mark,

There's a NFS bug with the latest kernel:

https://bugzilla.redhat.com/show_bug.cgi?id=798809

Reboot into your previous kernel and that should fix it.

...adam

CentOS mailing list CentOS@centos.org http://lists.centos.org/mailman/listinfo/centos

mark

22 Mar

12:08 p.m.

Adam,

Please don't top post. Reformatted....

On 03/21/12 19:50, Adam Wead wrote:

> There's a NFS bug with the latest kernel:
>
> https://bugzilla.redhat.com/show_bug.cgi?id=798809
>
> Reboot into your previous kernel and that should fix it.

Great - but I've just updated a server I've missed, that's been "we're too busy to let you do it" until now, and it would take it back to 5.7, at least. I suppose I can yum downgrade....

mark

-- The new existentialist cereal: Raisin D'Etre - Prairie Home Companion joke show

m.roth＠5-cent.us

3:24 p.m.

Following myself up: I didn't look at the Bugzilla link earlier (I updated Thunderbird at home the other day, and clicking a link to open it in the browser doesn't work). I've now looked at it here, and it doesn't seem to be related: this is a backup server, and it only had a home directory mounted when I ssh'd in. It does appear to have been the case suggested in the thread I mentioned: there's no such entry in the logfile after I restarted nfslock.
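For confirming that the messages really stopped, a rough count of the errors in a syslog file can help. This is a sketch only: the function name `count_statd_errors` is hypothetical, and it assumes that every "last message repeated N times" line directly following a statd error refers to that error.

```shell
#!/bin/sh
# count_statd_errors
# Read syslog lines on stdin and print the total number of rpc.statd
# "can't decode RPC message" errors, expanding the syslog
# "last message repeated N times" lines that directly follow one.
count_statd_errors() {
    awk '
        /rpc\.statd\[[0-9]+\]: recv_rply: can.t decode RPC message/ {
            total++; prev = 1; next
        }
        prev && /last message repeated [0-9]+ times/ {
            # The count is the field right after the word "repeated".
            for (i = 1; i <= NF; i++)
                if ($i == "repeated") total += $(i + 1)
            next
        }
        { prev = 0 }   # any other line ends the run of repeats
        END { print total + 0 }
    '
}
```

Typical use would be `count_statd_errors < /var/log/messages`, run once before and once after restarting nfslock to see whether the total still grows.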

mark

Nataraj

5:12 p.m.

I run into these startup timing issues all the time on many Linux distributions. Upstart was supposed to be an attempt to address these issues in Red Hat/CentOS 6, but the hybrid startup process that has resulted from a partial transition to Upstart is both confusing and sometimes makes the problem worse. I suspect the timing issues are also related to the speed and number of processors on your system.

I've solved these problems in several different ways:

For CentOS 5, if you don't mind changing the number on the init script, you can cause it to start later in the startup process. Sometimes this isn't enough. In some cases I've solved the problem by creating my own init script which has a sleep command in it and then either starts or restarts the selected component after a fixed time delay. Note that the init script must fire up a shell that runs in the background and then runs the restart command after the specified time. Maybe not so elegant, but it works.

In CentOS 6 you can just create an upstart job with the correct dependencies.
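The delayed-restart approach described for CentOS 5 could be sketched roughly as follows. This is not the actual script used; the helper name `delayed_run` and the 30-second delay in the usage line are assumptions chosen for illustration.

```shell
#!/bin/sh
# delayed_run SECONDS COMMAND [ARGS...]
# Run COMMAND after a fixed delay without blocking the caller. The
# subshell is put in the background so that an rc script calling this
# can return right away and the boot sequence can continue.
delayed_run() {
    delay="$1"
    shift
    ( sleep "$delay"; "$@" ) >/dev/null 2>&1 &
}

# Example call from the start) branch of a late-running init script
# (the delay value is a guess and would need tuning per machine):
# delayed_run 30 /sbin/service nfslock restart
```

Redirecting the subshell's output keeps it from holding the console or a pipe open after the init script itself has returned.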

Nataraj

m.roth＠5-cent.us

6:50 p.m.

Nataraj wrote:

> For CentOS 5, if you don't mind changing the number on the init script, you can cause it to start later in the startup process. Sometimes this isn't enough. In some cases I've solved the problem by creating my own init script which has a sleep command in it and then either starts or restarts the selected component after a fixed time delay. Note that the init script must fire up a shell that runs in the background and then runs the restart command after the specified time. Maybe not so elegant, but it works.

In this case, a more elegant solution would be one that the authors of the initscript should have thought of: they're already checking to see if something's running, why not loop with a sleep until portmap's running?

mark

John R Pierce

6:54 p.m.

On 03/22/12 11:50 AM, m.roth@5-cent.us wrote:

> In this case, a more elegant solution would be one that the authors of the initscript should have thought of: they're already checking to see if something's running, why not loop with a sleep until portmap's running?

they'd have to spawn a detached shell for that, as the rc scripts won't continue until the current script returns.

-- john r pierce N 37, W 122 santa cruz ca mid-left coast

Nataraj

8:08 p.m.

On 03/22/2012 11:54 AM, John R Pierce wrote:

> On 03/22/12 11:50 AM, m.roth@5-cent.us wrote:
> > In this case, a more elegant solution would be one that the authors of the initscript should have thought of: they're already checking to see if something's running, why not loop with a sleep until portmap's running?
>
> they'd have to spawn a detached shell for that, as the rc scripts won't continue until the current script returns.

You have to spawn a detached shell anyway, whether you do a sleep or check to see if portmap is running. If you want to check to see if it's running, that will certainly work too. In my case, I used a time delay because the problem I was having was with named not binding to the vmnet interfaces because VMware took too long to start. named needed to start early on because other daemons depended on it, but then it needed to be kicked later so it would bind to the newly created vmnet interface.
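The "check instead of sleep" variant discussed here might look like the sketch below: a polling loop with a retry limit, run from a detached subshell. The helper name `wait_for` is hypothetical, and using `rpcinfo -p` as the readiness probe is an assumption (it should fail while the portmapper cannot be contacted), not something taken from the stock nfslock init script.

```shell
#!/bin/sh
# wait_for TRIES INTERVAL COMMAND [ARGS...]
# Run COMMAND up to TRIES times, sleeping INTERVAL seconds between
# attempts. Succeed as soon as COMMAND succeeds, fail if it never does.
wait_for() {
    tries="$1"
    interval="$2"
    shift 2
    while [ "$tries" -gt 0 ]; do
        "$@" >/dev/null 2>&1 && return 0
        tries=$((tries - 1))
        sleep "$interval"
    done
    return 1
}

# Example call from an init script. The subshell is detached so the rc
# sequence is not held up while waiting (30 tries, 2 seconds apart, are
# arbitrary values):
# ( wait_for 30 2 rpcinfo -p localhost && /sbin/service nfslock restart ) \
#     >/dev/null 2>&1 &
```

The retry limit matters: without it, a portmap that never comes up would leave the background shell looping forever.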

Upstart (which was authored by one of the Ubuntu developers) is now part of CentOS 6. It attempts to address these issues by allowing you to define dependencies between Upstart jobs. Unfortunately it's still a mess in CentOS 6 because a large number of packages still use the old init scripts. Furthermore, Red Hat has decided that they don't like Upstart and they are moving to yet another replacement for it in future releases (sorry, I don't remember the name of it).

Nataraj

Kanwar Ranbir Sandhu

26 Mar

2:18 a.m.

On Thu, 2012-03-22 at 13:08 -0700, Nataraj wrote:

> Furthermore RedHat has decided that they don't like Upstart and they are going to yet another replacement for upstart in future releases (sorry, I don't remember the name of it).

You're thinking about systemd.

I believe Fedora 15 was the first Fedora release with systemd.

Regards,

Ranbir

Lars Hecking

4:01 p.m.

Kanwar Ranbir Sandhu writes:

> On Thu, 2012-03-22 at 13:08 -0700, Nataraj wrote:
> > Furthermore RedHat has decided that they don't like Upstart and they are going to yet another replacement for upstart in future releases (sorry, I don't remember the name of it).

They should also realise that they don't like NetworkManager and get rid of it.

John R Pierce

4:56 p.m.

On 03/26/12 9:01 AM, Lars Hecking wrote:

> They should also realise that they don't like NetworkManager and get rid of it.

and replace it with what?

-- john r pierce N 37, W 122 santa cruz ca mid-left coast

Lars Hecking

5 p.m.

John R Pierce writes:

> On 03/26/12 9:01 AM, Lars Hecking wrote:
> > They should also realise that they don't like NetworkManager and get rid of it.
>
> and replace it with what?

No replacement needed. Or at least go back to the pre-6 situation and not stuff it down our throats as a mandatory requirement.

John R Pierce

5:18 p.m.

On 03/26/12 10:00 AM, Lars Hecking wrote:

> No replacement needed. Or at least go back to the pre-6 situation and not stuff it down our throats as a mandatory requirement.

wireless sure needs it to work decently; without it, it's a kludge of a kludge.

-- john r pierce N 37, W 122 santa cruz ca mid-left coast

m.roth＠5-cent.us

6:10 p.m.

John R Pierce wrote:

> On 03/26/12 10:00 AM, Lars Hecking wrote:
> > No replacement needed. Or at least go back to the pre-6 situation and not stuff it down our throats as a mandatory requirement.
>
> wireless sure needs it to work decently, without it, it's a kludge of a kludge.

That's fine... but I just found wpa_supplicant running on one of my *servers*; I chkconfig'd it off, and service stopped it, and then found Network(mis)Manager had apparently restarted it. I shut *that* off, and I could finally kill the idiot thing.

Back to service network start, thank you very much.

mark

Lars Hecking

27 Mar

9:22 a.m.

> wireless sure needs it to work decently, without it, it's a kludge of a kludge.

Sure, it's an excellent choice for mobile devices.

But making it the default on an *Enterprise* distribution makes little sense.

(Just checking, this is still the CentOS mailing list, not Ubuntu? Yes.)

Johnny Hughes

9:33 a.m.

On 03/27/2012 04:22 AM, Lars Hecking wrote:

> > wireless sure needs it to work decently, without it, it's a kludge of a kludge.
>
> Sure, it's an excellent choice for mobile devices.
>
> But making it the default on an *Enterprise* distribution makes little sense.
>
> (Just checking, this is still the CentOS mailing list, not Ubuntu? Yes.)

Remember, the "they" here is NOT CentOS ... if I had my choice then Network Manager would not install by default on my server at all.

However, it is Wireless and not Mobile that really needs Network Manager.

And there are MANY non-Mobile wireless devices now that are being installed "in the Enterprise". (Workstations, phone systems, building security systems, PKI card readers for access, etc.).

So, really, it is mostly servers where you know you have a hard wired connection now.

I don't think that Network Manager should be used outside of gnome (or KDE) personally, but upstream makes those kinds of decisions ... we just clone the experience as closely as possible.

John R Pierce

5:44 p.m.

On 03/27/12 2:33 AM, Johnny Hughes wrote:

> I don't think that Network Manager should be used outside of gnome (or KDE) personally, but upstream makes those kind of decisions ... we just clone the experience as closely as possible.

I think it just needs a little more refinement and better documentation on how to deal with it outside of the gnome environment. IMHO, with decent documentation, it makes sense to use it for DHCP (the 'conventional' DHCP implementation was, IMHO, kludgy; it was quite automagic as to what was going on).

-- john r pierce N 37, W 122 santa cruz ca mid-left coast

Lamar Owen

31 Mar

8:12 p.m.

On Tuesday, March 27, 2012 05:22:53 AM Lars Hecking wrote:

> But making it the default on an *Enterprise* distribution makes little sense.

*Enterprise* != *server*

Phil Schaffner

27 Mar

12:35 a.m.

Lars Hecking wrote on 03/26/2012 01:00 PM:

> No replacement needed. Or at least go back to the pre-6 situation and not stuff it down our throats as a mandatory requirement.

Just because NetworkManager is the default does not mean it is mandatory. You are free to "yum remove NetworkManager" and use the network service. It is, at least, much improved from the EL5 version, and virtually essential for mobile systems.

Phil
