On 4/13/21 11:36 PM, Chris Schanzle via CentOS wrote:
On 4/13/21 5:00 PM, Frank Cox wrote:
On Tue, 13 Apr 2021 22:29:26 +0200 Simon Matter wrote:
You could try running strace on the hanging process to see what it's doing.
[frankcox@mutt temp]$ rsync -avv ../temp/ jeff:temp
opening connection using: ssh jeff rsync --server -vvlogDtpre.iLsfxC . temp (7 args)
sending incremental file list
delta-transmission enabled
abc is uptodate
total: matches=0 hash_hits=0 false_alarms=0 data=0
Leaving that sit there apparently doing nothing (but still not giving me my cursor back) I switched to another terminal window and did the following:
[frankcox@mutt ~]$ ps -FA | grep rsync
frankcox 5400 2435 0 60586 3160 5 14:52 pts/0 00:00:00 rsync -avv ../temp/ jeff:temp
frankcox 5401 5400 0 67980 7440 1 14:52 pts/0 00:00:00 ssh jeff rsync --server -vvlogDtpre.iLsfxC . temp
frankcox 5526 5416 0 55476 1076 3 14:53 pts/1 00:00:00 grep --color=auto rsync
[frankcox@mutt ~]$ strace -p 5401
strace: Process 5401 attached
select(11, [5 9 10], [], NULL, NULL
Then it just sits there with no further action. I get my cursor back when I hit ctrl-c.
[frankcox@mutt ~]$ strace -p 5400
strace: Process 5400 attached
restart_syscall(<... resuming interrupted nanosleep ...>) = 0
wait4(5401, 0x7ffd45105564, WNOHANG, NULL) = 0
nanosleep({tv_sec=0, tv_nsec=20000000}, NULL) = 0
wait4(5401, 0x7ffd45105564, WNOHANG, NULL) = 0
nanosleep({tv_sec=0, tv_nsec=20000000}, NULL) = 0
wait4(5401, 0x7ffd45105564, WNOHANG, NULL) = 0
nanosleep({tv_sec=0, tv_nsec=20000000}, NULL) = 0
wait4(5401, 0x7ffd45105564, WNOHANG, NULL) = 0
nanosleep({tv_sec=0, tv_nsec=20000000}, NULL) = 0
wait4(5401, 0x7ffd45105564, WNOHANG, NULL) = 0
nanosleep({tv_sec=0, tv_nsec=20000000}, NULL) = 0
wait4(5401, 0x7ffd45105564, WNOHANG, NULL) = 0
nanosleep({tv_sec=0, tv_nsec=20000000}, NULL) = 0
wait4(5401, 0x7ffd45105564, WNOHANG, NULL) = 0
nanosleep({tv_sec=0, tv_nsec=20000000}, NULL) = 0
wait4(5401, 0x7ffd45105564, WNOHANG, NULL) = 0
The wait4-etc line just keeps repeating endlessly until I hit ctrl-c.
Unfortunately, I have no idea what any of the above actually means. Does it tell us anything interesting?
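For anyone reading the traces above with the same question: the trace of 5400 shows rsync polling its ssh child with wait4(..., WNOHANG) every 20 ms and getting 0 back each time, which means the child has not exited yet. The trace of 5401 shows ssh blocked in select() with no timeout, waiting for file descriptors 5, 9 or 10 to become readable. One rough way to find out what those descriptors are is to read the symlinks under /proc. The sketch below assumes a Linux /proc filesystem, and show_fds is a made-up helper name, not part of any tool.

```shell
# Print "fd -> target" for every open file descriptor of a process.
# show_fds is a hypothetical helper; it assumes /proc/<pid>/fd exists (Linux).
show_fds() {
    pid=$1
    for fd in /proc/"$pid"/fd/*; do
        # Entries are symlinks; skip the literal pattern if nothing matched.
        [ -L "$fd" ] || continue
        printf '%s -> %s\n' "${fd##*/}" "$(readlink "$fd")"
    done
}
```

Running `show_fds 5401` as the same user (or as root) while the hang is in progress would list entries such as sockets and pipes next to the numbers 5, 9 and 10 that select() is waiting on.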
Yay! I am glad someone else on the planet is experiencing this. I noticed this started happening to me after updating some CentOS Linux 8 systems today.
I discovered that if I set ForwardX11=no (either on the ssh command line or in
~/.ssh/config) the hang does not happen. But why does that matter? There were no updates to openssh.
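To check which ForwardX11 value ssh will actually use for a given host, `ssh -G` prints the effective client configuration as lowercase "keyword value" lines. A small sketch, where x11_setting is a made-up helper name and "somehost" is a placeholder:

```shell
# Read `ssh -G` style output on stdin and print the ForwardX11 value.
# x11_setting is a hypothetical helper, not an OpenSSH command.
x11_setting() {
    awk 'tolower($1) == "forwardx11" { print $2; exit }'
}

# Usage (placeholder host name):
#   ssh -G somehost | x11_setting
#   ssh -G -o ForwardX11=no somehost | x11_setting
```

If the first command prints "yes" and the second prints "no", the override is taking effect for that host.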
It is not the systemd update doing something silly with session management. I painfully downgraded manually and rebooted to no effect.
As an aside, why can't we have nice things in life, like having 'dnf downgrade systemd*' actually work? I did the following. It might be dumb, but it
worked. Alternative suggestions for downgrading are appreciated; my searches of the list and my google-fu were off the mark today.
cd [path-to-repo]/centos/8/BaseOS/x86_64/os/Packages
dnf downgrade $(rpm -qa 'systemd*' | grep 239-41.el8_3.2 | sed -e 's/3.2/3.1/' -e 's|^|./|' -e 's/$/.rpm/')
Chris
[adjusted the subject, hope that is OK.]
Found it! It's the dbus update to 1.12.8-12. Downgrade to -11 and ssh connections close normally.
To clarify the problem, with the new dbus, simple ssh's like:
ssh somehost uptime
will print the uptime, but do not return to the local shell prompt until you hit ctrl-c. It works normally if you downgrade dbus or run:
ssh -o forwardx11=no somehost uptime
I'm sure a bug report exists somewhere, but that's something to dig for or create tomorrow.
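The symptom can be checked without sitting at the prompt by putting the command under a time limit. This is only a sketch: check_hang is a made-up helper name, it relies on coreutils `timeout` (which exits with status 124 when the limit is reached), it discards the command's output, and it assumes ssh can authenticate without prompting.

```shell
# Run a command under a time limit and say whether it came back on its own.
# check_hang is a hypothetical helper built on coreutils `timeout`.
check_hang() {
    limit=$1
    shift
    status=0
    timeout "$limit" "$@" >/dev/null 2>&1 || status=$?
    if [ "$status" -eq 124 ]; then
        echo "hung"
    else
        echo "returned on its own (exit $status)"
    fi
}

# Usage (placeholder host name):
#   check_hang 15 ssh somehost uptime
#   check_hang 15 ssh -o forwardx11=no somehost uptime
```

On an affected system the first call would be expected to report "hung" and the second to return normally.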
The packages needed for the downgrade were scattered across different locations, so I copied them into one directory and ran:
dnf downgrade ./*
The packages I needed to downgrade on an x86_64 system were:
dbus-1.12.8-11.el8.x86_64.rpm
dbus-common-1.12.8-11.el8.noarch.rpm
dbus-daemon-1.12.8-11.el8.x86_64.rpm
dbus-devel-1.12.8-11.el8.x86_64.rpm
dbus-libs-1.12.8-11.el8.x86_64.rpm
dbus-tools-1.12.8-11.el8.x86_64.rpm
dbus-x11-1.12.8-11.el8.x86_64.rpm
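The set of files to collect depends on which dbus subpackages are installed, so the list can be derived from `rpm -qa` instead of typed by hand. A sketch only: prev_build_files is a made-up helper name, and the pattern assumes the installed build is named 1.12.8-12.el8 (possibly with a suffix such as _3) and that the older files follow the -11.el8 naming listed above.

```shell
# Read installed package names (name-version-release.arch, one per line)
# on stdin and print the matching -11 rpm file names, prefixed with ./
# prev_build_files is a hypothetical helper; the version strings are assumptions.
prev_build_files() {
    sed -e 's/-1\.12\.8-12\.el8[^.]*\./-1.12.8-11.el8./' \
        -e 's|^|./|' \
        -e 's/$/.rpm/'
}

# Usage, from the directory holding the copied packages:
#   rpm -qa 'dbus*' | prev_build_files
#   dnf downgrade $(rpm -qa 'dbus*' | prev_build_files)
```

Printing the list first makes it easy to spot a file that is missing from the directory before handing the names to dnf.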
Now that's really interesting, I was wondering why I don't see this on OL8. The thing is that certain OL8 packages have an additional RPM revision added like .0.1. Just checked dbus and its changelog shows:
* Tue Feb 16 2021 Kevin Lyons kevin.x.lyons@oracle.com -1.12.8-12.0.1
- bus: raise fd limits before dropping privs [Orabug: 31175643]
- fix netlink poll: error 4 (Zhenzhong Duan)
So OL is definitely not 100% bug-for-bug compatible like the other clones :-)
And it worries me a bit that O* fixed this on Feb 16, yet the broken dbus packages are now (in April) being installed on CentOS servers.
Regards, Simon