On 4/14/21 2:22 AM, Simon Matter wrote:
On 4/13/21 11:36 PM, Chris Schanzle via CentOS wrote:
On 4/13/21 5:00 PM, Frank Cox wrote:
On Tue, 13 Apr 2021 22:29:26 +0200 Simon Matter wrote:
> You could try running strace on the hanging process to see what it's
> doing.

[frankcox@mutt temp]$ rsync -avv ../temp/ jeff:temp
opening connection using: ssh jeff rsync --server -vvlogDtpre.iLsfxC . temp (7 args)
sending incremental file list
delta-transmission enabled
abc is uptodate
total: matches=0 hash_hits=0 false_alarms=0 data=0
Leaving that sit there apparently doing nothing (but still not giving me my cursor back) I switched to another terminal window and did the following:
[frankcox@mutt ~]$ ps -FA | grep rsync
frankcox 5400 2435 0 60586 3160 5 14:52 pts/0 00:00:00 rsync -avv ../temp/ jeff:temp
frankcox 5401 5400 0 67980 7440 1 14:52 pts/0 00:00:00 ssh jeff rsync --server -vvlogDtpre.iLsfxC . temp
frankcox 5526 5416 0 55476 1076 3 14:53 pts/1 00:00:00 grep --color=auto rsync
[frankcox@mutt ~]$ strace -p 5401
strace: Process 5401 attached
select(11, [5 9 10], [], NULL, NULL
Then it just sits there with no further action. I get my cursor back when I hit ctrl-c.
[frankcox@mutt ~]$ strace -p 5400
strace: Process 5400 attached
restart_syscall(<... resuming interrupted nanosleep ...>) = 0
wait4(5401, 0x7ffd45105564, WNOHANG, NULL) = 0
nanosleep({tv_sec=0, tv_nsec=20000000}, NULL) = 0
wait4(5401, 0x7ffd45105564, WNOHANG, NULL) = 0
nanosleep({tv_sec=0, tv_nsec=20000000}, NULL) = 0
wait4(5401, 0x7ffd45105564, WNOHANG, NULL) = 0
nanosleep({tv_sec=0, tv_nsec=20000000}, NULL) = 0
wait4(5401, 0x7ffd45105564, WNOHANG, NULL) = 0
nanosleep({tv_sec=0, tv_nsec=20000000}, NULL) = 0
wait4(5401, 0x7ffd45105564, WNOHANG, NULL) = 0
nanosleep({tv_sec=0, tv_nsec=20000000}, NULL) = 0
wait4(5401, 0x7ffd45105564, WNOHANG, NULL) = 0
nanosleep({tv_sec=0, tv_nsec=20000000}, NULL) = 0
wait4(5401, 0x7ffd45105564, WNOHANG, NULL) = 0
The wait4-etc line just keeps repeating endlessly until I hit ctrl-c.
Unfortunately, I have no idea what any of the above actually means. Does it tell us anything interesting?
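For what it's worth, a rough reading of the two traces: ssh (PID 5401) is blocked in select(), waiting for any of file descriptors 5, 9 or 10 to become readable, and rsync (PID 5400) is calling wait4() with WNOHANG every 20 ms, i.e. polling for that ssh child to exit. A minimal sketch for seeing what those descriptors point at, assuming a Linux /proc filesystem; the helper name show_fds is made up for illustration:

```shell
#!/bin/sh
# show_fds PID: list the open file descriptors of a process via /proc.
# (hypothetical helper; you need permission to inspect the target process)
show_fds() {
    ls -l "/proc/$1/fd"
}

# Example run against the current shell. For the hang above you would
# pass 5401, the PID of the stuck ssh process.
show_fds "$$"
```

Each output line shows a descriptor number and the file, pipe or socket it refers to, which should narrow down what ssh is still waiting on.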
Yay! I am glad someone else on the planet is experiencing this. I noticed this started happening to me after updating some CentOS Linux 8 systems today.
I discovered if I set ForwardX11=no (either on ssh command line or in
~/.ssh/config) the hang does not happen. But why does that matter? No updates to openssh.
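For reference, a minimal sketch of the ~/.ssh/config variant of that workaround. The function name and the host "somehost" in the usage line are placeholders, not taken from a real setup:

```shell
#!/bin/sh
# disable_x11_forwarding HOST [CONFIG]: append a Host block that turns off
# X11 forwarding for HOST. CONFIG defaults to ~/.ssh/config.
# (hypothetical helper, shown only to illustrate the config syntax)
disable_x11_forwarding() {
    host=$1
    config=${2:-$HOME/.ssh/config}
    printf 'Host %s\n    ForwardX11 no\n' "$host" >> "$config"
}

# Usage: disable_x11_forwarding somehost
```

Note that ssh uses the first value it obtains for each option, so an earlier matching Host block that sets "ForwardX11 yes" would still win over an appended one.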
It is not the systemd update doing something silly with session management. I painfully downgraded manually and rebooted to no effect. As an aside, why can't we have nice things in life, like 'dnf downgrade systemd*' actually working? I did the below - might be dumb, but it worked -- alternate suggestions to downgrade are appreciated - my list searching and google-fu were off the mark today.
cd [path-to-repo]/centos/8/BaseOS/x86_64/os/Packages
dnf downgrade $(rpm -qa 'systemd*' | grep 239-41.el8_3.2 | sed -e 's/3.2/3.1/' -e 's|^|./|' -e 's/$/.rpm/')
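To see what that command substitution expands to before handing it to dnf, the sed part can be run on its own. A small sketch: the package name fed in is just an example in the name-version-release.arch form that rpm -qa prints, and the function name is made up. The version pattern here is anchored on "el8_3" so it cannot accidentally hit some other "3.2" in a package name.

```shell
#!/bin/sh
# to_older_rpm: read installed package names (as printed by rpm -qa) on
# stdin and print the matching ./<name>.rpm file names for the older
# el8_3.1 release. (hypothetical helper for a dry run)
to_older_rpm() {
    sed -e 's/el8_3\.2/el8_3.1/' -e 's|^|./|' -e 's/$/.rpm/'
}

echo 'systemd-239-41.el8_3.2.x86_64' | to_older_rpm
# prints: ./systemd-239-41.el8_3.1.x86_64.rpm
```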
Chris
[adjusted the subject, hope that is OK.]
Found it! It's the dbus update to 1.12.8-12. Downgrade to -11 and ssh connections close normally.
To clarify the problem, with the new dbus, simple ssh's like:
ssh somehost uptime
will print the uptime, but do not return to the local shell prompt until you hit ctrl-c. It works normally if you downgrade dbus or disable X11 forwarding:
ssh -o forwardx11=no somehost uptime
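A quick way to check a host for this without babysitting the terminal is to wrap the command in timeout(1) from coreutils, which exits with status 124 when it has to kill the command. A minimal sketch; the function name and the 10-second limit in the usage line are arbitrary choices, and stdin is taken from /dev/null on the assumption that the hang still shows up that way:

```shell
#!/bin/sh
# check_hang SECONDS COMMAND...: run COMMAND and report whether timeout(1)
# had to kill it (exit status 124) or whether it finished by itself.
# (hypothetical helper)
check_hang() {
    limit=$1
    shift
    timeout "$limit" "$@" </dev/null >/dev/null 2>&1
    status=$?
    if [ "$status" -eq 124 ]; then
        echo "hung"
    else
        echo "returned ($status)"
    fi
}

# Usage: check_hang 10 ssh somehost uptime
```

Running it once with and once without "-o forwardx11=no" should show the difference on an affected machine.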
I'm sure a bug report exists somewhere, but that's something to dig for or create tomorrow.
To downgrade, packages were scattered in different locations, so I copied them to one directory and did
dnf downgrade ./*
The packages I needed to downgrade on an x86_64 system were:
dbus-1.12.8-11.el8.x86_64.rpm
dbus-common-1.12.8-11.el8.noarch.rpm
dbus-daemon-1.12.8-11.el8.x86_64.rpm
dbus-devel-1.12.8-11.el8.x86_64.rpm
dbus-libs-1.12.8-11.el8.x86_64.rpm
dbus-tools-1.12.8-11.el8.x86_64.rpm
dbus-x11-1.12.8-11.el8.x86_64.rpm
Now that's really interesting, I was wondering why I don't see this on OL8. The thing is that certain OL8 packages have an additional RPM revision added like .0.1. Just checked dbus and its changelog shows:
* Tue Feb 16 2021 Kevin Lyons <kevin.x.lyons@oracle.com> - 1.12.8-12.0.1
- bus: raise fd limits before dropping privs [Orabug: 31175643]
- fix netlink poll: error 4 (Zhenzhong Duan)
So OL is definitely not 100% bug-for-bug compatible like the other clones :-)
And it makes me a bit worried: why did O* fix this on Feb 16, while the broken dbus packages are only now (in April) being installed on CentOS servers?
Sorry, maybe I'm wrong here and the OL8 addons are fixing other things? Could someone who experiences the issue test with the OL8 dbus packages?
Could it be BZ #1940067?
https://gcc02.safelinks.protection.outlook.com/?url=https%3A%2F%2Fbugzilla.r...
Bullseye, Simon! Many thanks.
A reasonable one-liner fix / workaround is below. Also works when requesting a terminal with 'ssh -Xt'. Adds a "tty -s || return" line in the right spot to check if a tty exists and if not, bail out w/o starting dbus-launch. Change "-i" to "-i.bak" to make a backup.
sed -i '/SHLVL/atty -s || return' /etc/profile.d/ssh-x-forwarding.sh
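To preview what that sed command does before touching the real file, it can be tried on a scratch copy. The three-line file below is an invented stand-in, not the actual contents of ssh-x-forwarding.sh; only the presence of a line mentioning SHLVL matters. GNU sed is assumed for the one-line form of the "a" command.

```shell
#!/bin/sh
# Build a throwaway file with one line that mentions SHLVL.
scratch=$(mktemp)
cat > "$scratch" <<'EOF'
# invented stand-in for the profile script
[ "$SHLVL" -ne 1 ] && return
echo "would start dbus-launch here"
EOF

# Same edit as above, minus -i, so the result goes to stdout and the
# scratch file itself is left unchanged.
patched=$(sed '/SHLVL/atty -s || return' "$scratch")
printf '%s\n' "$patched"
rm -f "$scratch"
```

The "tty -s || return" line should appear directly after the SHLVL line, so the script returns early whenever there is no terminal.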