Re: [CentOS] ssh stalls/hangs instead of exiting

14 Apr 2021

      On 4/14/21 2:22 AM, Simon Matter wrote:
...
...
...
...
On 4/13/21 11:36 PM, Chris Schanzle via CentOS wrote:
...
On 4/13/21 5:00 PM, Frank Cox wrote:
...
On Tue, 13 Apr 2021 22:29:26 +0200
Simon Matter wrote:
> You could try running strace on the hanging process to see what it's
> doing.
[frankcox@mutt temp]$ rsync -avv ../temp/ jeff:temp
opening connection using: ssh jeff rsync --server -vvlogDtpre.iLsfxC . temp  (7 args)
...
...
sending incremental file list
delta-transmission enabled
abc is uptodate
total: matches=0  hash_hits=0  false_alarms=0 data=0
Leaving that sit there apparently doing nothing (but still not giving
me my cursor back) I switched to another terminal window and did the
following:
[frankcox@mutt ~]$ ps -FA | grep rsync
frankcox    5400    2435  0 60586  3160   5 14:52 pts/0    00:00:00 rsync -avv ../temp/ jeff:temp
frankcox    5401    5400  0 67980  7440   1 14:52 pts/0    00:00:00 ssh jeff rsync --server -vvlogDtpre.iLsfxC . temp
...
frankcox    5526    5416  0 55476  1076   3 14:53 pts/1    00:00:00 grep --color=auto rsync
[frankcox@mutt ~]$ strace -p 5401
strace: Process 5401 attached
select(11, [5 9 10], [], NULL, NULL
Then it just sits there with no further action.  I get my cursor back
when I hit ctrl-c.
[frankcox@mutt ~]$ strace -p 5400
strace: Process 5400 attached
restart_syscall(<... resuming interrupted nanosleep ...>) = 0
wait4(5401, 0x7ffd45105564, WNOHANG, NULL) = 0
nanosleep({tv_sec=0, tv_nsec=20000000}, NULL) = 0
wait4(5401, 0x7ffd45105564, WNOHANG, NULL) = 0
nanosleep({tv_sec=0, tv_nsec=20000000}, NULL) = 0
wait4(5401, 0x7ffd45105564, WNOHANG, NULL) = 0
nanosleep({tv_sec=0, tv_nsec=20000000}, NULL) = 0
wait4(5401, 0x7ffd45105564, WNOHANG, NULL) = 0
nanosleep({tv_sec=0, tv_nsec=20000000}, NULL) = 0
wait4(5401, 0x7ffd45105564, WNOHANG, NULL) = 0
nanosleep({tv_sec=0, tv_nsec=20000000}, NULL) = 0
wait4(5401, 0x7ffd45105564, WNOHANG, NULL) = 0
nanosleep({tv_sec=0, tv_nsec=20000000}, NULL) = 0
wait4(5401, 0x7ffd45105564, WNOHANG, NULL) = 0
The wait4-etc line just keeps repeating endlessly until I hit ctrl-c.
Unfortunately, I have no idea what any of the above actually means.
Does it tell us anything interesting?
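One possible follow-up, which nobody in this thread actually ran: the select() call above is blocked on file descriptors 5, 9 and 10 of the ssh process, so listing what those descriptors point to may hint at what ssh is still waiting for. A minimal sketch, assuming Linux /proc and a hypothetical helper name show_fds:

```shell
# Hypothetical helper: print what each open file descriptor of a process
# points to. Relies on Linux /proc; you need permission to read the
# target's fd directory (same user or root).
show_fds() {
    pid=$1
    for fd in /proc/"$pid"/fd/*; do
        # Skip entries that vanished between listing and reading.
        [ -L "$fd" ] || continue
        printf '%s -> %s\n' "${fd##*/}" "$(readlink "$fd")"
    done
}

# Example: inspect the current shell. For the case above it would be
# "show_fds 5401".
show_fds $$
```

Entries that show up as socket:[...] or pipe:[...] are the ones select() can sit on indefinitely if the other end never closes.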
Yay!  I am glad someone else on the planet is experiencing this. 
I noticed this started happening to me after updating some CentOS Linux 8
systems today.
I discovered if I set ForwardX11=no (either on ssh command line or in
~/.ssh/config) the hang does not happen.  But why does that matter?  No
updates to openssh.
...
It is not the systemd update doing something silly with session
management.  I painfully downgraded manually and rebooted to no
effect. 
As an aside, why can't we have nice things in life, like 'dnf downgrade
systemd*' actually working?  I did the below - might be dumb, but it
worked - alternate suggestions to downgrade are appreciated. Searching
the list and my google-fu were off the mark today.
...
cd [path-to-repo]/centos/8/BaseOS/x86_64/os/Packages
dnf downgrade $(rpm -qa 'systemd*' | grep 239-41.el8_3.2 | sed -e 's/_3\.2/_3.1/' -e 's|^|./|' -e 's/$/.rpm/')
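
The name rewriting in that pipeline can be tried on its own without touching dnf or rpm. A minimal sketch, assuming the same _3.2 to _3.1 release strings as above; the function name to_downgrade_paths is made up for illustration, and '|' is used as the sed delimiter so the "./" prefix needs no escaping:

```shell
# Hypothetical helper: read installed package names (one per line) on stdin
# and print the local .rpm path of the previous point release.
# The el8_3.2 -> el8_3.1 strings are taken from the command in this message.
to_downgrade_paths() {
    sed -e 's/\.el8_3\.2\./.el8_3.1./' -e 's|^|./|' -e 's|$|.rpm|'
}

# Dry run on sample names: this only prints paths and installs nothing.
printf '%s\n' systemd-239-41.el8_3.2.x86_64 systemd-libs-239-41.el8_3.2.x86_64 |
    to_downgrade_paths
```

Checking the printed paths against the files actually present in the Packages directory before handing them to dnf downgrade seems prudent.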
...
Chris
[adjusted the subject, hope that is OK.]
Found it!  It's the dbus update to 1.12.8-12.  Downgrade to -11
and ssh connections close normally.
To clarify the problem, with the new dbus, simple ssh's like:
ssh somehost uptime
will print the uptime, but do not return to the local shell prompt until
you hit ctrl-c.  It works normally if you downgrade dbus or
ssh -o forwardx11=no somehost uptime
I'm sure a bug report exists somewhere, but that's something to dig for or
create tomorrow.
To downgrade, packages were scattered in different locations, so I copied
them to one directory and did
dnf downgrade ./*
The packages I needed to downgrade on an x86_64 system were:
dbus-1.12.8-11.el8.x86_64.rpm
dbus-common-1.12.8-11.el8.noarch.rpm
dbus-daemon-1.12.8-11.el8.x86_64.rpm
dbus-devel-1.12.8-11.el8.x86_64.rpm
dbus-libs-1.12.8-11.el8.x86_64.rpm
dbus-tools-1.12.8-11.el8.x86_64.rpm
dbus-x11-1.12.8-11.el8.x86_64.rpm
Now that's really interesting, I was wondering why I don't see this on
OL8. The thing is that certain OL8 packages have an additional RPM
revision added like .0.1. Just checked dbus and its changelog shows:

* Tue Feb 16 2021 Kevin Lyons <kevin.x.lyons@oracle.com> - 1.12.8-12.0.1
- bus: raise fd limits before dropping privs [Orabug: 31175643]
- fix netlink poll: error 4 (Zhenzhong Duan)

So OL is definitely not 100% bug to bug compatible like the other clones
:-)
And it makes me a bit worried why O* fixed this on Feb 16 and the broken
dbus packages are now (in April) installed on CentOS servers?
Sorry, maybe I'm wrong here and the OL8 addons are fixing other things?
Could someone who experiences the issue test with the OL8 dbus packages?
Could it be BZ #1940067?
https://gcc02.safelinks.protection.outlook.com/?url=https%3A%2F%2Fbugzilla.r...
Bullseye, Simon!  Many thanks.
A reasonable one-liner fix / workaround is below.  It also works when
requesting a terminal with 'ssh -Xt'.  It adds a "tty -s || return" line
in the right spot to check if a tty exists and, if not, bail out w/o
starting dbus-launch.  Change "-i" to "-i.bak" to make a backup.
sed -i '/SHLVL/atty -s || return' /etc/profile.d/ssh-x-forwarding.sh
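
To see what that sed command does before running it against /etc, it can be tried on a scratch file first. A minimal sketch, assuming GNU sed (the one-line form of the 'a' command is a GNU extension); the file contents below are a made-up stand-in, not the real ssh-x-forwarding.sh:

```shell
# Scratch copy only. The three lines written here are hypothetical and do
# NOT reproduce the real /etc/profile.d/ssh-x-forwarding.sh.
demo=$(mktemp)
cat > "$demo" <<'EOF'
# hypothetical profile snippet
[ "$SHLVL" = 1 ] || return
export EXAMPLE=1
EOF

# Append "tty -s || return" after every line matching SHLVL and keep the
# original as "$demo.bak" (the "-i.bak" variant mentioned above).
sed -i.bak '/SHLVL/a tty -s || return' "$demo"
cat "$demo"
```

Note that the line is appended after every line that matches SHLVL, so if the real file mentions SHLVL more than once the check gets inserted more than once; diffing against the .bak copy shows exactly what changed.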
