[CentOS] rsync over ssh stalls after completing the job

Wed Apr 14 03:36:48 UTC 2021
Chris Schanzle <chris.schanzle at nist.gov>

On 4/13/21 5:00 PM, Frank Cox wrote:
> On Tue, 13 Apr 2021 22:29:26 +0200
> Simon Matter wrote:
>
>> You could try running strace on the hanging process to see what it's doing.
> [frankcox at mutt temp]$ rsync -avv ../temp/ jeff:temp
> opening connection using: ssh jeff rsync --server -vvlogDtpre.iLsfxC . temp  (7 args)
> sending incremental file list
> delta-transmission enabled
> abc is uptodate
> total: matches=0  hash_hits=0  false_alarms=0 data=0
>
> Leaving that sit there apparently doing nothing (but still not giving me my cursor back) I switched to another terminal window and did the following:
>
> [frankcox at mutt ~]$ ps -FA | grep rsync
> frankcox    5400    2435  0 60586  3160   5 14:52 pts/0    00:00:00 rsync -avv ../temp/ jeff:temp
> frankcox    5401    5400  0 67980  7440   1 14:52 pts/0    00:00:00 ssh jeff rsync --server -vvlogDtpre.iLsfxC . temp
> frankcox    5526    5416  0 55476  1076   3 14:53 pts/1    00:00:00 grep --color=auto rsync
>
> [frankcox at mutt ~]$ strace -p 5401
> strace: Process 5401 attached
> select(11, [5 9 10], [], NULL, NULL
>
> Then it just sits there with no further action.  I get my cursor back when I hit ctrl-c.
>
> [frankcox at mutt ~]$ strace -p 5400
> strace: Process 5400 attached
> restart_syscall(<... resuming interrupted nanosleep ...>) = 0
> wait4(5401, 0x7ffd45105564, WNOHANG, NULL) = 0
> nanosleep({tv_sec=0, tv_nsec=20000000}, NULL) = 0
> wait4(5401, 0x7ffd45105564, WNOHANG, NULL) = 0
> nanosleep({tv_sec=0, tv_nsec=20000000}, NULL) = 0
> wait4(5401, 0x7ffd45105564, WNOHANG, NULL) = 0
> nanosleep({tv_sec=0, tv_nsec=20000000}, NULL) = 0
> wait4(5401, 0x7ffd45105564, WNOHANG, NULL) = 0
> nanosleep({tv_sec=0, tv_nsec=20000000}, NULL) = 0
> wait4(5401, 0x7ffd45105564, WNOHANG, NULL) = 0
> nanosleep({tv_sec=0, tv_nsec=20000000}, NULL) = 0
> wait4(5401, 0x7ffd45105564, WNOHANG, NULL) = 0
> nanosleep({tv_sec=0, tv_nsec=20000000}, NULL) = 0
> wait4(5401, 0x7ffd45105564, WNOHANG, NULL) = 0
>
> The wait4-etc line just keeps repeating endlessly until I hit ctrl-c.
>
> Unfortunately, I have no idea what any of the above actually means.  Does it tell us anything interesting?


Yay!  I am glad someone else on the planet is experiencing this.  I noticed this started happening to me after updating some CentOS Linux 8 systems today.

I discovered that if I set ForwardX11=no (either on the ssh command line or in ~/.ssh/config) the hang does not happen.  But why does that matter?  There were no updates to openssh.
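
For anyone who wants to try the same workaround, here is a minimal sketch of the ~/.ssh/config form.  The host alias jeff is borrowed from Frank's example and is only an assumption about your setup; this sidesteps the hang and says nothing about the underlying cause.

```
# ~/.ssh/config
# "jeff" is the host alias from Frank's example; substitute your own.
Host jeff
    ForwardX11 no
```

The one-off equivalent should be to hand the option to ssh through rsync's -e, for example: rsync -e 'ssh -o ForwardX11=no' -avv ../temp/ jeff:temp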

It is not the systemd update doing something silly with session management.  I painfully downgraded manually and rebooted, to no effect.
As an aside, why can't we have nice things in life, like 'dnf downgrade systemd\*' actually working?  I did the below.  It might be dumb, but it worked.  Alternate suggestions for downgrading are appreciated; searching the list turned up nothing and my google-fu was off the mark today.

  cd [path-to-repo]/centos/8/BaseOS/x86_64/os/Packages
  dnf downgrade $(rpm -qa systemd\* | grep 239-41.el8_3.2 | sed -e 's/3\.2/3.1/' -e 's/^/.\//' -e 's/$/.rpm/')
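
In case the sed chain is hard to read, here is a small sketch of what it does to a single package name.  The function name to_old_rpm is made up for illustration, and the package name is just an example of what rpm -qa prints on these systems.

```shell
# Hypothetical helper: turn an installed package name (as printed by rpm -qa)
# into the file name of the previous build in the current directory.
to_old_rpm() {
  printf '%s\n' "$1" | sed \
    -e 's/3\.2/3.1/' \
    -e 's/^/.\//' \
    -e 's/$/.rpm/'
  # 1st expression: el8_3.2 becomes el8_3.1 (first match on the line only)
  # 2nd expression: prefix "./" so dnf treats the argument as a local file
  # 3rd expression: append the .rpm extension
}

to_old_rpm systemd-239-41.el8_3.2.x86_64
# prints ./systemd-239-41.el8_3.1.x86_64.rpm
```

dnf then receives a list of local .rpm files, which is why the command has to be run from the Packages directory.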

Chris