> On 4/14/21 2:22 AM, Simon Matter wrote:
>>>>> On 4/13/21 11:36 PM, Chris Schanzle via CentOS wrote:
>>>>>> On 4/13/21 5:00 PM, Frank Cox wrote:
>>>>>>> On Tue, 13 Apr 2021 22:29:26 +0200
>>>>>>> Simon Matter wrote:
>>>>>>>
>>>>>>>> You could try running strace on the hanging process to see what
>>>>>>>> it's doing.
>>>>>>> [frankcox@mutt temp]$ rsync -avv ../temp/ jeff:temp
>>>>>>> opening connection using: ssh jeff rsync --server -vvlogDtpre.iLsfxC . temp (7 args)
>>>>>>> sending incremental file list
>>>>>>> delta-transmission enabled
>>>>>>> abc is uptodate
>>>>>>> total: matches=0 hash_hits=0 false_alarms=0 data=0
>>>>>>>
>>>>>>> Leaving that sit there apparently doing nothing (but still not
>>>>>>> giving me my cursor back) I switched to another terminal window
>>>>>>> and did the following:
>>>>>>>
>>>>>>> [frankcox@mutt ~]$ ps -FA | grep rsync
>>>>>>> frankcox 5400 2435 0 60586 3160 5 14:52 pts/0 00:00:00 rsync -avv ../temp/ jeff:temp
>>>>>>> frankcox 5401 5400 0 67980 7440 1 14:52 pts/0 00:00:00 ssh jeff rsync --server -vvlogDtpre.iLsfxC . temp
>>>>>>> frankcox 5526 5416 0 55476 1076 3 14:53 pts/1 00:00:00 grep --color=auto rsync
>>>>>>>
>>>>>>> [frankcox@mutt ~]$ strace -p 5401
>>>>>>> strace: Process 5401 attached
>>>>>>> select(11, [5 9 10], [], NULL, NULL
>>>>>>>
>>>>>>> Then it just sits there with no further action. I get my cursor
>>>>>>> back when I hit ctrl-c.
>>>>>>>
>>>>>>> [frankcox@mutt ~]$ strace -p 5400
>>>>>>> strace: Process 5400 attached
>>>>>>> restart_syscall(<... resuming interrupted nanosleep ...>) = 0
>>>>>>> wait4(5401, 0x7ffd45105564, WNOHANG, NULL) = 0
>>>>>>> nanosleep({tv_sec=0, tv_nsec=20000000}, NULL) = 0
>>>>>>> wait4(5401, 0x7ffd45105564, WNOHANG, NULL) = 0
>>>>>>> nanosleep({tv_sec=0, tv_nsec=20000000}, NULL) = 0
>>>>>>> wait4(5401, 0x7ffd45105564, WNOHANG, NULL) = 0
>>>>>>> nanosleep({tv_sec=0, tv_nsec=20000000}, NULL) = 0
>>>>>>> wait4(5401, 0x7ffd45105564, WNOHANG, NULL) = 0
>>>>>>> nanosleep({tv_sec=0, tv_nsec=20000000}, NULL) = 0
>>>>>>> wait4(5401, 0x7ffd45105564, WNOHANG, NULL) = 0
>>>>>>> nanosleep({tv_sec=0, tv_nsec=20000000}, NULL) = 0
>>>>>>> wait4(5401, 0x7ffd45105564, WNOHANG, NULL) = 0
>>>>>>> nanosleep({tv_sec=0, tv_nsec=20000000}, NULL) = 0
>>>>>>> wait4(5401, 0x7ffd45105564, WNOHANG, NULL) = 0
>>>>>>>
>>>>>>> The wait4-etc line just keeps repeating endlessly until I hit
>>>>>>> ctrl-c.
>>>>>>>
>>>>>>> Unfortunately, I have no idea what any of the above actually means.
>>>>>>> Does it tell us anything interesting?
>>>>>> Yay! I am glad someone else on the planet is experiencing this.
>>>>>> I noticed this started happening to me after updating some CentOS
>>>>>> Linux 8 systems today.
>>>>>>
>>>>>> I discovered if I set ForwardX11=no (either on ssh command line or
>>>>>> in ~/.ssh/config) the hang does not happen. But why does that
>>>>>> matter? No updates to openssh.
>>>>>>
>>>>>> It is not the systemd update doing something silly with session
>>>>>> management. I painfully downgraded manually and rebooted to no
>>>>>> effect.
>>>>>>
>>>>>> As an aside, why can't we have nice things in life like 'dnf
>>>>>> downgrade systemd\*' actually work? I did the below - might be
>>>>>> dumb, but it worked -- alternate suggestions to downgrade are
>>>>>> appreciated - searching the list and my google-fu was off the mark
>>>>>> today.
>>>>>> cd [path-to-repo]/centos/8/BaseOS/x86_64/os/Packages
>>>>>> dnf downgrade $(rpm -qa systemd\* | grep 239-41.el8_3.2 | sed -e 's/3\.2/3.1/' -e 's/^/.\//' -e 's/$/.rpm/')
>>>>>> Chris
>>>>>
>>>>> [adjusted the subject, hope that is OK.]
>>>>>
>>>>> Found it! It's the dbus update to 1.12.8-12. Downgrade to -11
>>>>> and ssh connections close normally.
>>>>>
>>>>> To clarify the problem, with the new dbus, simple ssh's like:
>>>>>
>>>>> ssh somehost uptime
>>>>>
>>>>> will print the uptime, but do not return to the local shell prompt
>>>>> until you hit ctrl-c. It works normally if you downgrade dbus or
>>>>>
>>>>> ssh -o forwardx11=no somehost uptime
>>>>>
>>>>> I'm sure a bug report exists somewhere, but that's something to dig
>>>>> for or create tomorrow.
>>>>>
>>>>> To downgrade, packages were scattered in different locations, so I
>>>>> copied them to one directory and did
>>>>>
>>>>> dnf downgrade ./*
>>>>>
>>>>> The packages I needed to downgrade on an x86_64 system were:
>>>>>
>>>>> dbus-1.12.8-11.el8.x86_64.rpm
>>>>> dbus-common-1.12.8-11.el8.noarch.rpm
>>>>> dbus-daemon-1.12.8-11.el8.x86_64.rpm
>>>>> dbus-devel-1.12.8-11.el8.x86_64.rpm
>>>>> dbus-libs-1.12.8-11.el8.x86_64.rpm
>>>>> dbus-tools-1.12.8-11.el8.x86_64.rpm
>>>>> dbus-x11-1.12.8-11.el8.x86_64.rpm
>>>> Now that's really interesting, I was wondering why I don't see this on
>>>> OL8. The thing is that certain OL8 packages have an additional RPM
>>>> revision added like .0.1. Just checked dbus and its changelog shows:
>>>>
>>>> * Tue Feb 16 2021 Kevin Lyons <kevin.x.lyons at oracle.com> -1.12.8-12.0.1
>>>> - bus: raise fd limits before dropping privs [Orabug: 31175643]
>>>> - fix netlink poll: error 4 (Zhenzhong Duan)
>>>>
>>>> So OL is definitely not 100% bug to bug compatible like the other
>>>> clones :-)
>>>>
>>>> And it makes me a bit worried why O* fixed this on Feb 16 and the
>>>> broken dbus packages are now (in April) installed on CentOS servers?
>>> Sorry, maybe I'm wrong here and the OL8 addons are fixing other things?
>>> Could someone who experiences the issue test with the OL8 dbus
>>> packages?
>>>
>> Could it be BZ #1940067?
>>
>> https://bugzilla.redhat.com/show_bug.cgi?id=1940067
>
> Bullseye, Simon! Many thanks.
>
> A reasonable one-liner fix / workaround is below. Also works when
> requesting a terminal with 'ssh -Xt'. Adds a "tty -s || return" line
> in the right spot to check if a tty exists and if not, bail out w/o
> starting dbus-launch. Change "-i" to "-i.bak" to make a backup.
>
> sed -i '/SHLVL/atty -s || return' /etc/profile.d/ssh-x-forwarding.sh

Hi Chris,

IMHO we see a fundamental problem here if desktop toys like D-Bus can have
such an impact on basic tools like rsync. It's even worse if D-Bus goes
crazy and makes systemd become unmanageable. Not fun on big servers :-)

Regards,
Simon
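
To preview what the sed one-liner quoted above does before running it
against /etc/profile.d, here is a minimal sketch on a scratch file. The
two-line stand-in script is hypothetical (the real ssh-x-forwarding.sh is
not shown in this thread); only the sed expression is taken from Chris's
message, and it relies on GNU sed's one-line form of the 'a' (append)
command.

```shell
#!/bin/sh
# Hypothetical stand-in for /etc/profile.d/ssh-x-forwarding.sh.
# The real file's contents are an assumption; only the SHLVL match matters here.
demo=$(mktemp)
cat > "$demo" <<'EOF'
[ "$SHLVL" -gt 1 ] && return
echo "dbus-launch would start here"
EOF

# GNU sed: 'a' appends the text "tty -s || return" after every line
# matching /SHLVL/. -i.bak edits in place and keeps a copy as "$demo.bak".
sed -i.bak '/SHLVL/atty -s || return' "$demo"

# Capture and show the patched file, then clean up the scratch files.
patched=$(cat "$demo")
printf '%s\n' "$patched"
rm -f "$demo" "$demo.bak"
```

The inserted line makes the sourced script return early when stdin is not a
terminal ('tty -s' exits non-zero in that case), which is the situation for
'ssh somehost uptime' and for rsync over ssh. On a real system,
'grep -n -A1 SHLVL /etc/profile.d/ssh-x-forwarding.sh' after the edit shows
whether the line landed where expected.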