I am running CentOS 7 on a workstation, and CentOS 6 and 7 on a couple of servers that I connect to remotely using the same username and thus the same ssh configuration. However, I keep getting disconnected from one of the CentOS 6 servers, whereas I have no such problems with another CentOS 6 server or with the CentOS 7 server. The latter two are in two different hosted server setups far, far away, whereas the problematic one is my own physical hardware in the same building.
I have tried to make sure the sshd configuration is identical on all servers, but I still have this problem. I can rule out a general problem with the router in my office, since all connections go through that router; the only difference is that the problematic server is in the same building, and the connection loops back through the same router via an external IP address. /var/log/secure on the workstation offers no clues; it contains no messages about any disconnection.
If anyone has suggestions on what I should check, it would be greatly appreciated!
On 12/26/2019 02:47 PM, Gordon Messmer replied to H's message of 12/25/19 6:56 AM:
When you say "external address," I assume you mean that your office network is being NATed. In that case, when you are connecting to systems outside your network, the router is performing SNAT for your connections. When you connect to the system in your building, using an "external" address, your router is probably performing both SNAT and DNAT for that connection. Your router may have different timeouts on its SNAT and DNAT tables. More than likely, the timeout for DNAT is lower than the TCP keepalive time, and you're seeing idle connections closed by the router. You might be able to prevent that by setting a ServerAliveInterval value in ~/.ssh/config. It is disabled by default, but should keep connections alive in your case, if it is set lower than the timeout on the router.
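To make the idle-timeout hypothesis concrete, here is a deliberately simplified Python model, not a description of how any particular router works: the router forgets a NAT mapping once a connection has been silent for some idle timeout, and ServerAliveInterval is modeled as guaranteeing some traffic at least every `interval` seconds. The 600-second router timeout and the packet timestamps are hypothetical values chosen only for illustration.

```python
from typing import Iterable, List

# Deliberately simplified model of the hypothesis above: the router forgets a
# NAT mapping once the connection has been silent for `idle_timeout` seconds.
# The timestamps and timeouts used below are hypothetical.


def mapping_survives(packet_times: Iterable[int], idle_timeout: int) -> bool:
    """True if no silent gap between consecutive packets reaches idle_timeout."""
    times = sorted(packet_times)
    return all(later - earlier < idle_timeout
               for earlier, later in zip(times, times[1:]))


def add_keepalives(packet_times: Iterable[int], interval: int,
                   session_end: int) -> List[int]:
    """Simplification: assume one keepalive exchange every `interval` seconds,
    roughly what a nonzero ServerAliveInterval produces on an idle session."""
    keepalives = range(0, session_end + 1, interval)
    return sorted(set(packet_times) | set(keepalives))


if __name__ == "__main__":
    idle_session = [0, 30, 7200]  # login, one command, then ~2 hours of silence
    router_timeout = 600          # hypothetical 10-minute DNAT idle timeout
    print(mapping_survives(idle_session, router_timeout))  # False
    print(mapping_survives(add_keepalives(idle_session, 60, 7200),
                           router_timeout))                 # True
```

In this model the keepalives only help when the interval is shorter than the router's timeout: with a 900-second interval against the same hypothetical 600-second timeout, `mapping_survives` still returns False. The model says nothing about disconnects during busy transfers, where the gaps between packets are short.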
CentOS mailing list CentOS@centos.org https://lists.centos.org/mailman/listinfo/centos
H replied to Gordon Messmer's message of 12/26/2019 02:47 PM:
I now have additional information:
- I could not find any entries related to disconnections in /var/log/secure on the server either.
- I am also being disconnected during file transfers over ssh, so it does not only happen when the session is idle. There is no fixed interval, but it often happens within 10 minutes of establishing the ssh connection.
- Something I forgot to mention: when I connect from my workstation to the same server through Cisco AnyConnect, with the VPN terminating far, far away so that the traffic comes back through the same router, I have no problems with being disconnected when the connection is idle. So there is no general problem with the router or with the server hardware itself.
- Finally, today I connected to the server using its internal 192.168.x.x address for the first time, and after several hours of an idle session I have not been disconnected.
Are my observations above still consistent with your hypothesis?
On 12/26/2019 04:45 PM, Gordon Messmer replied to H's message of 12/26/19 12:59 PM:
Largely, yes. I'm not sure why you'd be disconnected while transferring data (one of scp or sftp, right?), but it sounds like a DNAT-related limit.
On 12/26/2019 05:05 PM, H replied:
Yes, I am using scp, and sometimes, but not always, I get disconnected within 10 minutes while transferring gigabyte-sized files. I will try what you suggested in your previous post. I guess there is nothing I can change in the router itself?
H then followed up on his own message of 12/26/2019 05:05 PM:
I just looked at the settings in /etc/ssh/ssh_config on the workstation, which should apply to all users on it, and I already had:
Host *
TCPKeepAlive yes
ServerAliveInterval 60
ServerAliveCountMax 300
So, unless there are other settings I should check, this should apply whenever I open an ssh connection from the workstation, right?
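The arithmetic behind the two ServerAlive settings, as described in ssh_config(5), is sketched below: ssh sends a server-alive message after ServerAliveInterval seconds without data from the server, and disconnects after ServerAliveCountMax unanswered messages. The `nat_idle_timeout` argument is a hypothetical stand-in for the router's unknown timeout.

```python
from typing import Optional

# Arithmetic behind the ServerAlive* options as described in ssh_config(5):
# after ServerAliveInterval seconds without data from the server, ssh sends a
# server-alive message; after ServerAliveCountMax unanswered messages it
# disconnects. An interval of 0 (the default) disables these messages.


def unresponsive_cutoff_seconds(server_alive_interval: int,
                                server_alive_count_max: int = 3) -> Optional[int]:
    """Approximate seconds of server silence before ssh itself gives up."""
    if server_alive_interval <= 0:
        return None  # server-alive messages are disabled
    return server_alive_interval * server_alive_count_max


def keeps_nat_mapping_alive(server_alive_interval: int,
                            nat_idle_timeout: int) -> bool:
    """Keepalives can only refresh the router's mapping if they are sent more
    often than its idle timeout (a hypothetical, unknown value here)."""
    return 0 < server_alive_interval < nat_idle_timeout


if __name__ == "__main__":
    cutoff = unresponsive_cutoff_seconds(60, 300)  # values from the config above
    print(cutoff, cutoff / 3600)                   # 18000 5.0
    print(unresponsive_cutoff_seconds(15))         # 45 (the man page's example)
```

With the values above, ssh would tolerate roughly 18000 seconds (5 hours) of an unresponsive server before giving up, so these client settings by themselves should not close a connection within 10 minutes; the 60-second interval would only keep a NAT mapping alive if the router's idle timeout is longer than 60 seconds.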
On 12/26/2019 08:13 PM, Gordon Messmer replied to H's message of 12/26/19 2:49 PM:
Well, keep-alive options would only make a difference if the problem were a DNAT timeout. If it's some other limitation imposed on DNAT, those won't have any effect.
If you can reproduce this reliably and have admin access to both the server and client, you can determine whether the router is the problem:
1) Start an scp transfer of a large file
2) Use netstat or ss on the client to determine what port the client is using for the SSH connection
3) Use netstat or ss on the server to determine what port the client is using (NAT will probably change both the client's address and port)
4) Run "tcpdump -nn host <server address> and port <client TCP port>" on the client, using the values from step 2
5) Run "tcpdump -nn host <client address> and port <client TCP port>" on the server, using the values from step 3
6) Wait for the transfer to terminate
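Steps 2 to 5 can be partly scripted. The Python sketch below parses text captured from `ss -tn` to find established SSH connections, then builds the tcpdump command described above for each side, optionally with `-i` to choose the capture interface. The sample addresses, ports and the `eth0` interface name are hypothetical, and the parsing assumes the usual ss column order (State, Recv-Q, Send-Q, Local Address:Port, Peer Address:Port); it has not been checked against every iproute2 version.

```python
from typing import List, Optional, Tuple

Conn = Tuple[str, int, str, int]  # (local_addr, local_port, peer_addr, peer_port)

# Hypothetical `ss -tn` output on the workstation (client) and on the server.
# 203.0.113.5 stands in for the server's external address; 192.168.1.1 for the
# router address the server sees after NAT.
SAMPLE_CLIENT = """State  Recv-Q Send-Q Local Address:Port  Peer Address:Port
ESTAB  0      0      192.168.1.10:51234  203.0.113.5:22
"""
SAMPLE_SERVER = """State  Recv-Q Send-Q Local Address:Port  Peer Address:Port
ESTAB  0      0      192.168.1.20:22     192.168.1.1:40001
"""


def split_endpoint(endpoint: str) -> Tuple[str, int]:
    """Split 'addr:port' or '[v6addr]:port' at the last colon."""
    addr, _, port = endpoint.rpartition(":")
    return addr.strip("[]"), int(port)


def ssh_connections(ss_output: str, ssh_port: int = 22) -> List[Conn]:
    """Return established TCP connections where either end uses ssh_port."""
    conns = []
    for line in ss_output.splitlines():
        fields = line.split()
        # Skip the header and anything that is not an established connection.
        if len(fields) < 5 or fields[0] != "ESTAB":
            continue
        local_addr, local_port = split_endpoint(fields[3])
        peer_addr, peer_port = split_endpoint(fields[4])
        if ssh_port in (local_port, peer_port):
            conns.append((local_addr, local_port, peer_addr, peer_port))
    return conns


def capture_command(conn: Conn, on_server: bool,
                    interface: Optional[str] = None) -> List[str]:
    """Build 'tcpdump -nn [-i IF] host <peer> and port <client port>'.

    On the client, the client's port is the local one; on the server, it is
    the peer's port (as rewritten by NAT).
    """
    _, local_port, peer_addr, peer_port = conn
    client_port = peer_port if on_server else local_port
    cmd = ["tcpdump", "-nn"]
    if interface:
        cmd += ["-i", interface]
    return cmd + ["host", peer_addr, "and", "port", str(client_port)]


if __name__ == "__main__":
    for conn in ssh_connections(SAMPLE_CLIENT):
        print("client:", " ".join(capture_command(conn, on_server=False)))
    for conn in ssh_connections(SAMPLE_SERVER):
        print("server:", " ".join(capture_command(conn, on_server=True,
                                                  interface="eth0")))
```

On a busy server the ss output may list several SSH sessions, so you would still need to pick the one that matches the transfer you started in step 1.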
I expect that when the client terminates, you'll see a TCP reset packet at the end of the output from tcpdump on the client side, but you won't see that packet in the tcpdump output on the server side. If so, then the router is sending the TCP reset, and you'll need to work with its owners to resolve the problem.
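The comparison described in the paragraph above can be expressed as a small check over the two tcpdump text outputs. This sketch assumes tcpdump's usual `Flags [...]` notation, where `R` marks a reset and `.` marks an ACK; the sample capture lines are invented for illustration and are not real output from this network.

```python
import re
from typing import List

# tcpdump prints TCP flags as e.g. "Flags [P.]" or "Flags [R.]"; R = reset.
FLAGS_RE = re.compile(r"Flags \[([^\]]*)\]")

# Invented sample lines in tcpdump -nn format (not real captures).
SAMPLE_CLIENT = """\
14:03:12.100000 IP 203.0.113.5.22 > 192.168.1.10.51234: Flags [P.], seq 1:1449, ack 1, win 501, length 1448
14:03:12.200000 IP 203.0.113.5.22 > 192.168.1.10.51234: Flags [R.], seq 1449, ack 1, win 0, length 0
"""
SAMPLE_SERVER = """\
14:03:12.090000 IP 192.168.1.20.22 > 192.168.1.1.40001: Flags [P.], seq 1:1449, ack 1, win 501, length 1448
"""


def reset_lines(tcpdump_output: str) -> List[str]:
    """Return the capture lines whose TCP flags include R (reset)."""
    hits = []
    for line in tcpdump_output.splitlines():
        match = FLAGS_RE.search(line)
        if match and "R" in match.group(1):
            hits.append(line)
    return hits


def reset_injected_in_between(client_capture: str, server_capture: str) -> bool:
    """True if the client saw a reset that never appeared on the server,
    which points at a device in the path (e.g. the router) sending it."""
    return bool(reset_lines(client_capture)) and not reset_lines(server_capture)


if __name__ == "__main__":
    print(reset_injected_in_between(SAMPLE_CLIENT, SAMPLE_SERVER))  # True
```

If the server's capture also shows the reset, the server side itself sent or saw it, and the router is a less likely culprit.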
Incidentally, why are you connecting to an internal resource through an external address (NAT)? Are you unable to connect directly to its internal address?
H replied to Gordon Messmer's message of 12/26/2019 08:13 PM:
Thank you very much, that is a very nice summary! The only thing I needed to add was the specific Ethernet interface for tcpdump, e.g. tcpdump -i ....
However, since you posted the above, I have not had this problem again... It might come back, though.
As for why I use the external address when the internal one would suffice: I also access the same server from outside, so for simplicity I used the external address in both cases.
Ron replied to H's original message:
Are you using ssh to connect to a server and then running scp from there? If so, it might be your ssh session that is timing out due to inactivity on the terminal, not the scp session. You can always use the -vvv option to see more detailed messages about what is going on.
Ron