problem on exceptional quit


Hua Wang

8 Oct 2015

2:33 a.m.

Dear Centos Users:

I installed CentOS 7 on my server a few months ago. While using ssh, I keep getting a strange message, "Write failed: Broken pipe", which forces the SSH session to quit. It's really annoying because it happens very often and at irregular intervals, anywhere from a couple of minutes to a few hours. I have been working with Linux (Red Hat, Fedora and CentOS) for over 15 years, and this never happened to me before, including under CentOS 6.6. I have tried the following approaches, but none of them helped. I wonder if it could be solved by reinstalling the system, but it is time-consuming to reinstall a lot of software.

1. Logged in from Mac, Windows and Linux systems on different computers.
2. Modified sshd_config on the server as suggested by many posts:
   TCPKeepAlive yes
   ClientAliveInterval 60
3. Modified the ~/.ssh/config file on my local computer:
   Host *
   ServerAliveInterval 60
4. Logged in with ssh -Y instead of -X.
5. Added 'unset autologout' to my .cshrc.
6. Checked the IP address with the network administrator; it works well.
7. Added a file named autologout.csh containing 'set autologout=0'.

Do you know a good solution? Thanks!

Cheers,

Hua

----------------------------- Hua Wang, Ph.D. in Geodesy Department of Surveying Engineering, Guangdong University of Technology, 100 Waihuan Xi Rd., Panyu District, Guangzhou, 510006, China. Tel: +86-13570019257 Email: ehwang@163.com Homepage: http://homepages.see.leeds.ac.uk/~earhw


Frank Cox

8 Oct

3:11 a.m.

On Thu, 8 Oct 2015 10:33:55 +0800 Hua Wang wrote:

...

While using ssh, there is always a strange message "Write failed: Broken pipe". It forces quit of SSH.

It sounds like the network connection between you and the server is dying for some reason.

That being the case you probably can't fix it yourself if it's a remote server.

You may need to get a better Internet connection on one or both ends.

-- MELVILLE THEATRE ~ Real D 3D Digital Cinema ~ www.melvilletheatre.com

Hua Wang

3:18 a.m.

Hi Frank,

Thanks for your prompt reply. The server is in my office. Since I tried a few computers, it shouldn't be a problem with the clients' Internet connection. I tried pinging the server, and it received all the data. Is there a good way to check it?

It always worked well under CentOS 6.6 with the same server and the same network connection (IP, cable, etc.). The problem appeared after reinstalling with CentOS 7.7. I suspect it is a problem with the system rather than the network.

Cheers,

Hua

_______________________________________________
CentOS mailing list CentOS@centos.org https://lists.centos.org/mailman/listinfo/centos

Frank Cox

3:42 a.m.

On Thu, 8 Oct 2015 11:18:17 +0800 Hua Wang wrote:

...

I tried to ping the server, and it can accept all data. Is there a good way to check it?

ssh -v, ssh -vv and ssh -vvv might give you some interesting information.
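One way to keep that debug output for later comparison with the server logs is to send it to a file along with the session's start and end times. The following is only a sketch: the function name ssh_debug_log, the SSH_BIN variable and the log path are assumptions made for this example. It relies on ssh writing its -v messages to stderr and returning 255 when the connection itself fails.

```shell
# Sketch: run ssh with full debug output and keep a dated copy of stderr.
# SSH_BIN and the function name are illustrative, not part of OpenSSH.
SSH_BIN="${SSH_BIN:-ssh}"

ssh_debug_log() {
    # $1 = log file; all remaining arguments are passed to ssh unchanged
    log_file="$1"
    shift
    printf '=== session start: %s ===\n' "$(date '+%Y-%m-%d %H:%M:%S')" >>"$log_file"
    # Debug messages go to stderr, so the interactive session on stdout is untouched.
    "$SSH_BIN" -vvv "$@" 2>>"$log_file"
    rc=$?
    # Record when the session ended so it can be matched against server-side logs.
    printf '=== session end: %s (exit status %s) ===\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$rc" >>"$log_file"
    return "$rc"
}
```

For example, `ssh_debug_log ~/ssh-debug.log user@server` (placeholder host), then `tail ~/ssh-debug.log` after the next "Broken pipe".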

...

The problem came out while reinstalling centos 7.7.

Since you're apparently using some kind of an unofficial or non-standard version of Centos, you might want to try using a current (regular) one instead.

-- MELVILLE THEATRE ~ Real D 3D Digital Cinema ~ www.melvilletheatre.com

Hua Wang

3:45 a.m.

...

...
I tried to ping the server, and it can accept all data. Is there a good way to check it?

ssh -v, ssh -vv and ssh -vvv might give you some interesting information.

Yes, I tried ssh -vvv. It gave a lot of information during login, but it quit without any further information except for "Write failed: Broken pipe".

...

...
The problem came out while reinstalling centos 7.7.

Since you're apparently using some kind of an unofficial or non-standard version of Centos, you might want to try using a current (regular) one instead.

Sorry, I made a mistake with the version. I am using v7, not v7.7.

Thanks,

Hua

Johnny Hughes

11:37 a.m.

On 10/07/2015 10:45 PM, Hua Wang wrote:

...


Try using ClientAliveCountMax and ServerAliveCountMax (you can set them to 5 or 8 instead of the default of 3), and also make the timeouts higher than 60.

Make sure you are using 'protocol 2'.
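As a rough illustration of what those numbers mean: per the OpenSSH manual pages, a peer that stops answering keepalive messages is dropped after about interval times count seconds. ClientAliveInterval and ClientAliveCountMax go in sshd_config on the server, while ServerAliveInterval and ServerAliveCountMax go in the client's ssh_config or ~/.ssh/config. The helper name keepalive_budget is made up for this sketch.

```shell
# Sketch: approximate time (in seconds) an unresponsive peer is tolerated
# before the SSH session is closed, i.e. interval * count.
keepalive_budget() {
    interval="$1"   # ClientAliveInterval or ServerAliveInterval, in seconds
    count="$2"      # ClientAliveCountMax or ServerAliveCountMax
    echo $(( interval * count ))
}

keepalive_budget 60 3    # prints 180
keepalive_budget 120 8   # prints 960
```

Raising both values only helps if the link recovers within that window; it does not prevent a reset sent by the server or by a device in between.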

Hua Wang

1:32 p.m.

Dear Johnny,

Yes, I have tried much larger numbers than 60 and 3 for those two parameters, respectively. And I am sure it is using 'protocol 2' because it's uncommented in sshd_config.

Is there a way to catch what's happening before it quits? I couldn't see anything except for the line "Write failed: Broken pipe".

Thanks.

Hua


Jonathan Billings

1:55 p.m.

On Thu, Oct 08, 2015 at 09:32:41PM +0800, Hua Wang wrote:

...

Is there a way to catch what’s happening before quit? I couldn’t see anything except for the line ‘write failed, broken pipe’.

At this point, I'd suggest looking at the logs on the remote end to see what's being logged when the session closes.

-- Jonathan Billings billings@negate.org

Hua Wang

2:20 p.m.

Which log file should I look at? Thanks,

Hua


Jonathan Billings

2:22 p.m.

On Thu, Oct 08, 2015 at 10:20:17PM +0800, Hua Wang wrote:

...

Which logfile shall I have a look? Thanks,

That depends on the OS of the remote server. If it's CentOS 6, then I suggest checking /var/log/messages and /var/log/secure.
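To narrow those files down to the ssh daemon's own entries, a small filter can help. This is a sketch assuming the usual syslog line layout ("host sshd[PID]: message"); the function name sshd_log_lines is hypothetical, and /var/log/secure is only the default because it is the file named above.

```shell
# Sketch: print only the sshd entries from a syslog-style log file.
sshd_log_lines() {
    log_file="${1:-/var/log/secure}"
    # Match the "sshd[PID]:" tag that the daemon puts on each of its messages.
    grep -E ' sshd\[[0-9]+\]: ' "$log_file"
}
```

Run as root, `sshd_log_lines /var/log/secure | tail -n 20` right after a disconnect shows what the server logged at that moment. On a systemd-based system, `journalctl -u sshd` should list the same service's messages.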

-- Jonathan Billings billings@negate.org

Leon Fauster

2:23 p.m.

Am 08.10.2015 um 15:32 schrieb Hua Wang ehwang@163.com:

...

Yes, I have tried much larger numbers than 60 and 3 for the above two parameters respectively. And I am sure it is using ‘protocol 2’ because it’s uncommented in sshd_config.

Is there a way to catch what’s happening before quit? I couldn’t see anything except for the line ‘write failed, broken pipe’.

As the system was physically "touched" (reinstalled 6->7), I suggest checking the hardware again, e.g. check the cable, check the switch port (change the port), the power supply of the switch, etc.

-- LF

Kahlil Hodgson

10:33 p.m.

Can you trigger the error reliably by doing something network intensive, like scp or rsync of a large file? I've seen similar behaviour with a bad NIC that was in the process of dying.

zep

11:43 a.m.

On 10/07/2015 11:45 PM, Hua Wang wrote:

...


I'm grasping at straws, admittedly, but does this happen after an extended amount of time? I.e., you make the connection (possibly to use an ssh tunnel running over the session), leave it for some time, then return to use the tunnel and find the connection error about the failure to write? Are you sure the remote server isn't doing some sort of idle cleanup to kill off idle sessions?

-- public gpg key id: 1362BA1A

Hua Wang

1:32 p.m.

Dear Zep,

Thanks for your email. But it happened even while I was typing on the command line, so it couldn't be a problem of idle cleanup.

Hua


Gordon Messmer

11:23 p.m.

On 10/07/2015 07:33 PM, Hua Wang wrote:

...

I installed Centos 7 on my server a few months ago. While using ssh, there is always a strange message "Write failed: Broken pipe".

That's very often a result of IP conflict. I'm assuming that you're connecting to an IPv4 address. If so, log in to your CentOS server and use arping to look for conflicts:

# arping -c 2 -D -I em1 <your address>

...

Login via Mac, Windows, Linux systems from different computers.

Modify sshd_config on the server as suggested by many posts:

TCPKeepAlive yes ClientAliveInterval 60

TCPKeepAlive is "yes" by default. ClientAliveInterval is a server-side (sshd_config) setting; its client-side counterpart is ServerAliveInterval. Either TCPKeepAlive or ServerAliveInterval could be useful if the problem were a stateful firewall which was dropping your connection from its state table, and then resetting the connection in response to a later packet from your client.

Since those don't help, that tends to suggest that the problem isn't an intermediate host, but the server itself. Possibly an IP conflict. Also, check the output of "dmesg" to see if there are any problems recorded with the NIC. Check the output of "ifconfig" to see if there are TX or RX errors that increase when your connections are reset.
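To see whether the error counters actually increase across a disconnect, the same numbers can be read from sysfs and compared before and after. This is a sketch assuming a Linux kernel that exposes /sys/class/net/<interface>/statistics; the function name nic_counters and the NET_SYSFS override are made up for the example.

```shell
# Sketch: print the error and drop counters of one network interface.
# NET_SYSFS can be overridden for testing; it normally points at sysfs.
NET_SYSFS="${NET_SYSFS:-/sys/class/net}"

nic_counters() {
    iface="$1"
    for name in rx_errors tx_errors rx_dropped tx_dropped; do
        # Each counter is a single number in its own file.
        printf '%s=%s\n' "$name" "$(cat "$NET_SYSFS/$iface/statistics/$name")"
    done
}
```

For instance, `nic_counters em1 > before.txt` at login and `nic_counters em1 > after.txt` from a new session once the old one has been reset, followed by `diff before.txt after.txt`.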

...

Modify ~/.ssh/config file on my local computer:

Host *
ServerAliveInterval 60

Login ssh using -Y instead of -X.

You didn't say what client OS you're using, but Fedora and CentOS set ForwardX11Trusted to "yes" by default, so "ssh -Y" is the same as "ssh -X". And even if it weren't, it wouldn't cause the problem you're seeing.

...

add ‘unset autologout’ in my .cshrc.

The error you're seeing won't be triggered by your shell exiting.

...

I checked IP address with the internet administrator, and it works well.

add a file named autologout.csh with ‘set autologout=0’.

Anthony K

9 Oct

1:02 a.m.

On 09/10/15 10:23, Gordon Messmer wrote:

...

Since those don't help, that tends to suggest that the problem isn't an intermediate host, but the server itself. Possibly an IP conflict. Also, check the output of "dmesg" to see if there are any problems recorded with the NIC. Check the output of "ifconfig" to see if there are TX or RX errors that increase when your connections are reset.

As Gordon suggests, let's see if the problem might be related to a dying NIC. The output of the following command may reveal any illness:

# ip -s -d l l

Cheers, ak.

Hua Wang

3:13 a.m.

Dear All,

Attached is my sshd_config file.

In addition, I forgot to say that I added a second RAID array (RAID 5 with 6*3Gb disks) while reinstalling CentOS 7. There was only one RAID array under CentOS 6.6 (RAID 6 with 6*2Gb disks). So now there are two RAID arrays, and both are RAID 5.

Thanks.

Hua

Hua Wang

11 Oct

6:25 a.m.

I am not sure whether attachments can be sent to the mailing list. There were quite a lot of replies before, but I have received nothing since the attachments were added. I will remove the attachments and send it again. Please have a look at the email below. Thanks for your help.

---

Dear All,

Thanks for all your help. I will put all the comments together. Please have a look to see if there is any clue to this ghost problem. I have also attached the log files: dmesg, secure, messages. Please note that there is a message in secure from when it exited just now:

Oct 9 10:55:55 maya2012 su: pam_unix(su:session): session closed for user root

...

Can you trigger the error reliably by doing something network intensive, like scp or rsync a large file? I've seen similar behaviour with a bad NIC that was in the process of dying.

Yes, I copied tens of GB of files using rsync. It worked well.

...

That's very often a result of IP conflict. I'm assuming that you're connecting to an IPv4 address. If so, log in to your CentOS server and use arping to look for conflicts:

# arping -c 2 -D -I em1 <your address>

The IP is fixed to my server. The network administrator has checked the address, and only this computer uses it. When I run the above command line, the output is:

[root@maya2012 hwang]# arping -c 2 -D -I em1 222.200.125.5
ARPING 222.200.125.5 from 0.0.0.0 em1
Sent 2 probes (2 broadcast(s))
Received 0 response(s)
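For reading that result: in duplicate address detection mode (-D), any reply means another host answered for the address, so "Received 0 response(s)" is the outcome you want. The sketch below interprets arping output on stdin. The function name arping_dad_result and its three result strings are invented for this example, and it assumes the iputils-style summary line shown above.

```shell
# Sketch: classify the summary of an "arping -D" run read from stdin.
arping_dad_result() {
    # Pull the number out of "Received N response(s)".
    replies="$(sed -n 's/.*Received \([0-9][0-9]*\) response.*/\1/p' | head -n 1)"
    if [ -z "$replies" ]; then
        echo "unknown"
    elif [ "$replies" -eq 0 ]; then
        echo "no conflict seen"
    else
        echo "possible conflict"
    fi
}
```

Usage would look like `arping -c 2 -D -I em1 222.200.125.5 | arping_dad_result`. A clean result only covers hosts that were online during the probe.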

...

...

Login via Mac, Windows, Linux systems from different computers.

Modify sshd_config on the server as suggested by many posts:

TCPKeepAlive yes ClientAliveInterval 60

TCPKeepAlive is "yes" by default. ClientAliveInterval doesn't appear to be a valid setting. Either TCPKeepAlive or ServerAliveInterval could be useful if the problem were a stateful firewall which was dropping your connection from its state table, and then resetting the connection in response to a later packet from your client.

Since those don't help, that tends to suggest that the problem isn't an intermediate host, but the server itself. Possibly an IP conflict. Also, check the output of "dmesg" to see if there are any problems recorded with the NIC. Check the output of "ifconfig" to see if there are TX or RX errors that increase when your connections are reset.

[root@maya2012 hwang]# ifconfig
em1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
        inet 222.200.125.5 netmask 255.255.255.128 broadcast 222.200.125.127
        inet6 fe80::d6ae:52ff:fe6a:405e prefixlen 64 scopeid 0x20<link>
        ether d4:ae:52:6a:40:5e txqueuelen 1000 (Ethernet)
        RX packets 2865 bytes 396191 (386.9 KiB)
        RX errors 0 dropped 180 overruns 0 frame 0
        TX packets 510 bytes 55844 (54.5 KiB)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

em2: flags=4099<UP,BROADCAST,MULTICAST> mtu 1500
        ether d4:ae:52:6a:40:5f txqueuelen 1000 (Ethernet)
        RX packets 0 bytes 0 (0.0 B)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 0 bytes 0 (0.0 B)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
        inet 127.0.0.1 netmask 255.0.0.0
        inet6 ::1 prefixlen 128 scopeid 0x10<host>
        loop txqueuelen 0 (Local Loopback)
        RX packets 7 bytes 748 (748.0 B)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 7 bytes 748 (748.0 B)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

[root@maya2012 hwang]# ip -s -d l l
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 promiscuity 0
    RX: bytes  packets  errors  dropped  overrun  mcast
    748        7        0       0        0        0
    TX: bytes  packets  errors  dropped  carrier  collsns
    748        7        0       0        0        0
2: em1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT qlen 1000
    link/ether d4:ae:52:6a:40:5e brd ff:ff:ff:ff:ff:ff promiscuity 0
    RX: bytes  packets  errors  dropped  overrun  mcast
    312908     2272     0       138      0        1081
    TX: bytes  packets  errors  dropped  carrier  collsns
    43946      403      0       0        0        0
3: em2: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN mode DEFAULT qlen 1000
    link/ether d4:ae:52:6a:40:5f brd ff:ff:ff:ff:ff:ff promiscuity 0
    RX: bytes  packets  errors  dropped  overrun  mcast
    0          0        0       0        0        0
    TX: bytes  packets  errors  dropped  carrier  collsns
    0          0        0       0        0        0

Thanks,

Hua

Gordon Messmer

12 Oct

5:10 p.m.

On 10/10/2015 11:25 PM, Hua Wang wrote:

...

I am not sure if we can not send attachments to the mailing list. There were quite a lot replies before, but I got nothing back since attachements was added. I will remove the attachments and send it again.

You can use services like pastebin.com to temporarily post your logs. I wouldn't recommend posting the whole "secure" log. The output of dmesg might be helpful, but you can probably just read it and determine whether or not there's anything related to your NIC or to networking.

arping and ifconfig don't show any conflicts or errors. It's still possible that there's a conflict with a device that's not online all of the time, but that'll be hard to track down.

At this point, I think we've exhausted a lot of the simple stuff. The next thing I'd do would be to run tcpdump on your client and watch all of the traffic to and from the server, and any ICMP. When the connection is interrupted, the last few packets should show the cause. I'd expect you to see either a TCP reset or one of a few ICMP messages. So, open a terminal, start a tcpdump, and let it run until your ssh connection (in another terminal, obviously) is reset. Use Ctrl+C to stop tcpdump.

# tcpdump -nn host 222.200.125.5 or icmp
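If the capture is long, the packets of interest can be picked out afterwards. The sketch below filters saved tcpdump text output for TCP segments with the reset flag set and for ICMP "unreachable" messages, the two causes named above. The function name teardown_packets is hypothetical, and the patterns assume tcpdump's usual "Flags [R]" / "Flags [R.]" and "ICMP ... unreachable" wording.

```shell
# Sketch: read tcpdump text output on stdin and keep only lines that show
# a TCP reset or an ICMP unreachable message.
teardown_packets() {
    # "Flags [...R...]" covers both [R] and [R.]; the second alternative
    # covers ICMP host/net/port unreachable.
    grep -E 'Flags \[[^]]*R[^]]*\]|ICMP .*unreachable'
}
```

One possible workflow is `tcpdump -nn -l host 222.200.125.5 or icmp | tee capture.txt` while the session runs, then `teardown_packets < capture.txt` once it has been reset. Which side sent the reset tells you where to look next.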
