[CentOS] problem on exceptional quit

Sun Oct 11 06:25:18 UTC 2015
Hua Wang <ehwang at 163.com>

I am not sure whether attachments can be sent to the mailing list. There were quite a lot of replies before, but I have received nothing since the attachments were added. I have removed the attachments and am sending the message again. Please have a look at the email below. Thanks for your help.

---

Dear All,

Thanks for all your help. I have put all the comments together below. Please have a look and see whether there is any clue to this ghost problem. I have also attached the log files: dmesg, secure, messages. Please note that the following message appeared in secure when the session exited just now:
Oct  9 10:55:55 maya2012 su: pam_unix(su:session): session closed for user root

> Can you trigger the error reliably by doing something network intensive, like scp or rsync a large file?  I've seen similar behaviour with a bad NIC that was in the process of dying.

Yes, I copied tens of GB of files using rsync. It worked well.

> That's very often a result of IP conflict.  I'm assuming that you're connecting to an IPv4 address.  If so, log in to your CentOS server and use arping to look for conflicts:
> 
> # arping -c 2 -D -I em1 <your address>

The IP address is statically assigned to my server. The network administrator has checked the address, and only this computer uses it. When I run the above command, the output is:

[root@maya2012 hwang]# arping -c 2 -D -I em1 222.200.125.5
ARPING 222.200.125.5 from 0.0.0.0 em1
Sent 2 probes (2 broadcast(s))
Received 0 response(s)
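
As far as I understand, arping in -D (duplicate address detection) mode exits with status 0 when no reply is received, so the check above could be repeated from a script. This is only a minimal sketch, assuming the iputils version of arping and root privileges; the function name check_ip_conflict and the ARPING variable are hypothetical names made up for illustration.

```shell
#!/bin/bash
# Sketch: wrap "arping -D" and report the result in words.
# ARPING is a hypothetical override so another binary (or a stub) can be used.
ARPING="${ARPING:-arping}"

check_ip_conflict() {
    local iface="$1" addr="$2"
    # -q: quiet, -c 2: send two probes, -D: duplicate address detection mode.
    # Exit status 0 is taken to mean that no other host answered for $addr.
    if "$ARPING" -q -c 2 -D -I "$iface" "$addr"; then
        echo "no reply: $addr appears unused by other hosts on $iface"
    else
        echo "reply received or arping failed: possible conflict for $addr on $iface"
    fi
}

# Example call (needs root): check_ip_conflict em1 222.200.125.5
```

Running it from cron every few minutes and comparing the timestamps with the times of the disconnects might show whether a conflict only appears intermittently.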

>> 1. Login via Mac, Windows, Linux systems from different computers.
>> 2. Modify sshd_config on the server as suggested by many posts:
>> TCPKeepAlive yes
>> ClientAliveInterval 60
> 
> TCPKeepAlive is "yes" by default.  ClientAliveInterval doesn't appear to be a valid setting.  Either TCPKeepAlive or ServerAliveInterval could be useful if the problem were a stateful firewall which was dropping your connection from its state table, and then resetting the connection in response to a later packet from your client.
> 
> Since those don't help, that tends to suggest that the problem isn't an intermediate host, but the server itself.  Possibly an IP conflict.  Also, check the output of "dmesg" to see if there are any problems recorded with the NIC.  Check the output of "ifconfig" to see if there are TX or RX errors that increase when your connections are reset.

[root@maya2012 hwang]# ifconfig
em1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 222.200.125.5  netmask 255.255.255.128  broadcast 222.200.125.127
        inet6 fe80::d6ae:52ff:fe6a:405e  prefixlen 64  scopeid 0x20<link>
        ether d4:ae:52:6a:40:5e  txqueuelen 1000  (Ethernet)
        RX packets 2865  bytes 396191 (386.9 KiB)
        RX errors 0  dropped 180  overruns 0  frame 0
        TX packets 510  bytes 55844 (54.5 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

em2: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
        ether d4:ae:52:6a:40:5f  txqueuelen 1000  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 0  (Local Loopback)
        RX packets 7  bytes 748 (748.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 7  bytes 748 (748.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

[root@maya2012 hwang]# ip -s -d l l
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT 
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 promiscuity 0 
    RX: bytes  packets  errors  dropped overrun mcast   
    748        7        0       0       0       0      
    TX: bytes  packets  errors  dropped carrier collsns 
    748        7        0       0       0       0      
2: em1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT qlen 1000
    link/ether d4:ae:52:6a:40:5e brd ff:ff:ff:ff:ff:ff promiscuity 0 
    RX: bytes  packets  errors  dropped overrun mcast   
    312908     2272     0       138     0       1081   
    TX: bytes  packets  errors  dropped carrier collsns 
    43946      403      0       0       0       0      
3: em2: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN mode DEFAULT qlen 1000
    link/ether d4:ae:52:6a:40:5f brd ff:ff:ff:ff:ff:ff promiscuity 0 
    RX: bytes  packets  errors  dropped overrun mcast   
    0          0        0       0       0       0      
    TX: bytes  packets  errors  dropped carrier collsns 
    0          0        0       0       0       0      
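
To see whether the error or dropped counters on em1 increase at the moment a connection is reset, the counters could be logged with a timestamp and compared with the entries in secure. This is a minimal sketch, assuming the kernel exposes the counters under /sys/class/net/<interface>/statistics; the function names and the STATS_ROOT variable are hypothetical and only there for illustration.

```shell
#!/bin/bash
# Sketch: log interface error/drop counters at a fixed interval.
# STATS_ROOT is a hypothetical override so the functions can be tried
# against a directory other than the real /sys tree.
STATS_ROOT="${STATS_ROOT:-/sys/class/net}"

read_counters() {
    local dir="$STATS_ROOT/$1/statistics"
    # Each counter is a single number in its own file.
    printf 'rx_errors=%s rx_dropped=%s tx_errors=%s tx_dropped=%s\n' \
        "$(cat "$dir/rx_errors")" "$(cat "$dir/rx_dropped")" \
        "$(cat "$dir/tx_errors")" "$(cat "$dir/tx_dropped")"
}

watch_counters() {
    local iface="$1" interval="${2:-10}"
    while true; do
        # Timestamp in the same style as /var/log/secure for easy comparison.
        printf '%s %s %s\n' "$(date '+%b %e %H:%M:%S')" "$iface" "$(read_counters "$iface")"
        sleep "$interval"
    done
}

# Example call: watch_counters em1 10 >> /tmp/em1-counters.log
```

If rx_dropped jumps at the same second as the "session closed" line in secure, that would point at the NIC or driver; if it only grows slowly and steadily, the drops are probably unrelated to the disconnects.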

Thanks,

Hua