[CentOS] Socket behavior change from 6.5 to 6.6

Wed Jan 21 18:09:55 UTC 2015
Gordon Messmer <gordon.messmer at gmail.com>

On 01/21/2015 08:49 AM, Glenn Eychaner wrote:
> Diagnosis:
> the previous behavior of
> receiving a 0-length recv() on the old server socket is unsupported and
> unreliable.

You mention that a lot, and it might help to understand why that happens.

A 0-length recv() on a standard (blocking) socket indicates end-of-file:
the remote side has closed the connection.
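
As a minimal sketch of that end-of-file behavior (Python on the loopback
interface; the variable names are only for illustration and have nothing
to do with your actual client or server):

```python
import socket

# Loopback listener on a port chosen by the OS (port 0), for illustration only.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(1)

client = socket.create_connection(listener.getsockname())
server_side, _ = listener.accept()

client.sendall(b"hello")
client.close()                   # orderly close: the kernel sends a FIN

first = server_side.recv(1024)   # the data that was sent before the close
second = server_side.recv(1024)  # empty bytes: end-of-file, the peer closed

print(first, second)

server_side.close()
listener.close()
```

The second recv() returns immediately with zero bytes because the kernel
already knows the peer has closed; that is the only thing a 0-length read
tells the application.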

What you were previously seeing was the client sending SYN to establish 
a new connection.  Because it was unrelated to the existing connection 
on the same 5-tuple, the server's TCP stack closed the existing socket.
I'm not positive, but the server may have sent a keepalive or other
probe to the client and got a RST.  Either way, the kernel determined 
that the socket had been closed by the client, and a 0-length read 
(recv) is the way that the kernel informs an application of that closure.

> Until the update to CentOS 6.6 'broke' the existing functionality,
> I had never looked deeply into the connection between the client and the
> server; it 'just worked', so I left it alone. Once it did break, I realized
> that because the client was connecting on the same port every time, the
> whole setup might have been relying on unsupported behavior.

Not just unsupported, but incorrect.  Unrelated packets with a 5-tuple 
matching an established socket are typically injection attacks.  TCP is 
supposed to discard them.

> Other diagnostics:
> One test I intend to run in a couple of weeks (next opportunity) is to boot
> the CentOS 6.6 box with the older kernel, in order to find out whether the
> behavior change is in the kernel or in the libraries.

It's always good to test, but it's almost certainly the kernel. 
Libraries don't decide whether or not a socket has closed, which is what 
the 0-length read (recv) indicates.

> Correct solutions:
> 1) Client port: The client should be connecting on a random, ephemeral port

Yes.
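
A rough sketch of what that looks like in practice, assuming a Python
client (the helper name connect_once is made up for this example): the
client simply does not bind() to a fixed port before connecting, and the
kernel assigns an ephemeral source port to each connection.

```python
import socket

# Stand-in server on the loopback interface, for illustration only.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(2)
server_addr = listener.getsockname()

def connect_once(address):
    # No bind() before connect(): the kernel picks an ephemeral source port.
    return socket.create_connection(address)

c1 = connect_once(server_addr)
c2 = connect_once(server_addr)

port1 = c1.getsockname()[1]
port2 = c2.getsockname()[1]
print("source ports:", port1, port2)

for s in (c1, c2, listener):
    s.close()
```

While both connections are open they must differ in source port, so a
restarted client never collides with the 5-tuple of a stale connection
that the server still believes is established.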

> 2) Protocol change: The server never writes to the socket in the existing
> protocol, and can therefore never find out that the connection is dead.
> Writing to the socket would reveal this. But what happens if the server writes
> to the socket, and the client never reads?

You will eventually fill up a buffer on one side or the other, and at
that point any further write (send) on a blocking socket will block for
as long as the client doesn't read.
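
Here is a small sketch of that, again in Python over loopback. It uses a
0.5 second socket timeout as a stand-in for "blocks", so the script can
finish; the number of bytes accepted before the stall depends on the
kernel's buffer settings and is not meaningful in itself.

```python
import socket

listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(1)

writer = socket.create_connection(listener.getsockname())
reader, _ = listener.accept()    # accepted, but never read from

writer.settimeout(0.5)           # give up instead of blocking indefinitely
chunk = b"x" * 65536
total_sent = 0
blocked = False
try:
    while True:
        total_sent += writer.send(chunk)
except socket.timeout:
    # Send and receive buffers are full; a plain blocking send() would
    # simply not return here until the reader consumed some data.
    blocked = True

print("send() stalled after", total_sent, "bytes")

for s in (writer, reader, listener):
    s.close()
```

If the server did start writing, it would want a timeout or non-blocking
sockets so that a client that never reads can't hang it.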

> 3) Several people suggested using SO_REUSEADDR and/or an SO_LINGER of zero to
> drop the socket out of TIME_WAIT, but does the socket enter TIME_WAIT as soon
> as the client crashes? I didn't think so, but I may be wrong.

No.  It enters TIME_WAIT when the socket closes.  If the socket were 
closing, you'd be getting a 0-length read (recv).  You can confirm that 
with "netstat"