On 01/21/2015 08:49 AM, Glenn Eychaner wrote:
> Diagnosis:
> the previous behavior of
> receiving a 0-length recv() on the old server socket is unsupported and
> unreliable.

You mention that a lot, and it might help to understand why that happens. A 0-length recv() on a standard (blocking) socket indicates end-of-file: the remote side has closed the connection.

What you were previously seeing was the client sending SYN to establish a new connection. Because it was unrelated to the existing connection on the same 5-tuple, the server's TCP stack closed the existing socket. I'm not positive, but the server may have sent a keepalive or other probe to the client and gotten a RST. Either way, the kernel determined that the socket had been closed by the client, and a 0-length read (recv) is the way that the kernel informs an application of that closure.

> Until the update to CentOS 6.6 'broke' the existing functionality,
> I had never looked deeply into the connection between the client and the
> server; it 'just worked', so I left it alone. Once it did break, I realized
> that because the client was connecting on the same port every time, the
> whole setup might have been relying on unsupported behavior.

Not just unsupported, but incorrect. Unrelated packets with a 5-tuple matching an established socket are typically injection attacks. TCP is supposed to discard them.

> Other diagnostics:
> One test I intend to run in a couple of weeks (next opportunity) is to boot
> the CentOS 6.6 box with the older kernel, in order to find out whether the
> behavior change is in the kernel or in the libraries.

It's always good to test, but it's almost certainly the kernel. Libraries don't decide whether or not a socket has closed, which is what the 0-length read (recv) indicates.

> Correct solutions:
> 1) Client port: The client should be connecting on a random, ephemeral port

Yes.
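For illustration only, here is a minimal Python sketch (Python is my choice here, not something from this thread) of both the ephemeral-port fix and the 0-length recv() explained earlier. The client simply never calls bind(), so the kernel assigns it a random ephemeral source port, and once the client closes, the server's blocking recv() returns 0 bytes. The loopback address, the kernel-chosen listening port, and the function name are assumptions of the demo, not your actual setup.

```python
import socket


def demo_ephemeral_port_and_eof():
    """Return (client's local port, what the server's recv() sees after close)."""
    # Server side: listen on loopback; port 0 asks the kernel for any free port.
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))
    server.listen(1)
    server_port = server.getsockname()[1]

    # Client side: no bind() call, so connect() picks an ephemeral source port.
    client = socket.create_connection(("127.0.0.1", server_port))
    client_port = client.getsockname()[1]

    conn, _addr = server.accept()
    client.close()  # orderly close: the client's kernel sends FIN

    # The blocking recv() returns b"" (0 bytes): end-of-file, the peer closed.
    data = conn.recv(1024)

    conn.close()
    server.close()
    return client_port, data


if __name__ == "__main__":
    port, data = demo_ephemeral_port_and_eof()
    print(f"client used ephemeral port {port}; server recv() returned {data!r}")
```

Running it twice should normally show different client ports, which is exactly what avoids reusing the old connection's 5-tuple.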
> 2) Protocol change: The server never writes to the socket in the existing
> protocol, and can therefore never find out that the connection is dead.
> Writing to the socket would reveal this. But what happens if the server writes
> to the socket, and the client never reads?

You will eventually fill up a buffer on one side or the other, and at that point any further write (send) on a blocking socket will block forever.

> 3) Several people suggested using SO_REUSEADDR and/or an SO_LINGER of zero to
> drop the socket out of TIME_WAIT, but does the socket enter TIME_WAIT as soon
> as the client crashes? I didn't think so, but I may be wrong.

No. It enters TIME_WAIT when the socket closes (on the side that closes first). If the socket were closing, you'd be getting a 0-length read (recv). You can confirm that with "netstat", which shows the state (ESTABLISHED, TIME_WAIT, and so on) of each TCP socket.
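To see the buffer-full condition from point 2 without hanging a real server, here is a hedged Python sketch. The "server" end writes to a peer that never reads; the writing socket is switched to non-blocking mode so that the point where a blocking send() would stall shows up as BlockingIOError (EAGAIN/EWOULDBLOCK) instead. The loopback setup, chunk size, and function name are illustrative assumptions, and how many bytes fit before the error depends on the kernel's buffer sizes.

```python
import socket


def bytes_sent_before_buffers_fill(chunk_size=65536):
    """Write to a peer that never reads until the kernel refuses more data."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))
    server.listen(1)
    client = socket.create_connection(server.getsockname())  # never reads
    conn, _addr = server.accept()

    # Non-blocking so the "full" condition raises instead of hanging; a
    # blocking send() would simply stop here until the client read something.
    conn.setblocking(False)
    chunk = b"x" * chunk_size
    total = 0
    try:
        while True:
            total += conn.send(chunk)  # may accept only part of a chunk
    except BlockingIOError:
        pass  # sender's send buffer and receiver's receive buffer are full
    finally:
        conn.close()
        client.close()
        server.close()
    return total


if __name__ == "__main__":
    print(f"kernel accepted {bytes_sent_before_buffers_fill()} bytes before blocking")
```

With an ordinary blocking socket, the same loop would just sit inside send() at that point, which is the "block forever" case.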