[CentOS] Socket behavior change from 6.5 to 6.6

Fri Jan 16 17:42:36 UTC 2015
Warren Young <wyml at etr-usa.com>

On Jan 15, 2015, at 11:40 AM, Glenn Eychaner <geychaner at mac.com> wrote:

> When the DOS box exits, crashes, or is rebooted, it fails to shut down the
> socket properly.

Yes, that’s what happens when you use an OS that doesn’t implement sockets in kernel space: there is no program still running that can send the RST packet for the dead socket.

> Under CentOS 6.5, upon reboot, when the DOS box would attempt
> to reconnect, the original accepted server socket would (after a couple of
> connection attempts from the DOS box) see a 0-length recv and close, allowing
> the server to accept a new connection and resume receiving images.

You’re relying on undocumented behavior here.

I don’t know exactly what was going on before [*], but the new behavior is at least legal, and probably better.  It prevents a bogus reconnection, which could otherwise be used for nefarious purposes (connection hijacking, etc.).

[*] Your “flailing about” diagnosis is somewhat lacking in its level of rigor. :)  I think if you look more deeply into it, you’ll be shocked at how thin the ice you’ve been dancing on is.

> Possibly relevant facts:

Oh, yeah.  Relevant like rashes are to a diagnosis of chicken pox.

> - The DOS box uses the same local port (1025) every time it tries to connect.

That’s legal only if you allow the previous connection to die first, via the TIME_WAIT delay.  Until that delay expires, the connection’s 5-tuple [**] is still in use, and the kernel is right to refuse to accept another SYN using the same 5-tuple.

Another poster recommended SO_REUSEADDR, but that’s just a hack around the TIME_WAIT delay.

The correct fix is to change the DOS app to use an ephemeral port number.  That won’t 100% fix it, because you’ll still have a 1-in-16,384 chance [***] of causing the same problem as you’ve run into now, but that sounds livable to me.  If you reboot only once a week, you’d have to be Yoda to have much reason to be worried about running into this again during the balance of your tenure with this company.
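
To make "use an ephemeral port" concrete, here is a minimal sketch in Python (not the DOS app's language, which isn't stated in this thread). The idea is simply to skip the explicit bind() so the stack assigns the local port. The function name and the loopback listener are made up for the example, and the range the OS actually draws from varies by platform.

```python
import socket

def connect_from_ephemeral_port(host, port):
    """Connect without binding first, so the OS assigns the local port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.connect((host, port))  # no bind() call: local port is OS-chosen
    return sock

if __name__ == "__main__":
    # Throwaway loopback listener so the sketch has something to connect to.
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0 also means "OS, pick one"
    listener.listen(1)

    client = connect_from_ephemeral_port(*listener.getsockname())
    print("local port chosen by the OS:", client.getsockname()[1])

    client.close()
    listener.close()
```

Run it a few times and the printed local port should change from run to run, which is exactly what a hard-coded 1025 prevents.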

If you’re really worried about it, write the prior port to a text file on program startup and avoid that one on the next run.
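
A rough sketch of that text-file idea, again in Python for illustration only. The state-file path, the function name, and the choice to pick the port yourself from the IANA range (rather than letting the OS choose) are all assumptions of this example.

```python
import random

# IANA dynamic/ephemeral port range (assumed here; your OS may use another).
EPHEMERAL_LOW, EPHEMERAL_HIGH = 49152, 65535

def pick_local_port(state_path, rng=random):
    """Pick a port in the ephemeral range that differs from the last run's."""
    previous = None
    try:
        with open(state_path) as f:
            previous = int(f.read().strip())
    except (OSError, ValueError):
        pass  # first run, or unreadable file: nothing to avoid

    port = rng.randint(EPHEMERAL_LOW, EPHEMERAL_HIGH)
    while port == previous:
        port = rng.randint(EPHEMERAL_LOW, EPHEMERAL_HIGH)

    # Remember this run's port so the next run can steer clear of it.
    with open(state_path, "w") as f:
        f.write(str(port))
    return port
```

The client would then call sock.bind(("", port)) with the returned value before connect(). This only guards against repeating the immediately preceding port, which is the case that matters after a reboot.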

Oh, let me guess the objection: old binary-only DOS app, no source code available, programmers long since vanished, right?

[**] Transport protocol, local port, local IP, remote port, remote IP.  At least one must be different for a new connection to be allowed.
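
As a toy model of that rule (not how any real kernel is implemented), think of the live connections as a set keyed by the 5-tuple. The class name and the addresses in the usage below are invented for the example.

```python
from collections import namedtuple

FiveTuple = namedtuple(
    "FiveTuple", "proto local_ip local_port remote_ip remote_port"
)

class ConnectionTable:
    """Toy model: one entry per live connection, keyed by its 5-tuple."""

    def __init__(self):
        self._live = set()

    def try_open(self, key):
        """Return True if the connection is allowed, False if the tuple is taken."""
        if key in self._live:
            return False  # same 5-tuple still in use: refuse the new SYN
        self._live.add(key)
        return True

    def close(self, key):
        self._live.discard(key)

if __name__ == "__main__":
    table = ConnectionTable()
    # Server's view: it is "local", the DOS box is "remote". Addresses are made up.
    old = FiveTuple("tcp", "10.0.0.1", 5000, "10.0.0.2", 1025)
    print(table.try_open(old))                             # first connection
    print(table.try_open(old))                             # reconnect, same port
    print(table.try_open(old._replace(remote_port=1026)))  # reconnect, new port
```

The second attempt is refused because nothing ever removed the old entry; changing any one field, such as the DOS box's source port, yields a distinct key.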

[***] The IANA ephemeral port range (https://en.wikipedia.org/wiki/Ephemeral_port) has about 16k ports.  I spent some time puzzling over the probabilities, and I’m pretty sure you don’t count two “draws” here: you’re only concerned with the chance that the *next* port you pick will be equal to the preceding one.
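
For anyone who wants to check the arithmetic, a small sketch follows. It assumes the port is picked uniformly at random from the IANA range 49152 to 65535, which real stacks don't necessarily do, and it assumes one reconnect per weekly reboot.

```python
IANA_LOW, IANA_HIGH = 49152, 65535  # IANA dynamic/ephemeral port range

def repeat_chance(low=IANA_LOW, high=IANA_HIGH):
    """Chance that the next uniformly random pick equals the previous one."""
    return 1 / (high - low + 1)

def expected_years_between_repeats(reboots_per_year=52):
    """Mean wait until a repeat, treating each reboot as an independent trial."""
    expected_reboots = 1 / repeat_chance()
    return expected_reboots / reboots_per_year

if __name__ == "__main__":
    print("ports in range:", IANA_HIGH - IANA_LOW + 1)
    print("chance per reboot:", repeat_chance())
    print("expected years between repeats:", expected_years_between_repeats())
```

Under those assumptions the range holds 16,384 ports, so the per-reboot chance is 1 in 16,384 and the expected wait at one reboot a week is roughly 315 years.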