Filipe:
On the server that had stopped responding, I changed the firewall rules so they no longer use ESTABLISHED.
Now, one of the servers that was still using ESTABLISHED stopped responding.
I am seeing logs like this in the syslog:
OUTPUT IN= OUT=eth0 SRC=[myIP] DST=[otherIP] LEN=52 TOS=0x00 PREC=0x00 TTL=64 ID=35076 DF PROTO=TCP SPT=80 DPT=36953 WINDOW=54 RES=0x00 ACK PSH FIN URGP=0
I did: cat /proc/sys/net/ipv4/netfilter/ip_conntrack_count and it gave me: 615
That suggests the conntrack table is not overflowing, yet the firewall was blocking the outbound traffic.
I updated all my servers to not use ESTABLISHED, but I am still baffled as to how this could occur.
Any other ideas?
Thanks, Neil
-- Neil Aggarwal, (832)245-7314, www.JAMMConsulting.com
You are right that your conntrack table is large enough and this should not be happening. It might be an attack, such as a SYN flood, that is causing the problem. In that case, the half-open connections are kept in the table, and because the remote side never completes the handshake, they are only removed after a timeout. I also think that when you stop Apache, no process is listening on port 80 anymore, and conntrack may then get rid of those half-open connections since nothing is answering on that port. A lot of speculation here, but it might be what is affecting you.
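A rough sketch of how you could look for a pile-up of half-open connections: tally the TCP entries in the conntrack table by state. It assumes the table is readable at /proc/net/ip_conntrack on your kernel and that the state is the fourth field of each tcp line; the function name conntrack_states is just something I made up for illustration.

```shell
#!/bin/sh
# conntrack_states FILE
# Count TCP conntrack entries per state (hypothetical helper).
# Assumes lines look like: "tcp 6 <timeout> <STATE> src=... dst=... ..."
conntrack_states() {
    awk '$1 == "tcp" { n[$4]++ } END { for (s in n) print s, n[s] }' "$1" | sort
}
```

You would run it as `conntrack_states /proc/net/ip_conntrack`. An unusually large number of entries in SYN_RECV or SYN_SENT compared to ESTABLISHED would be consistent with the flood idea, though it would not prove it.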
In any case, the next time you have this problem, consider looking at the counters to see whether ip_conntrack_count is reaching ip_conntrack_max; that would confirm the hypothesis.
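Something like the following could make that comparison for you. It is only a sketch: the function name conntrack_usage is made up, and it takes the two counter files as arguments so you can point it at whatever paths your kernel exposes.

```shell
#!/bin/sh
# conntrack_usage COUNT_FILE MAX_FILE
# Print the current conntrack entry count, the limit, and the
# percentage of the table in use (hypothetical helper).
conntrack_usage() {
    count=$(cat "$1") || return 1
    max=$(cat "$2") || return 1
    if [ "$max" -le 0 ]; then
        echo "invalid max: $max" >&2
        return 1
    fi
    # Integer arithmetic, so the percentage is rounded down.
    echo "count=$count max=$max used=$((count * 100 / max))%"
}
```

On your servers the call would be `conntrack_usage /proc/sys/net/ipv4/netfilter/ip_conntrack_count /proc/sys/net/ipv4/netfilter/ip_conntrack_max`, assuming ip_conntrack_max sits next to the counter you already read. Newer kernels use nf_conntrack_count and nf_conntrack_max under /proc/sys/net/netfilter instead. Running it every few seconds while the problem is happening would show whether the count climbs toward the limit.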