Hi,
I was asked to check some TIME_WAIT "problems" (my boss thinks there should almost never be any) and I bumped into something strange... All of our servers have apparently normal (in my opinion) 60s TIME_WAITs (even if they strangely cap around 14000 in my tests)... But one of them behaves differently (and my boss thinks it is the normal behavior). If I make 10000 rapid connections/selects/disconnections to mysql on this server, I get 1 TW after around 3000, another TW around 6000 and another TW around 9000... That makes only 3 TWs, and they last 60 seconds... I am told this server was not set up differently (no custom kernel). All servers are CentOS 5.2, kernels 2.6.18-92.1.[22|10].el5. I compared the values in sysctl.conf and /proc/sys/net/ipv4/* and found nothing different.
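For reference, the test loop was basically the following (the host name and credentials here are made-up placeholders):

  i=0
  while [ $i -lt 10000 ]; do
      # db1/test/test are placeholders for our real host and account
      mysql -h db1 -u test -ptest -e 'SELECT 1' > /dev/null
      i=$((i+1))
  done
  # count the leftover TIME_WAITs
  netstat -ant | grep -c TIME_WAIT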
So, am I correct in thinking that seeing thousands of TWs after a burst of thousands of connections is normal? Any idea why so few TWs on this server? Any conf file I should check?
# cat /proc/sys/net/ipv4/tcp_fin_timeout
60
# cat /proc/sys/net/ipv4/tcp_max_tw_buckets
180000
# cat /proc/sys/net/ipv4/tcp_tw_recycle
0
# cat /proc/sys/net/ipv4/tcp_tw_reuse
0
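(In case it matters, this is more or less how I compared the two /proc trees; the file names are just examples:

  # run on each server (skip the conf/neigh/route subdirectories):
  for f in /proc/sys/net/ipv4/*; do
      [ -f "$f" ] && echo "$f = $(cat "$f" 2>/dev/null)"
  done > /tmp/ipv4-settings.txt
  # then copy one file over and:
  diff /tmp/ipv4-settings-serverA.txt /tmp/ipv4-settings-serverB.txt
)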
When I googled for it, many people pointed to the tcp_fin_timeout value... Is it really related to TWs?
Thx, JD
John Doe wrote:
Hi,
I was asked to check some TIME_WAIT "problems" (my boss thinks there should almost never be any) and I bumped into something strange...
I SHOULD be able to answer this; I was involved when we solved the PANIX TCP-WAIT attack way back when...
But the OS has changed since then, and I don't work with internals.
All of our servers have apparently normal (in my opinion) 60s TIME_WAITs (even if they strangely cap around 14000 in my tests)... But one of them behaves differently (and my boss thinks it is the normal behavior). If I make 10000 rapid connections/selects/disconnections to mysql on this server, I get 1 TW after around 3000, another TW around 6000 and another TW around 9000... That makes only 3 TWs, and they last 60 seconds... I am told this server was not set up differently (no custom kernel). All servers are CentOS 5.2, kernels 2.6.18-92.1.[22|10].el5. I compared the values in sysctl.conf and /proc/sys/net/ipv4/* and found nothing different.
So, am I correct in thinking that seeing thousands of TWs after a burst of thousands of connections is normal? Any idea why so few TWs on this server? Any conf file I should check?
In your testing, is the source IP the same for all, with just a different source port? Or are you varying your source IP as well? I don't know what spoofing smarts are in the kernel to detect SYN/ACK attacks.
Are you running Shorewall or any similar tool that will detect SYN/ACK attacks and might be seeing this 'test' as an attack to limit?
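A rough way to check what your test connections actually look like while they sit in TW (assuming the usual netstat layout, foreign address in column 5 and state in column 6):

  # count TIME_WAIT entries per remote peer
  netstat -ant | awk '$6 == "TIME_WAIT" {print $5}' | sort | uniq -c | sort -rn | head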
# cat /proc/sys/net/ipv4/tcp_fin_timeout
60
# cat /proc/sys/net/ipv4/tcp_max_tw_buckets
180000
I remember when this was 256. And that on a 'high-end' SUN, AIX, or SGI server! With that many buckets it will take a lot of TWs before random discard kicks in. I wonder what the threshold is?
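If memory serves, the kernel logs something like "TCP: time wait bucket table overflow" when it starts discarding, so a crude check would be:

  # look for overflow messages (assuming they haven't rotated out of the ring buffer)
  dmesg | grep -i 'time wait bucket'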
# cat /proc/sys/net/ipv4/tcp_tw_recycle
0
# cat /proc/sys/net/ipv4/tcp_tw_reuse
0
When I googled for it, many people pointed to the tcp_fin_timeout value... Is it really related to TWs?
Well, yes. How long do you let a TW sit around waiting for a proper FIN or even a RST? Read the TCP RFC (RFC 793) as to why there is a TW in the state machine. Boy, has it been years since I cracked that one open...
It is really a resource issue. If you have gobs of memory, what do you care if 20% of your TCBs are tied up in TW? Or if 50% are tied up? Eventually they close out. There are CPU requirements to track all of these as well, but these tasks tend to run using idle time.
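If you want a feel for the memory side, the timewait control blocks live in their own slab cache (called tw_sock_TCP on 2.6 kernels, if I remember right):

  # shows object count and size for the timewait socket cache
  grep tw_sock /proc/slabinfo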
If you are seeing memory or cpu related bottlenecks, tools like Shorewall can block rapid connects.
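Without Shorewall, a bare-iptables sketch of the same idea would be something like this (the rates are arbitrary numbers, tune to taste; 3306 assumes it's the mysql port you care about):

  # accept new mysql connections up to a rate limit, drop the excess
  iptables -A INPUT -p tcp --dport 3306 --syn -m limit --limit 25/second --limit-burst 100 -j ACCEPT
  iptables -A INPUT -p tcp --dport 3306 --syn -j DROP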
If I make 10000 rapid connections/selects/disconnections to mysql on this server, I get 1 TW after around 3000, another TW around 6000 and another TW around 9000... That makes only 3 TWs, and they last 60 seconds...
In your testing, is the source IP the same for all, with just a different source port? Or are you varying your source IP as well? I don't know what spoofing smarts are in the kernel to detect SYN/ACK attacks.
The source was the same on both servers (the one with thousands of TWs and the one with 3 TWs).
Are you running Shorewall or any similar tool that will detect SYN/ACK attacks and might be seeing this 'test' as an attack to limit?
No Shorewall and no iptables rules.
When I googled for it, many people pointed to the tcp_fin_timeout value... Is it really related to TWs?
Well, yes. How long do you let a TW sit around waiting for a proper FIN or even a RST? Read the TCP RFC (RFC 793) as to why there is a TW in the state machine. Boy, has it been years since I cracked that one open...
I read about the connection handshake but I do not really see why setting the FIN_WAIT timeout would also set the TIME_WAIT timeout to the same value... And I tried setting it to 30s, and the TWs still lasted 60s.
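For the record, I set it like this and then watched the timers, which still start counting down from around 60 seconds:

  echo 30 > /proc/sys/net/ipv4/tcp_fin_timeout
  # -o shows the timer column, e.g. "timewait (59.13/0/0)"
  netstat -anto | grep TIME_WAIT | head -3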
Thx, JD
John Doe wrote:
So, am I correct in thinking that seeing thousands of TWs after a burst of thousands of connections is normal?
yes, that is normal
Any idea why so few TWs on this server? Any conf file I should check?
I spent a bunch of time researching TIME_WAIT on Linux and didn't find much useful information. There are a couple of kernel parameters to change the settings, though the only docs for them that I could find say don't touch them unless you REALLY know what you're doing (I posted to this list about 6 months ago on the topic; don't recall any responses).
[root@webserver30 ~]# netstat -an | grep -c TIME_WAIT
12840
The app that runs on that box is very high volume, so we get a large number of TIME_WAITs; during performance testing on a dual proc quad core we can get up to 63,000 of them. The typical lifespan of a transaction/connection on the above system is about 200ms, and it receives several hundred of those per second (multiply by about 90 servers).
The solution for us (to drive more throughput, as we maxed out the sockets before maxing out the hardware) was to enable connection pooling on our load balancers to re-use connections, which cut socket usage by at least 10x. The above system is dual proc single core, so we don't max out the sockets before maxing out the CPU or disk I/O in that case. It is kind of strange with connection pooling turned on, though: the load balancer spoofs the remote IPs, so IP #1 comes in, establishes a connection, and then IPs #2-999 re-use that connection, but netstat shows IP #1 the whole time. The load balancer forwards the actual source IP in an HTTP header so the application can be aware of it.
So IMO don't worry about time waits unless you're seriously into the tens of thousands, at which point you may want to think about optimizing the traffic flow to your systems, like we did with our load balancers.
A few thousand time waits? Not worth any time investigating.
nate
I spent a bunch of time researching TIME_WAIT on Linux and didn't find much useful information. There are a couple of kernel parameters to change the settings, though the only docs for them that I could find say don't touch them unless you REALLY know what you're doing
The only things I found are the hardcoded values in include/net/tcp.h:
#define TCP_TIMEWAIT_LEN (60*HZ)  /* how long to wait to destroy TIME-WAIT
                                   * state, about 60 seconds */
#define TCP_FIN_TIMEOUT  TCP_TIMEWAIT_LEN
                                  /* BSD style FIN_WAIT2 deadlock breaker.
                                   * It used to be 3min, new value is 60sec,
                                   * to combine FIN-WAIT-2 timeout with
                                   * TIME-WAIT timer. */
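So if I read that right, the only way to shorten the 60s is to patch and rebuild the kernel, something like (untested, and the source path is just an example):

  cd /usr/src/kernels/2.6.18-92.1.22.el5
  # change the hardcoded 60s to 30s on the TCP_TIMEWAIT_LEN line
  sed -i '/TCP_TIMEWAIT_LEN/s/60\*HZ/30*HZ/' include/net/tcp.h
  # ...then rebuild and install the kernel

Not something I want to do on production boxes...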
Our "issue" is on the LAN side: front servers connecting to the dbs. So I wonder if 60s is not too long for the delayed packets problem, when the sources and the targets are one gigabit switch away...
The app that runs on that box is very high volume, so we get a large number of TIME_WAITs; during performance testing on a dual proc quad core we can get up to 63,000 of them.
Hum... I think I just understood why I cap around 14,000 in my tests...

cat /proc/sys/net/ipv4/ip_local_port_range
32768   61000

(61000-32768)/2 = 14116

Could that be it?
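If that is it, I guess I could widen the range just for the tests, something like:

  # give the test client more ephemeral ports to burn through
  echo "1024 65535" > /proc/sys/net/ipv4/ip_local_port_range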
So IMO don't worry about time waits unless you're seriously into the tens of thousands, at which point you may want to think about optimizing the traffic flow to your systems, like we did with our load balancers.
We already use LVS+keepalived and it seems to work fine so far (except when I tested 1.1.16 ^_^).
Thx, JD
John Doe wrote:
The only things I found are the hardcoded values in include/net/tcp.h:
I found these tunable parameters: tcp_tw_recycle & tcp_tw_reuse
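Turning them on would just be the following, though everything I've read says tcp_tw_recycle is the risky one (it's known to break clients behind NAT):

  sysctl -w net.ipv4.tcp_tw_reuse=1
  # sysctl -w net.ipv4.tcp_tw_recycle=1   <- probably leave this one off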
Our "issue" is on the LAN side: front servers connecting to the dbs. So I wonder if 60s is not too long for the delayed packets problem, when the sources and the targets are one gigabit switch away...
Your front end servers should be using connection pooling to go to the DBs, so there is no delay in having to establish a connection. Of course, connection pooling isn't foolproof; I've seen a bunch of cases where it doesn't work as advertised...
Hum... I think I just understood why I cap around 14,000 in my tests...
cat /proc/sys/net/ipv4/ip_local_port_range
32768   61000
(61000-32768)/2 = 14116
Could that be it?
I don't think so; my settings are the same, and I have no problem getting to 60k+ TIME_WAITs.
nate