John Doe wrote:
> So, am I correct in thinking that seeing thousands of TWs when there was a burst of thousands of connections is normal?

Yes, that is normal.

> Any idea why so few TWs on this server? Any conf file I should check?
I spent a bunch of time researching TIME_WAIT on Linux and didn't find much useful information. There are a couple of kernel parameters that change the behavior, but the only docs I could find for them say not to touch them unless you REALLY know what you're doing (I posted to this list about 6 months ago on the topic and don't recall getting any responses).
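For reference, the sysctls I mean are along these lines (defaults vary by kernel; Documentation/networking/ip-sysctl.txt describes them, and its warnings are worth taking seriously):

sysctl net.ipv4.tcp_max_tw_buckets   # cap on TIME_WAIT sockets; the kernel drops them beyond this
sysctl net.ipv4.tcp_tw_reuse         # allow re-using TIME_WAIT sockets for new outgoing connections
sysctl net.ipv4.tcp_fin_timeout      # despite the name, this is the FIN-WAIT-2 timeout, not TIME_WAIT

Reading them like this is harmless; changing them is where the "really know what you're doing" part comes in.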
[root@webserver30 ~]# netstat -an | grep -c TIME_WAIT
12840
The app that runs on that box is very high volume, so we get a large number of TIME_WAITs; during performance testing on a dual-proc quad-core we can see up to 63,000 of them. The typical lifespan of a transaction/connection on the above system is about 200ms, and it receives several hundred of those per second (multiply by about 90 servers).
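The back-of-the-envelope math lines up, too: Linux keeps a socket in TIME_WAIT for 60 seconds (TCP_TIMEWAIT_LEN, hardcoded in the kernel), so the steady-state count is roughly the new-connection rate times 60. Assuming around 200 new connections/sec on the box above:

echo $((200 * 60))    # 12000 -- right around the 12840 that netstat reports above

And the 63,000 we see in performance testing works out to something on the order of 1,000 new connections/sec on the test box.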
The solution for us (to drive more throughput, since we maxed out the sockets before maxing out the hardware) was to enable connection pooling on our load balancers to re-use connections, which cut socket usage by at least 10x. The above system is dual-proc single-core, so in that case we max out the CPU or disk I/O before we max out the sockets.

It does look a bit strange with connection pooling turned on, though: the load balancer effectively spoofs the remote IPs, so IP #1 comes in and establishes a connection, then IPs #2-999 re-use that same connection, but netstat shows IP #1 the whole time. The load balancer forwards the actual source IP in an HTTP header so the application can still see it.
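An easy way to see the pooling effect is to count established connections per remote address; with pooling on, the handful of load balancer IPs dominate the list instead of thousands of client IPs (the real client only shows up in the forwarded header, commonly X-Forwarded-For, though the exact header name depends on your load balancer):

netstat -ant | awk '$6 == "ESTABLISHED" {print $5}' | cut -d: -f1 | sort | uniq -c | sort -rn | head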
So IMO, don't worry about TIME_WAITs unless you're seriously into the tens of thousands, at which point you may want to think about optimizing the traffic flow to your systems like we did with our load balancers.

A few thousand TIME_WAITs? Not worth any time investigating.
nate