On 6/30/11, Devin Reade gdr@gno.org wrote:
I don't recall you mentioning which VM solution you're using.
KVM :)
Some problematic areas that I've seen when using VMs:
- memory ballooning sometimes causes problems (I've not actually seen it, but I've seen various warnings about having it enabled and resultant flakiness, and I run with it disabled)
This might be one of the problems. I just realized that while the swap used is still pretty small at around 200MB, that's about 5x the "normal" amount of about 40MB. Since I set an initial 1GB with an upper limit of 1.5GB, I'd expect the amount of memory available to go up to 1.5GB when swap usage rises. However, this isn't the case: the ballooning doesn't seem to be happening. So maybe that's part of the problem: one of the VMs just wanted a bit more memory for whatever reason, didn't get it, started hitting swap, and the I/O went crazy.
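A rough way to check this from inside a guest is to compare MemTotal and swap usage in /proc/meminfo before and after the swap starts growing. This is only a sketch: the helper names are made up for illustration, and it assumes a Linux guest with the usual "Name: value kB" layout.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field name: integer value}.

    Values are in kB for most fields, as reported by the kernel.
    """
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        # Keep only "Name: <number> [unit]" lines.
        if sep and parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def swap_used_kb(fields):
    """Swap currently in use, in kB."""
    return fields["SwapTotal"] - fields["SwapFree"]


def read_meminfo(path="/proc/meminfo"):
    """Read and parse the running system's meminfo (Linux only)."""
    with open(path) as f:
        return parse_meminfo(f.read())
```

Calling read_meminfo() in the guest and looking at "MemTotal" should show whether the guest ever sees more than the initial 1GB; if MemTotal stays put while swap_used_kb() climbs, the balloon isn't being deflated.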
- I/O stacks not doing TCP segment offload correctly. This is an ugly one that bit me hard and took a while to track down. It's happened in both ESXi and Xen (and I'm not saying that KVM isn't affected, either).
The symptom of this is that things seem to be fine under low load, but as network traffic starts to increase, TCP sessions start stalling out or dying. I've seen it to the point where I can't even maintain an ssh session long enough to get a login prompt.
This might be possible, but at the moment I'll consider it unlikely since the problem doesn't usually happen during low-load periods, i.e. not when the users are connecting to the email or app service during working hours.
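If it does need ruling out later, the offload state of an interface can be read from "ethtool -k". Below is a small sketch that parses that output. The function names and the "eth0" default are assumptions for illustration, and it accepts both the older "tcp segmentation offload" and the newer "tcp-segmentation-offload" spellings.

```python
import subprocess


def parse_offloads(text):
    """Parse `ethtool -k` output into {feature name: True/False}.

    Spaces in feature names are normalized to hyphens so that old and
    new ethtool output formats produce the same keys.
    """
    features = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        words = value.split()
        # Skip the header line and anything that is not "name: on|off ...".
        if sep and words and words[0] in ("on", "off"):
            features[name.strip().replace(" ", "-")] = (words[0] == "on")
    return features


def read_offloads(iface="eth0"):
    """Run `ethtool -k <iface>` and parse it (needs ethtool installed)."""
    out = subprocess.run(
        ["ethtool", "-k", iface],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_offloads(out)
```

If read_offloads() reports "tcp-segmentation-offload" as on and the stalls match the description above, turning it off with "ethtool -K eth0 tso off" would be one way to test the theory.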
So I'll keep this in view (KIV) for now and first see whether simply setting the max/current memory, without relying on ballooning, works.
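For reference, with libvirt the two values live in the domain XML as the memory and currentMemory elements, and the gap between them is what ballooning is supposed to cover. Here is a sketch for checking that gap. It assumes the XML comes from something like "virsh dumpxml", that both values are in KiB (the libvirt default), and the function name is made up.

```python
import xml.etree.ElementTree as ET


def balloon_headroom_kib(domain_xml):
    """Return max memory minus current memory, in KiB.

    0 means the guest starts with its full allocation, so nothing
    depends on the balloon growing later. Assumes both elements are
    in KiB; any unit attribute is ignored.
    """
    root = ET.fromstring(domain_xml)
    max_kib = int(root.findtext("memory"))
    cur_text = root.findtext("currentMemory")
    # libvirt treats a missing <currentMemory> as equal to <memory>.
    cur_kib = int(cur_text) if cur_text is not None else max_kib
    return max_kib - cur_kib
```

With the current 1GB initial / 1.5GB max setup this would return 524288; after setting both to the same value it should return 0.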