Update on this problem:
From another system, I started a continuous ping to my laggy server.
I noticed that every 10-20 seconds, one or more ICMP packets would be dropped. These drops coincided with the input lag I was experiencing.
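The drops can also be found mechanically rather than by eye. Linux `ping` numbers each echo request with `icmp_seq`, so a number missing from the replies is a lost packet. Below is a minimal sketch that assumes the iputils `ping` reply format (`64 bytes from ...: icmp_seq=N ...`) and the default one-second interval. The script name `ping_gaps.py`, the log name `ping.log` and the address `192.0.2.10` are placeholders, not anything from my setup.

```python
import re
import sys

# Match only genuine reply lines. `ping -O` also prints "no answer yet for icmp_seq=N",
# which contains icmp_seq but is not a reply, so require the "bytes from" prefix.
REPLY_RE = re.compile(r"bytes from .*?icmp_seq=(\d+)")


def missing_seqs(ping_output: str) -> list[int]:
    """icmp_seq numbers between the first and last reply that never got a reply.

    Assumes the sequence numbers have not wrapped around (fine for a run of a few hours).
    """
    seqs = {int(m.group(1)) for m in REPLY_RE.finditer(ping_output)}
    if not seqs:
        return []
    return [s for s in range(min(seqs), max(seqs) + 1) if s not in seqs]


def drop_intervals(missing: list[int]) -> list[int]:
    """Packets between the starts of consecutive bursts of drops.

    With ping's default 1-second interval this is roughly the number of
    seconds between stalls.
    """
    # A burst starts wherever the previous missing number is not adjacent.
    starts = [s for i, s in enumerate(missing) if i == 0 or missing[i - 1] != s - 1]
    return [b - a for a, b in zip(starts, starts[1:])]


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("usage: ping_gaps.py PING_LOG")
    else:
        with open(sys.argv[1]) as f:
            gaps = missing_seqs(f.read())
        print("missing icmp_seq:", gaps)
        print("packets between drop bursts:", drop_intervals(gaps))
```

To use it, capture the ping to a file first (for example `ping 192.0.2.10 > ping.log`, stopped with Ctrl-C), then run `python3 ping_gaps.py ping.log`. Intervals clustering around 10-20 would match what I was seeing.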
I did a web search for "linux periodically hangs" and found this Serverfault post that had a lot in common with my symptoms:
http://serverfault.com/questions/371666/linux-bonded-interfaces-hanging-peri...
I do in fact have bonded interfaces on the laggy server. When I checked the bonding config, I realized that a while ago I had changed from balance-rr (mode 0) to 802.3ad (mode 4). (I did this because I kept getting "bond0: received packet with own address as source address" when using balance-rr with a bridge interface. The bridge interface was for KVM.)
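One way to see what the bond actually negotiated is to read `/proc/net/bonding/bond0`. Below is a rough sketch that parses that file and flags conditions commonly associated with a switch that isn't really doing LACP: an all-zero partner MAC address, and slaves that landed in different aggregator IDs. It also flags slaves that are not up. The field names follow the Linux bonding driver's output as I understand it. They can vary between kernel versions, so treat both the parser and the warning heuristics as assumptions, not a definitive diagnostic.

```python
import sys
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class BondStatus:
    mode: str = ""
    partner_mac: Optional[str] = None
    # slave name -> {field: value}, e.g. {"eth0": {"MII Status": "up", "Aggregator ID": "1"}}
    slaves: dict[str, dict[str, str]] = field(default_factory=dict)


def parse_bonding(text: str) -> BondStatus:
    """Parse the contents of /proc/net/bonding/<bond> (format assumed; may vary by kernel)."""
    status = BondStatus()
    current = None  # slave whose block we are inside, if any
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            current = None  # a blank line ends the current block
            continue
        # Split on the FIRST colon only, because MAC addresses contain colons.
        key, sep, value = line.partition(":")
        if not sep:
            continue
        key, value = key.strip(), value.strip()
        if key == "Slave Interface":
            current = value
            status.slaves[current] = {}
        elif current is not None:
            status.slaves[current].setdefault(key, value)  # keep the first occurrence
        elif key == "Bonding Mode":
            status.mode = value
        elif key == "Partner Mac Address" and status.partner_mac is None:
            status.partner_mac = value
    return status


def lacp_warnings(status: BondStatus) -> list[str]:
    """Heuristic red flags for an 802.3ad bond (assumed, not exhaustive)."""
    if "802.3ad" not in status.mode:
        return []
    warnings = []
    if status.partner_mac == "00:00:00:00:00:00":
        warnings.append("partner MAC is all zeros: no LACP partner seen")
    agg_ids = sorted({s["Aggregator ID"] for s in status.slaves.values() if "Aggregator ID" in s})
    if len(agg_ids) > 1:
        warnings.append("slaves split across aggregators " + ", ".join(agg_ids))
    down = [name for name, s in status.slaves.items() if s.get("MII Status") != "up"]
    if down:
        warnings.append("slaves not up: " + ", ".join(down))
    return warnings


if __name__ == "__main__":
    path = sys.argv[1] if len(sys.argv) > 1 else "/proc/net/bonding/bond0"
    try:
        with open(path) as f:
            st = parse_bonding(f.read())
    except OSError as exc:
        print(f"cannot read {path}: {exc}")
    else:
        print("mode:", st.mode)
        for name, fields in st.slaves.items():
            print(f"  {name}: MII {fields.get('MII Status')}, aggregator {fields.get('Aggregator ID')}")
        for w in lacp_warnings(st) or ["no obvious 802.3ad problems found"]:
            print(" ", w)
```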
For now, I simply disabled one of the slave interfaces, and the lag / dropped ICMP packets problem has gone away.
Like the Serverfault poster, I have an HP ProCurve 1800-24G switch. The switch is supposed to support 802.3ad link aggregation. It's not a managed switch, so I (perhaps incorrectly) assumed that 802.3ad would just magically work. Either more configuration is required to make it work, or its implementation is broken. Curiously, however, running bond0 in 802.3ad mode did work without any issue for over a month.
Anyway, hopefully this might help someone else struggling with a similar problem.
On Fri, Oct 10, 2014 at 4:17 PM, Matt Garman matthew.garman@gmail.com wrote:
On Fri, Oct 10, 2014 at 4:11 PM, Joseph L. Brunner joe@affirmedsystems.com wrote:
If this is a server, is it possible your RAID card battery died?
It is a server, but a home file server. The RAID card has no battery backup, and in fact has been flashed to pure HBA mode. The actual RAID is done in software.
The only other thing on the hardware side that comes to mind is actual bad sectors if this is not a raided virtual drive.
The system has eight drives in total: two SSDs in RAID-1 for the OS, five 3.5-inch spinning drives in RAID-6, and a single 3.5-inch drive normally used for MythTV recordings (though MythTV has been stopped for a long time now while I try to debug the issue).
From the OS side can you keep the box up long enough to do a yum update?
Yes, I updated everything except packages beginning with "l" (lowercase 'L', as in "el"), because those generated a number of conflicts that I haven't had time to resolve.