On Fri, Mar 28, 2014 at 9:01 AM, Mr Queue <lists at mrqueue.com> wrote:

> On Thu, 27 Mar 2014 17:20:22 -0500
> Matt Garman <matthew.garman at gmail.com> wrote:
>
> > Anyone seen anything like this? Any thoughts or ideas?
>
> Post some data.. This public facing? Are you getting sprayed down by
> packets? Array? Soft/hard? Someone have screens laying around? Write a
> trap to catch a process list when the loads spike? Look at crontab(s)?
> User accounts? Malicious shells? Any guest containers around?
> Possibilities are sort of endless here.

Not public facing (no Internet access at all). Linux software RAID-1. No screen or tmux data. No guest access of any kind. In fact, only three logged in users. I've reviewed crontabs (there are only a couple), and I don't see anything out of the ordinary. Malicious shells or programs: possibly, but I think that is highly unlikely... if someone were going to do something malicious, *this* particular server is not the one to target.

What kind of data would help? I have sar running at a five second interval. I also did a 24-hour run of dstat at a one second interval collecting all information it could. I have tons of data, but not sure how to "distill" it down to a mailing-list friendly format. But a colleague and I reviewed the data, and don't see any correlation with other system data before, during, or after these load spike events.

I did a little research on the loadavg number, and my understanding is that it's simply a function of the number of tasks on the system. (There's some fancy stuff thrown in for exponential decay and curve smoothing and all that, but it's still based on the number of system tasks.)

I did a simple run of "top -b > top_output.txt" for a 24-hour period, which captured another one of these events. I haven't had a chance to study it in detail, but I expected the number of tasks to shoot up dramatically around the time of these load spikes. The number of tasks remained fairly constant: about 200 +/- 5.
How can the loadavg shoot up (from ~1 to ~20) without a corresponding uptick in the number of tasks?
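
For illustration, here is a minimal sketch of the exponentially damped average mentioned above. It assumes the Linux behaviour in which the sampled quantity is the number of runnable plus uninterruptible-sleep tasks (not the total task count that top reports) and in which samples are taken roughly every 5 seconds. The function name and the trace of sample values are hypothetical, and the real kernel uses fixed-point arithmetic rather than floats.

```python
import math

SAMPLE_INTERVAL = 5.0   # seconds between samples (assumed; roughly what Linux uses)
WINDOW = 60.0           # time constant for the 1-minute average


def update_loadavg(load, active, interval=SAMPLE_INTERVAL, window=WINDOW):
    """Apply one step of an exponentially damped moving average.

    `active` is the sampled count of runnable + uninterruptible tasks,
    not the total number of tasks on the system.
    """
    decay = math.exp(-interval / window)
    return load * decay + active * (1.0 - decay)


# Hypothetical trace: 1 active task for one minute, then 20 active
# tasks for one minute. The total task count never enters the formula.
load = 1.0
for active in [1] * 12 + [20] * 12:
    load = update_loadavg(load, active)

print(round(load, 2))
```

In this sketch the average climbs toward 20 within a minute or two even if the total number of tasks stays at 200, as long as about 20 of them are runnable or blocked in uninterruptible sleep at each sample.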