On Thu, 2007-12-06 at 16:48 +0100, Tomasz 'Zen' Napierala wrote:
> Wednesday 05 December 2007 15:39:41 J. Potter wrote:
> > Hi List,
> >
> > I'm stumped by this:
> >
> > load average: 10.65, 594.71, 526.58
> >
> > We're monitoring load every ~3 minutes. It'll be fine (i.e. something
> > like load average: 2.14, 1.27, 1.03), and then in a single sample,
> > jump to something like the above. This seems to happen once a week or
> > so on a few different servers (all running a similar application).
> > I've never seen the 1 minute sample spike as high as the 5 or 15
> > minute samples.
> >
> > Seeing as that last value is a 15 minute average, it doesn't seem
> > possible to have a 500+ reading there without having observed a spike
> > in the 5 minute sample at least 5 minutes earlier.
> >
> > Also, there aren't 500+ processes on these systems -- it's typically
> > around 100 total processes (ps auxw | wc -l). (Is there a way to see
> > the total count of kernel-level threads?)
> >
> > Thoughts?
>
> As mentioned before, IO could give such strange results. I suggest
> launching dstat with logging to a file and analyzing the file
> afterwards.

What about using sar to report the previous run queue history? AFAIK the
run queue figures don't include processes in an uninterruptible sleep
state (disk IO).
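
If sysstat's collector (sadc) is already running on those boxes, something
along these lines should show the recorded run queue and load average
history. I'm assuming the usual saDD daily file layout here, which varies a
bit between distros:

    sar -q                        # today's run queue / load average samples
    sar -q -f /var/log/sa/sa05    # an earlier day (DD = day of month; Debian
                                  # keeps these under /var/log/sysstat/)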
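
Re the dstat suggestion, a rough sketch of what that could look like (option
names may differ slightly between dstat versions, so check dstat --help
first):

    # sample every 60 seconds and log CSV for later analysis
    dstat --time --load --cpu --disk --sys --output /tmp/dstat.csv 60

Lining the load columns up against the disk columns after the next spike
should show whether IO is the culprit.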
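
As for the question about counting kernel-level threads: ps can list one
line per thread, and /proc/loadavg itself reports runnable vs. total
scheduling entities in its fourth field, e.g.:

    ps -eLf | wc -l      # one line per thread (LWP), plus a header line
    cat /proc/loadavg    # e.g. "0.20 0.18 0.12 1/180 12345"; 1/180 is
                         # runnable/total scheduling entities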