[CentOS] High load averages with latest kernel and USB drives?
Benjamin Smith
lists at benjamindsmith.com
Tue Nov 17 18:46:37 UTC 2009
See comments below...
On Tuesday 17 November 2009 07:52:01 Todd Denniston wrote:
> Benjamin Smith wrote, On 11/16/2009 10:56 PM:
> > I have a 1TB USB drive plugged into a USB2 port that I use to back up the
> > production drives (which are SCSI). It's working fine, but while doing
> > backups (hourly) the load average on the server shoots up from the normal
> > 0.5 - 1.5 or so up to a high between 10 and 30. Strangely, even though
> > the "load is high" the server is completely responsive, even the USB
> > drives being accessed are!
> >
> > Backup script is really simple, run via cron, pretty much just:
> >
> > #! /bin/sh
> > hour=`date +%k`;
> > pg_dump <options> mydatabase > /media/backups/mydatabase.$hour.pgsql;
> >
> > where /media/backups is the mount point for the USB drive.
> >
> > Using top to diagnose, nothing seems to be particularly high! IoWait
> > seems reasonable (10-30%) and CPUs are 0.5%, Idle is 70-90%. Even
> > accessing the USB partition while the load is "high" is responsive!
> >
> > I'm guessing that something changed in how load average is counted?
> >
> > Server Stats:
> > Late model 8-way Xeon, SuperMicro brand.
> > CentOS 4.x / 64 (all updates applied, booted after last kernel update)
> > Kernel 2.6.9-89.0.16.ELsmp
> > 4 GB ECC RAM
> > 300 GB SCSI HDD.
> > Standard Apache/PHP, Postgres 8.4.
> >
> > Any idea how to revert to the old load average tracking behavior short of
> > using a stale and potentially insecure kernel?
> Are you saying that when you were running a previous kernel the same
> operations with the same devices did not have the high load?
Correct!
> Which
> specific kernels worked as desired (if someone is going to bisect the
> problem they need a start point)?
kernel-smp-devel-2.6.9-89.0.15.EL (I always keep my machines updated on at
least a weekly schedule)
> Are there other processes on the machine that are waiting to use the db
> while the dump is occurring?
No. Database is actually on a different machine and backups are being done over
the network.
> How many postgres processes are waiting for
> the dump to finish (it has been a while since I ran postgres so I don't
> recall how it deals with queries during a dump)?
One - the one performing the backup. Postgres uses MVCC so pg_dump doesn't
block any other connections from continuing/finishing.
> As workarounds perhaps asking the kernel to schedule in a specific way
> might help, i.e.: #1 set the backup on a particular set of processors,
> # replace the pg_dump line above with
> taskset -c 3-4 pg_dump <options> mydatabase > \
> /media/backups/mydatabase.$hour.pgsql;
There are 8 cores on the machine, none of which are reporting more than 5%
load. That's what has me perplexed. When I run top, I see a max of about 30%
user. Everything else is zero. When I run the backup script to a non-USB
drive, the load average is completely normal (below 0.50, often below 0.10).
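
For anyone wanting to check where the number comes from: as far as I know,
the Linux load average counts tasks in uninterruptible sleep (state D,
usually waiting on I/O) as well as runnable ones, so a slow device could
raise it without using any CPU. A minimal sketch for counting both during a
backup, assuming a procps-style ps; count_states is a made-up helper name,
not an existing tool:

```shell
#!/bin/sh
# Tally task states from ps-style input: one state string per line,
# only the first character is used (R = runnable, D = uninterruptible sleep).
count_states() {
    awk '{ s = substr($1, 1, 1); n[s]++ }
         END { printf "R=%d D=%d\n", n["R"], n["D"] }'
}
```

While the backup runs, something like "ps -eo state= | count_states" next to
"cat /proc/loadavg" should show whether D-state tasks account for the load.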
> #2 set the usb-storage on a particular set of processors,
> # Note USBSTORPID= line prototyped on CentOS 5 machine not 4.
> # [u] keeps grep from matching its own command line
> USBSTORPID=`ps aux | grep '[u]sb-storage' | head -1 | awk '{print $2}'`
> taskset -p -c 3-4 "$USBSTORPID"
> #you might even go back and reduce the processor list
> #to just 3 or 4 instead of both.
Could you explain to me what this should accomplish? I'm curious as to why you
went this route...
> #3 don't update atime
> # (should at worst be a minor thing, and you say that
> # the usb mounted file system is responsive,
> # but perhaps it would help some.)
> mount -oremount,noatime /media/backups/
Already mounted noatime... here's the mount line in the backup script:
# mount -o rw,noatime -t ext3 /dev/sdc1 /home/backup/localdb/
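
To confirm which options the kernel actually applied, rather than what the
script asked for, the mount table can be checked directly. A minimal sketch,
assuming /proc/mounts is readable; has_mount_option is a hypothetical helper
name:

```shell
#!/bin/sh
# Succeeds if the given mount point lists the given option.
# $1 = mount point (no trailing slash), $2 = option name.
# Reads /proc/mounts-format lines on stdin.
has_mount_option() {
    awk -v mp="$1" -v opt="$2" '
        $2 == mp {
            n = split($4, o, ",")
            for (i = 1; i <= n; i++) if (o[i] == opt) found = 1
        }
        END { exit (found ? 0 : 1) }'
}
```

Usage would be along the lines of
"has_mount_option /home/backup/localdb noatime < /proc/mounts && echo ok".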