Re: [CentOS] High load averages with latest kernel and USB drives?

17 Nov 2009


      See comments below...
On Tuesday 17 November 2009 07:52:01 Todd Denniston wrote:
...
Benjamin Smith wrote, On 11/16/2009 10:56 PM:
...
I have a 1TB USB drive plugged into a USB2 port that I use to back up the
production drives (which are SCSI). It's working fine, but while doing
backups (hourly) the load average on the server shoots up from the normal
0.5 - 1.5 or so up to a high between 10 and 30. Strangely, even though
the "load is high" the server is completely responsive, even the USB
drives being accessed are!
Backup script is really simple, run via cron, pretty much just:
#! /bin/sh
hour=`date +%k`;
pg_dump <options> mydatabase > /media/backups/mydatabase.$hour.pgsql;
where /media/backups is the mount point for the USB drive.
Using top to diagnose, nothing seems to be particularly high! IoWait
seems reasonable (10-30%) and CPUs are 0.5%, Idle is 70-90%. Even
accessing the USB partition while the load is "high" is responsive!
I'm guessing that something changed in how load average is counted?
Server Stats:
   Late model 8-way Xeon, SuperMicro brand.
   CentOS 4.x  / 64 (all updates applied, booted after last kernel update)
   Kernel 2.6.9-89.0.16.ELsmp
   4 GB ECC RAM
   300 GB SCSI HDD.
   Standard Apache/PHP, Postgres 8.4.
Any idea how to revert to the old load average tracking behavior short of
using a stale and potentially insecure kernel?
...
Are you saying that when you were running a previous kernel the same
 operations with the same devices did not have the high load?
Correct!
...
Which
 specific kernels worked as desired (if someone is going to bisect the
 problem they need a start point)?
kernel-smp-devel-2.6.9-89.0.15.EL (I always keep my machines updated on at 
least a weekly schedule)
...
Are there other processes on the machine that are waiting to use the db
 while the dump is occurring?
No. Database is actually on a different machine and backups are being done over 
the network.
...
How many postgres processes are waiting for
 the dump to finish (it has been a while since I ran postgres so I don't
 recall how it deals with queries during a dump)?
One - the one performing the backup. Postgres uses MVCC so pg_dump doesn't 
block any other connections from continuing/finishing.
...
As workarounds perhaps asking the kernel to schedule in a specific way
 might help, i.e.: #1 set the backup on a particular set of processors,
#  replace the pg_dump line above with
taskset -c 3-4 pg_dump <options> mydatabase > \
   /media/backups/mydatabase.$hour.pgsql;
There are 8 cores on the machine, none of which are reporting more than 5% 
load. That's what has me perplexed. When I run top, I see a max of about 30% 
user. Everything else is zero. When I run the backup script to a non-USB 
drive, the load average is completely normal (below 0.50, often below 0.10)
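One thing that may explain the mismatch between load and CPU figures: on Linux the load average counts tasks in uninterruptible sleep (state D, usually waiting on I/O) as well as runnable tasks (state R), so tasks blocked on a slow device can raise it while the CPUs stay idle. A minimal sketch for checking this while the backup runs, assuming a procps-style ps; the function names are made up for illustration:

```shell
#!/bin/sh
# Count tasks in state R (runnable) or D (uninterruptible sleep) from
# "state pid command" lines read on stdin. The ps header line starts
# with "S", so it is not counted.
count_load_tasks() {
    awk '$1 ~ /^[RD]/ { n++ } END { print n + 0 }'
}

# List the tasks that are currently in uninterruptible sleep (state D).
list_blocked_tasks() {
    ps -eo state,pid,comm | awk '$1 ~ /^D/'
}

# Example, run by hand during a backup:
#   ps -eo state,pid,comm | count_load_tasks
#   list_blocked_tasks
```

If the D-state list fills up with writeback or usb-storage related tasks only while the USB drive is being written, that would point at I/O wait accounting rather than real CPU contention.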
...
#2 set the usb-storage on a particular set of processors,
# Note USBSTORPID= line prototyped on CentOS 5 machine not 4.
USBSTORPID=`ps aux | grep '[u]sb-storage' | head -n 1 | awk '{print $2}'`
taskset -p -c 3-4 $USBSTORPID
#you might even go back and reduce the processor list
#to just 3 or 4 instead of both.
Could you explain to me what this should accomplish? I'm curious as to why you 
went this route...
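To confirm that a pin like the one in #2 actually took effect, `taskset -p <pid>` prints the current affinity as a hexadecimal mask rather than a CPU list. A small sketch for working out which mask to expect for a contiguous CPU range; the helper name is hypothetical:

```shell
#!/bin/sh
# Print the hexadecimal affinity mask for CPUs $1 through $2 inclusive.
# Bit N of the mask corresponds to CPU N.
cpu_range_to_mask() {
    first=$1
    last=$2
    mask=0
    i=$first
    while [ "$i" -le "$last" ]; do
        mask=$((mask | (1 << i)))
        i=$((i + 1))
    done
    printf '%x\n' "$mask"
}

# Example, run by hand:
#   cpu_range_to_mask 3 4        # mask expected after "taskset -p -c 3-4"
#   taskset -p "$USBSTORPID"     # mask the kernel currently reports
```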
...
#3 don't update atime
# (should at worst be a minor thing, and you say that
# the usb mounted file system is responsive,
# but perhaps it would help some.)
mount -oremount,noatime /media/backups/
Already mounted noatime... here's the mount line in the backup script: 
# mount -o rw,noatime -t ext3 /dev/sdc1 /home/backup/localdb/
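Whether noatime is really in effect can be checked against /proc/mounts instead of the script. A minimal sketch, assuming the usual "device mountpoint fstype options ..." layout of that file; the function name and the optional third argument are made up for illustration:

```shell
#!/bin/sh
# Succeed if mount point $1 is mounted with option $2.
# $3 is the mounts file to read (defaults to /proc/mounts).
# The mount point must be given without a trailing slash,
# which is how /proc/mounts lists it.
has_mount_option() {
    awk -v mp="$1" -v opt="$2" '
        $2 == mp {
            n = split($4, o, ",")
            for (i = 1; i <= n; i++)
                if (o[i] == opt) found = 1
        }
        END { exit (found ? 0 : 1) }
    ' "${3:-/proc/mounts}"
}

# Example, run by hand:
#   has_mount_option /home/backup/localdb noatime && echo "noatime is set"
```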
