[CentOS] High load averages with latest kernel and USB drives?

Benjamin Smith wrote, On 11/17/2009 01:46 PM:
> See comments below... 
> 
> On Tuesday 17 November 2009 07:52:01 Todd Denniston wrote:
>> Benjamin Smith wrote, On 11/16/2009 10:56 PM:
>>> I have a 1TB USB drive plugged into a USB2 port that I use to back up the
>>> production drives (which are SCSI). It's working fine, but while doing
>>> backups (hourly) the load average on the server shoots up from the normal
>>> 0.5 - 1.5 or so up to a high between 10 and 30. Strangely, even though
>>> the "load is high" the server is completely responsive, even the USB
>>> drives being accessed are!
>>>
>>> Using top to diagnose, nothing seems to be particularly high! IoWait
>>> seems reasonable (10-30%) and CPUs are 0.5%, Idle is 70-90%. Even
>>> accessing the USB partition while the load is "high" is responsive!
>>>
> 

You might add another field to top while you are watching, "Last used cpu (SMP)". To do that:
start top
press f
press j
press enter

This should let you see whether your process is bouncing between processors.
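
If you would rather log this than watch top, here is a rough sketch that samples the same "last used CPU" value from /proc and counts how often it changes. It assumes a Linux /proc where field 39 of /proc/PID/stat is the processor the task last ran on; the function names last_cpu and count_cpu_changes are made up for this example, not standard tools.

```shell
# Print the CPU a process last ran on (field 39, "processor", of /proc/PID/stat).
last_cpu() {
    # Strip "pid (comm) " first, because comm may contain spaces.
    # After stripping, original field 39 becomes $37.
    sed 's/.*) //' "/proc/$1/stat" | awk '{print $37}'
}

# Sample a PID several times and print how many times its last-used CPU changed.
# usage: count_cpu_changes PID SAMPLES DELAY_SECONDS
count_cpu_changes() {
    pid=$1; samples=$2; delay=$3
    prev=$(last_cpu "$pid"); changes=0; i=1
    while [ "$i" -lt "$samples" ]; do
        sleep "$delay"
        # Stop early if the process has exited.
        [ -r "/proc/$pid/stat" ] || break
        cur=$(last_cpu "$pid")
        if [ "$cur" != "$prev" ]; then
            changes=$((changes + 1))
        fi
        prev=$cur
        i=$((i + 1))
    done
    echo "$changes"
}
```

For example, count_cpu_changes "$(pgrep -o pg_dump)" 30 1 would sample the oldest pg_dump process once a second for about 30 seconds. A count near zero suggests the process is staying put; a count close to the number of samples suggests it is moving around a lot.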

>> As workarounds perhaps asking the kernel to schedule in a specific way
>>  might help, i.e.: #1 set the backup on a particular set of processors,
>> #  replace the pg_dump line above with
>> taskset -c 3-4 pg_dump <options> mydatabase > \
>> 	/media/backups/mydatabase.$hour.pgsql;
> 
> There are 8 cores on the machine, none of which are reporting more than 5% 
> load. That's what has me perplexed. When I run top, I see a max of about 30% 
> user. Everything else is zero. When I run the backup script to a non-USB 
> drive, the load average is completely normal (below 0.50, often below 0.10) 

USB chewing up more CPU than normal disks has been my experience all along; this just seems a 
little extreme.

> 
>> #2 set the usb-storage on a particular set of processors,
>> # Note USBSTORPID= line prototyped on CentOS 5 machine not 4.
>> USBSTORPID=`ps aux |grep '[u]sb-storage'|head -1 |awk '{print $2}'`
>> taskset -p -c 3-4 $USBSTORPID
>> #you might even go back and reduce the processor list
>> #to just 3 or 4 instead of both.
> 
> Could you explain to me what this should accomplish? I'm curious as to why you 
> went this route... 

Even though the process is not using much processor time, having it bounce around between processors 
can:
* thrash the cache of each processor as it goes there
* waste time context switching in the next processor
* bounce other processes around and cascade the same effects as they go along

I know that there has been some scheduler work over time to make these switches less likely, but 
I have also seen some good effects from locking certain processes onto a processor instead of letting 
them float.  Usually the best processes to do this to are ones that use large amounts of memory, like 
X or Firefox, which are large enough that they thoroughly toss anything else out of a processor's cache.
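
To check that a pin actually took effect, something like the following sketch may help. It assumes the util-linux taskset (the same tool used above) and picks the first CPU the current shell is allowed to run on, rather than hard-coding 3-4; the helper names allowed_cpus and first_allowed_cpu are made up for this example.

```shell
# Print the CPU list a process is currently allowed to run on, e.g. "0-7".
allowed_cpus() {
    taskset -cp "$1" | sed 's/.*: *//'
}

# Print the first CPU in that list, e.g. "2" from "2-5,8".
first_allowed_cpu() {
    allowed_cpus "$1" | sed 's/[-,].*//'
}

cpu=$(first_allowed_cpu $$)

# Start a child shell pinned to that single CPU and have it report
# its own affinity list, which should now be just that one CPU.
taskset -c "$cpu" sh -c 'taskset -cp $$'
```

The same allowed_cpus call can be pointed at $USBSTORPID after the taskset -p step to confirm the usb-storage thread is restricted to the processors you chose.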

-- 
Todd Denniston
Crane Division, Naval Surface Warfare Center (NSWC Crane)
Harnessing the Power of Technology for the Warfighter