Benjamin Smith wrote, On 11/17/2009 01:46 PM:
> See comments below...
>
> On Tuesday 17 November 2009 07:52:01 Todd Denniston wrote:
>> Benjamin Smith wrote, On 11/16/2009 10:56 PM:
>>> I have a 1TB USB drive plugged into a USB2 port that I use to back up the
>>> production drives (which are SCSI). It's working fine, but while doing
>>> backups (hourly) the load average on the server shoots up from the normal
>>> 0.5 - 1.5 or so up to a high between 10 and 30. Strangely, even though
>>> the "load is high" the server is completely responsive, even the USB
>>> drives being accessed are!
>>>
>>> Using top to diagnose, nothing seems to be particularly high! IoWait
>>> seems reasonable (10-30%) and CPUs are 0.5%, Idle is 70-90%. Even
>>> accessing the USB partition while the load is "high" is responsive!
>>>
>

You might add another field to top while you are watching, Last used cpu
(SMP), i.e.:
  start top
  press f
  press j
  press enter
This should let you see whether your process is bouncing between processors.

>> As workarounds perhaps asking the kernel to schedule in a specific way
>> might help, i.e.:
>> #1 set the backup on a particular set of processors,
>> # replace the pg_dump line above with
>> taskset -c 3-4 pg_dump <options> mydatabase > \
>> /media/backups/mydatabase.$hour.pgsql;
>
> There are 8 cores on the machine, none of which are reporting more than 5%
> load. That's what has me perplexed. When I run top, I see a max of about 30%
> user. Everything else is zero. When I run the backup script to a non-USB
> drive, the load average is completely normal (below 0.50, often below 0.10)

USB chewing up more CPU than normal disks has been my experience all along;
this just seems a little extreme.

>
>> #2 set the usb-storage on a particular set of processors,
>> # Note USBSTORPID= line prototyped on CentOS 5 machine not 4.
>> USBSTORPID=`ps aux |grep usb-storage|head -1 |awk '{print $2}'`
>> taskset -p -c 3-4 $USBSTORPID
>> #you might even go back and reduce the processor list
>> #to just 3 or 4 instead of both.
>
> Could you explain to me what this should accomplish? I'm curious as to why you
> went this route...

Even though the process is not using much processor time, having it bounce
around between processors can:
 * thrash the cache of each processor as it goes there
 * waste time context switching on the next processor
 * bounce other processes around and cascade the same effects as they go along

I know that there has been some scheduler work over time to make these
switches less likely, but I have also seen some good effects from locking
certain processes onto a processor instead of letting them float. Usually the
best processes to do this to are ones that use large amounts of memory, like X
or Firefox, which are large enough that they thoroughly toss anything else out
of a processor's cache.

-- 
Todd Denniston
Crane Division, Naval Surface Warfare Center (NSWC Crane)
Harnessing the Power of Technology for the Warfighter
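For anyone who wants to try workarounds #1 and #2 together, here is a minimal sh sketch, not a tested recipe. Assumptions: the CPU list defaults to "3-4" as in the thread, the kernel thread is named exactly "usb-storage" (true on CentOS 5-era kernels, possibly absent on newer ones), and the helper function names are hypothetical. It swaps the `ps aux | grep` pipeline for `pgrep -x`, which cannot match the grep process itself, and reads the same "last used CPU" value top shows via `ps -o psr`.

```shell
#!/bin/sh
# Sketch only. ASSUMPTIONS (not from the thread): default CPU list "3-4",
# kernel thread named exactly "usb-storage", hypothetical helper names.

CPUS="${CPUS:-3-4}"

# PID of the first usb-storage kernel thread, if any.
# pgrep -x matches the exact process name, so it never matches itself
# the way "ps aux | grep usb-storage" can match the grep.
usb_storage_pid() {
    pgrep -x usb-storage | head -n 1
}

# CPU a process last ran on: the value top shows in its
# "Last used cpu (SMP)" column.
last_cpu() {
    ps -o psr= -p "$1" | tr -d ' '
}

# Workaround #2: restrict the usb-storage thread to $CPUS.
# Needs root; the kernel may refuse for threads bound to one CPU.
pin_usb_storage() {
    pid=$(usb_storage_pid)
    if [ -z "$pid" ]; then
        echo "usb-storage thread not found" >&2
        return 1
    fi
    taskset -p -c "$CPUS" "$pid"
}

# Workaround #1: run a command (e.g. the backup's pg_dump) on $CPUS only.
run_pinned() {
    taskset -c "$CPUS" "$@"
}

# Example use inside the hourly backup script (as root):
#   pin_usb_storage
#   run_pinned pg_dump mydatabase > /media/backups/mydatabase.$hour.pgsql
#   last_cpu "$(usb_storage_pid)"
```

Pinning both to the same small set of CPUs keeps the dump and the USB I/O thread sharing caches; running `last_cpu` a few times during a backup is a quick way to check whether the pinning actually held.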