Logrotate/cron and major I/O contention with KVM. - virt

10 Mar 2010


Is anyone else seeing major I/O peaks due to logrotate or other jobs
running simultaneously across multiple guests? I have one KVM server
running CentOS 5.4 with local disk that is seriously suffering, as most
of the guests rotate their syslog at the same time.
Looking at the KVM server I'm seeing:
11:00:01 PM       CPU     %user     %nice   %system   %iowait    %steal     %idle
03:40:01 AM       all      0.07      0.00      2.74      0.93      0.00     96.26
03:50:01 AM       all      0.07      0.00      1.17      1.18      0.00     97.58
04:00:01 AM       all      0.08      0.00      1.51      0.82      0.00     97.59
04:10:02 AM       all      0.53      0.03     15.31     51.61      0.00     32.53
04:20:01 AM       all      0.28      0.12      4.12     22.21      0.00     73.27
04:30:01 AM       all      0.07      0.00      0.80      1.21      0.00     97.92
04:40:01 AM       all      0.07      0.00      2.60      1.81      0.00     95.52
04:50:01 AM       all      0.08      0.00      0.79      1.44      0.00     97.69
On one of the guests, running CentOS 4.6, the impact is so bad that I get
DMA timeout errors in the syslog and occasional kernel panics:
Mar 11 04:05:04 localhost kernel: hda: dma_timer_expiry: dma status == 0x21
Mar 11 04:05:14 localhost kernel: hda: DMA timeout error
Mar 11 04:05:14 localhost kernel: hda: dma timeout error: status=0x50 { DriveReady SeekComplete }
Mar 11 04:05:14 localhost kernel:
Mar 11 04:05:14 localhost kernel: ide: failed opcode was: unknown
Mar 11 04:05:59 localhost kernel: hda: dma_timer_expiry: dma status == 0x21
Mar 11 04:06:14 localhost kernel: hda: DMA timeout error
Mar 11 04:06:14 localhost kernel: hda: dma timeout error: status=0x50 { DriveReady SeekComplete }
One reference I've found is at
 * http://lonesysadmin.net/linux-virtual-machine-tuning-guide/
It suggests avoiding running scheduled jobs simultaneously across
guests, for example by adding a random sleep before each job.
Does anyone else have suggestions for reducing the impact of cron/logrotate?
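
For illustration, the staggering idea from that guide could be sketched
roughly as below. This is only a sketch under assumptions, not something
taken from the guide: instead of a purely random sleep it derives a stable
per-guest delay from the hostname, so each guest always runs at the same
offset. The names stagger_delay and run_staggered and the 30-minute window
are hypothetical, and it assumes a Python 3 interpreter on the guest, which
older guests may lack.

```python
import hashlib
import socket
import subprocess
import time

# Assumed spread window: guests start somewhere within 30 minutes.
MAX_DELAY_SECONDS = 1800


def stagger_delay(hostname, max_delay=MAX_DELAY_SECONDS):
    """Map a hostname to a stable delay in the range [0, max_delay)."""
    digest = hashlib.sha256(hostname.encode("utf-8")).digest()
    # Use the first 8 bytes of the hash as an integer, then wrap it
    # into the spread window.
    return int.from_bytes(digest[:8], "big") % max_delay


def run_staggered(command, hostname=None, max_delay=MAX_DELAY_SECONDS,
                  sleep=time.sleep, run=subprocess.call):
    """Sleep for this guest's delay, then run command (a list of args).

    sleep and run are parameters only so the logic can be exercised
    without actually waiting or spawning a process.
    """
    if hostname is None:
        hostname = socket.gethostname()
    sleep(stagger_delay(hostname, max_delay))
    return run(command)
```

A cron wrapper on each guest could then call something like
run_staggered(["/usr/sbin/logrotate", "/etc/logrotate.conf"]) in place of
invoking logrotate directly. Two guests can still hash to nearby offsets,
so this spreads the load rather than guaranteeing no overlap.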
-- 
*Steven Ellis - Director of Worldwide Engineering,*
*Bulletin.Net Inc* - http://www.bulletin.net/