[CentOS] Very unresponsive, sometimes stalling domU (5.4, x86_64)

Tue Mar 2 08:30:50 UTC 2010
Timo Schoeler <timo.schoeler at riscworks.net>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi list,

please forgive the cross-posting, but I cannot narrow the problem down
enough to say which list it fits best, so I'll ask on both.

I have some machines with the following specs (see the end of this
email).

They run CentOS 5.4 x86_64 with the latest patches applied, are
Xen-enabled, and should host one or more domUs. I put the domUs' storage
on LVM, as I learnt ages ago; this has never caused any problems and is
way faster than using file-based 'images'.

However, there's something special about these machines: they have the
new WD EARS series drives, which use 4K physical sectors. So I booted a
rescue system and used fdisk to start the first partition at sector 64
instead of 63. Long story short: a partition starting at sector 63 is
not aligned to the 4K sectors, so the drive has to do many more,
inefficient writes and performance collapses. With the 'normal' geometry
(start at sector 63), the drive achieves about 25 MiB/s writes; with the
partition starting at sector 64, it achieves almost 100 MiB/s:

[root@server2 ~]# fdisk -ul /dev/sda

Disk /dev/sda: 1000.2 GB, 1000204886016 bytes
255 heads, 63 sectors/track, 121601 cylinders, total 1953525168 sectors
Units = sectors of 1 * 512 = 512 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *          64     2097223     1048580   fd  Linux raid autodetect
Partition 1 does not end on cylinder boundary.
/dev/sda2         2097224    18876487     8389632   82  Linux swap / Solaris
/dev/sda3        18876488  1953525167   967324340   fd  Linux raid autodetect
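
As a rough sanity check of the alignment argument, here is a minimal
Python sketch that tests whether a partition start sector falls on a 4K
boundary. The start sectors are copied from the fdisk listing; the
4096-byte physical sector size is an assumption about the drives, and
the function name is made up for illustration.

```python
LOGICAL_SECTOR = 512     # bytes, from "Units = sectors of 1 * 512" in the fdisk output
PHYSICAL_SECTOR = 4096   # bytes, ASSUMED physical sector size of the 4K drives


def is_aligned(start_sector: int) -> bool:
    """True if a partition starting at this 512-byte sector begins on a 4K boundary."""
    return (start_sector * LOGICAL_SECTOR) % PHYSICAL_SECTOR == 0


# Start sectors from the fdisk listing, plus the legacy default of 63 for comparison.
starts = {"legacy default": 63, "sda1": 64, "sda2": 2097224, "sda3": 18876488}

for name, start in starts.items():
    print(f"{name:15s} start={start:>10d} aligned={is_aligned(start)}")
```

With 512-byte logical sectors this is the same as asking whether the
start sector is a multiple of 8, which holds for all three partitions in
the listing but not for sector 63.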

On top of those WD EARS drives (two per machine), ``md'' provides two
RAID1 arrays, one for /boot and one for LVM; swap is separate on each
drive (i.e. not RAIDed). LVM provides the / filesystem as well as the
LVs for the Xen domUs.

I have about 60 machines set up this way and have never had any
problems; they run like a charm. On these machines, however, the domUs
are *very* slow and have a steady (!) load of about two, with the CPU
staying about 50% in 'wait'. All operations take ages, e.g. a ``yum
update'' with the recently released updates.

Now, could this be caused by 4K alignment issues that I have overlooked
and that are now nested inside LVM?
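
One way to reason about this is to add up the offsets of each layer and
check whether the first LVM extent still lands on a 4K boundary. The
sketch below does that in Python. Only the sda3 start sector comes from
the fdisk listing; the md data offset and the LVM pe_start used here are
hypothetical placeholders, and the real values would have to be read
from the system (for example, ``pvs -o +pe_start'' reports pe_start).

```python
SECTOR = 512      # bytes per logical sector (from the fdisk output)
PHYSICAL = 4096   # bytes per physical sector (ASSUMED for the 4K drives)


def lv_data_offset(partition_start, md_data_offset, pe_start):
    """Byte offset on the raw disk where the first LVM physical extent begins.

    All three arguments are given in 512-byte sectors.
    """
    return (partition_start + md_data_offset + pe_start) * SECTOR


def misalignment(offset_bytes):
    """Bytes past the previous 4K boundary; 0 means the offset is aligned."""
    return offset_bytes % PHYSICAL


# 18876488 is the sda3 start from fdisk. The md data offset (0) and the
# pe_start (384 sectors) are HYPOTHETICAL values for illustration only.
offset = lv_data_offset(18876488, 0, 384)
print(f"first extent at byte {offset}, misaligned by {misalignment(offset)} bytes")
```

If any layer contributes an offset that is not a multiple of 8 sectors,
everything above it is shifted off the 4K grid even though the partition
itself is aligned.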

Any help is much appreciated.

Cheers,

Timo

- ---

Linux server2.blah.org 2.6.18-164.11.1.el5xen #1 SMP Wed Jan 20 08:06:04 EST 2010 x86_64 x86_64 x86_64 GNU/Linux

- ---

[root@server2 ~]# cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 23
model name      : Intel(R) Core(TM)2 Quad CPU    Q9400  @ 2.66GHz
stepping        : 10
cpu MHz         : 1998.000
cache size      : 3072 KB
physical id     : 0
siblings        : 1
core id         : 0
cpu cores       : 1
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu tsc msr pae mce cx8 apic mtrr mca cmov pat pse36
clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx lm constant_tsc
pni monitor ds_cpl vmx smx est tm2 cx16 xtpr lahf_lm
bogomips        : 6668.58
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:

processor       : 1
vendor_id       : GenuineIntel
cpu family      : 6
model           : 23
model name      : Intel(R) Core(TM)2 Quad CPU    Q9400  @ 2.66GHz
stepping        : 10
cpu MHz         : 1998.000
cache size      : 3072 KB
physical id     : 1
siblings        : 1
core id         : 0
cpu cores       : 1
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu tsc msr pae mce cx8 apic mtrr mca cmov pat pse36
clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx lm constant_tsc
pni monitor ds_cpl vmx smx est tm2 cx16 xtpr lahf_lm
bogomips        : 6668.58
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:

processor       : 2
vendor_id       : GenuineIntel
cpu family      : 6
model           : 23
model name      : Intel(R) Core(TM)2 Quad CPU    Q9400  @ 2.66GHz
stepping        : 10
cpu MHz         : 1998.000
cache size      : 3072 KB
physical id     : 2
siblings        : 1
core id         : 0
cpu cores       : 1
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu tsc msr pae mce cx8 apic mtrr mca cmov pat pse36
clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx lm constant_tsc
pni monitor ds_cpl vmx smx est tm2 cx16 xtpr lahf_lm
bogomips        : 6668.58
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:

processor       : 3
vendor_id       : GenuineIntel
cpu family      : 6
model           : 23
model name      : Intel(R) Core(TM)2 Quad CPU    Q9400  @ 2.66GHz
stepping        : 10
cpu MHz         : 1998.000
cache size      : 3072 KB
physical id     : 3
siblings        : 1
core id         : 0
cpu cores       : 1
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu tsc msr pae mce cx8 apic mtrr mca cmov pat pse36
clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx lm constant_tsc
pni monitor ds_cpl vmx smx est tm2 cx16 xtpr lahf_lm
bogomips        : 6668.58
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:

- ---

[root@server2 ~]# cat /proc/meminfo
MemTotal:       524288 kB
MemFree:         80620 kB
Buffers:         23352 kB
Cached:         205400 kB
SwapCached:          0 kB
Active:         132448 kB
Inactive:       156424 kB
HighTotal:           0 kB
HighFree:            0 kB
LowTotal:       524288 kB
LowFree:         80620 kB
SwapTotal:    16779248 kB
SwapFree:     16779248 kB
Dirty:              32 kB
Writeback:           0 kB
AnonPages:       60112 kB
Mapped:          13348 kB
Slab:            30996 kB
PageTables:       4424 kB
NFS_Unstable:        0 kB
Bounce:              0 kB
CommitLimit:  17041392 kB
Committed_AS:   334800 kB
VmallocTotal: 34359738367 kB
VmallocUsed:     12572 kB
VmallocChunk: 34359724607 kB

- ---

[root@server2 ~]# fgrep min-mem /etc/xen/xend-config.sxp
# dom0-min-mem is the lowest memory level (in MB) dom0 will get down to.
# If dom0-min-mem=0, dom0 will never balloon out.
(dom0-min-mem 512)

- ---

[root@server2 ~]# fgrep dom0 /boot/grub/menu.lst
        kernel /xen.gz-2.6.18-164.11.1.el5 dom0_mem=512M

- ---

Example of ``dstat'' output captured while ``yum update'' was running; I
think the CPU spends too much time in the ``wait'' state:

- ----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system--
usr sys idl wai hiq siq| read  writ| recv  send|  in   out | int   csw
  0   0 100   0   0   0|   0     0 | 136B  178B|   0     0 |  11    12
  0   0 100   0   0   0|   0     0 | 966B  892B|   0     0 |  35    31
 12   5  83   0   0   0|   0     0 |4068B 4521B|   0     0 | 245   150
 47   4  50   0   0   0|   0     0 | 126B  178B|   0     0 | 113    11
 46   3  51   0   0   0|   0     0 | 328B  470B|   0     0 | 133    22
  0   0  73  28   0   0|  48k    0 | 198B  454B|   0     0 |  30    27
 41   3  51   5   0   0| 192k    0 | 522B 1246B|   0     0 | 164    61
  9   2  89   0   0   0|8192B  968k| 630B 1896B|   0     0 |  62    35
  0   0 100   0   0   0|   0     0 | 136B  178B|   0     0 |  15    16
  0   0 100   0   0   0|   0     0 | 246B  292B|   0     0 |  14    17
  1   0  99   0   0   0|   0     0 |1231k   28k|   0     0 |1004   925
  0   0 100   0   0   0|   0     0 |3394k   77k|   0     0 |2871  2943
 27   5  48  20   0   0| 968k    0 | 442k   10k|   0     0 | 641   657
 19   5  59  17   0   0| 344k  536k|1644B 4064B|   0     0 | 414   339
  0   0  50  50   0   0|  56k  320k| 186B  232B|   0     0 | 128   129
  0   1  44  54   0   0| 136k 1312k| 278B  220B|   0     0 | 126   107
  0   0  55  45   0   0|1552k   11M| 126B  178B|   0     0 | 502   139
  0   0  53  48   0   0| 568k    0 | 126B  178B|   0     0 |  41    32
  0   0  50  50   0   0|   0     0 | 126B  178B|   0     0 |  16    14
  1   1  53  46   0   0|9608k    0 | 258B  566B|   0     0 |1473  2456
 12   3  54  32   0   0|1368k    0 |2112B 6064B|   0     0 | 713   603
 12   1  36  52   0   0| 888k 1192k| 858B 2426B|   0     0 | 394   429
  0   0  52  48   0   0|   0  2472k| 126B  178B|   0     0 | 189    75
- ----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system--
usr sys idl wai hiq siq| read  writ| recv  send|  in   out | int   csw
  0   0  54  46   0   0|   0   728k|  66B  178B|   0     0 | 107    30
  0   0  41  59   0   0|8192B  448k| 126B  322B|   0     0 |  85    46
  0   0  55  45   0   0|   0  1920k| 126B  178B|   0     0 | 185    72
  0   0  54  46   0   0|   0  2688k| 126B  178B|   0     0 | 238    78
  0   0  41  59   0   0|   0  1576k| 126B  178B|   0     0 | 136    51
  0   0  47  53   0   0|8192B 2128k|  66B  178B|   0     0 | 207    53
  0   0  50  50   0   0|  40k 2744k|  66B  178B|   0     0 | 277    60
  0   0  50  50   0   0|  16k 3536k|  66B  178B|   0     0 | 330    59
  0   0  50  50   0   0|8192B 1016k|  66B  178B|   0     0 |  98    16
  0   0  59  41   0   0|  80k 4320k|  66B  178B|   0     0 | 108   100
  0   0  48  52   0   0|  16k  208k| 126B  178B|   0     0 |  80    89
  0   0  46  54   0   0|  56k    0 | 308B  178B|   0     0 |  38    68
  0   0  42  58   0   0|   0     0 |  66B  178B|   0     0 |  11    11
  0   0  54  45   0   0|1264k  752k|  66B  178B|   0     0 | 282   428
  0   0  53  47   0   0|   0   360k|  66B  178B|   0     0 |  81   101
  0   0  53  46   0   0|   0   576k|  66B  178B|   0     0 | 141   129
  0   0  38  62   0   0| 536k   88k| 126B  178B|   0     0 |  68    62
  0   0  50  50   0   0|   0     0 |  66B  178B|   0     0 |  17    15
  1   1  52  46   0   0| 336k  392k| 126B  178B|   0     0 | 105   115
  0   0  39  61   0   0| 152k 3504k| 126B  178B|   0     0 | 199    63
  0   0  49  51   0   0|  40k  992k| 186B  178B|   0     0 | 122    40
  0   0  56  44   0   0|   0   216k| 186B  178B|   0     0 |  73    39
  0   0  42  58   0   0|   0   224k|  66B  178B|   0     0 |  69    30
- ----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system--
usr sys idl wai hiq siq| read  writ| recv  send|  in   out | int   csw
  0   0  50  50   0   0|   0   216k|  66B  178B|   0     0 |  89    36
  0   0  51  50   0   0|   0   272k| 126B  322B|   0     0 | 100    34
  0   0  50  50   0   0|8192B  312k|  66B  178B|   0     0 |  81    36
  0   0  56  44   0   0|   0   560k| 186B  178B|   0     0 | 103    44
  0   0  43  57   0   0|   0   488k| 126B  178B|   0     0 |  91    16
  0   0  50  50   0   0|   0   408k| 126B  178B|   0     0 |  59    12
  0   0  64  36   0   0|  72k  120k| 380B  566B|   0     0 | 140    87
  0   0  44  56   0   0|   0     0 |  66B  178B|   0     0 |   9    10
  0   0  55  45   0   0|  16k    0 | 126B  178B|   0     0 |  32    38
  2   0  21  78   0   0|  72k  744k| 846B 2038B|   0     0 | 243   275
  0   0  44  56   0   0|   0     0 | 186B  178B|   0     0 |  15    19
  0   0  50  50   0   0|8192B  992k|  66B  178B|   0     0 |  72    16
  0   1  51  48   0   0| 440k  568k| 126B  178B|   0     0 | 105   142
  0   0  65  34   0   0|  32k   48k| 192B  372B|   0     0 |  41    43
  2   1  41  56   0   0|  80k  872k|1254B 3784B|   0     0 | 271   264
  0   0  44  56   0   0|   0   264k| 126B  178B|   0     0 |  66    84
  0   0  61  39   0   0|  64k  224k| 126B  178B|   0     0 | 125   162
  1   1  18  81   0   0|   0   736k| 132B  372B|   0     0 |  88    35
  0   0  59  41   0   0| 104k  912k|1032B 2312B|   0     0 | 176   113
  0   0  44  56   0   0|   0     0 |1090B  178B|   0     0 |  15    13
  1   0  57  41   0   0|  96k  704k| 528B 1456B|   0     0 | 167   206
  0   0  44  56   0   0|  40k   16k|1270B  178B|   0     0 |  39    36
  0   0  50  50   0   0|   0     0 | 126B  178B|   0     0 |  13    13
- ----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system--
usr sys idl wai hiq siq| read  writ| recv  send|  in   out | int   csw
  0   0  50  50   0   0|  16k 2368k|  66B  178B|   0     0 | 165    50
  1   0  54  45   0   0|  72k  304k|1692B 4510B|   0     0 | 248   201
  0   0  54  46   0   0|8192B  176k|3898B  178B|   0     0 | 132    83
  0   0  51  49   0   0|8192B  240k|7236B  252B|   0     0 | 212   101
  0   0  50  50   0   0|  40k    0 | 126B  178B|   0     0 |  27    31
  0   0  32  68   0   0|   0     0 |3034B  178B|   0     0 |  61    15
  0   3  63  34   0   0| 280k 1840k|6350B  820B|   0     0 | 378   354
  0   0  44  56   0   0|   0   336k|  66B  178B|   0     0 |  73    88
  0   0  50  50   0   0|8192B  248k|  66B  178B|   0     0 |  62    52
  0   0  50  50   0   0| 336k  200k| 126B  178B|   0     0 |  65    71
  0   0  55  45   0   0|  72k  368k| 126B  178B|   0     0 |  80   100
  0   0  52  48   0   0| 192k  176k|  66B  178B|   0     0 |  54    69
  0   0  41  59   0   0| 112k  272k|  66B  178B|   0     0 |  71    64
  0   0  40  60   0   0|   0     0 | 126B  178B|   0     0 |  25    24
  0   0  57  43   0   0| 240k  216k| 186B  330B|   0     0 |  63    86
  0   0  51  49   0   0| 120k  808k| 126B  178B|   0     0 | 131   157
  0   0  50  50   0   0|   0   296k|  66B  178B|   0     0 |  65    84
  0   0  50  50   0   0|   0   296k| 126B  178B|   0     0 |  72    98
  0   0  41  59   0   0|   0  2848k| 126B  178B|   0     0 | 154   112
  0   0  50  50   0   0|   0   384k| 188B  178B|   0     0 |  84    32
  0   0  53  47   0   0|   0   272k|1812B  178B|   0     0 |  82    17
  0   0  47  53   0   0|   0   208k| 196B  178B|   0     0 |  54    17
  0   0  50  50   0   0|   0   232k| 128B  178B|   0     0 |  66    30
- ----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system--
usr sys idl wai hiq siq| read  writ| recv  send|  in   out | int   csw
  0   0  45  55   0   0|   0   392k| 126B  178B|   0     0 |  85    22
  0   0  50  50   0   0|   0   296k| 126B  322B|   0     0 |  64    10
  0   0  65  35   0   0|  24k  472k| 448B  486B|   0     0 | 114    99
  1   1  20  78   0   0| 120k    0 |  66B  178B|   0     0 |  86    96
  0   0  49  51   0   0|   0     0 |  66B  178B|   0     0 |   9    12
  0   1  64  36   0   0|2720k  456k| 276B  178B|   0     0 | 151   124
  0   0 100   0   0   0|   0     0 | 196B  178B|   0     0 |  32    24
  0   0 100   0   0   0|   0     0 |  66B  178B|   0     0 |   9    12
  0   0 100   0   0   0|   0     0 |  66B  178B|   0     0 |  10    13
  0   0  80  19   0   0|8192B 1248k|  66B  178B|   0     0 | 177   205
  0   0  54  46   0   0|  16k  328k| 198B  486B|   0     0 |  99   111
  0   0  53  47   0   0|   0   344k| 126B  178B|   0     0 |  82    78
  0   0  31  69   0   0|   0   312k|  66B  178B|   0     0 |  79   101
  0   0  60  40   0   0|   0   240k| 192B  372B|   0     0 |  76    89
  0   0  24  76   0   0|   0   280k|  66B  178B|   0     0 |  71    81
  0   0  47  53   0   0|   0   120k|  66B  178B|   0     0 |  36    40
  0   0  50  50   0   0|   0   304k|  66B  178B|   0     0 |  79    99
  0   0  56  44   0   0|   0   144k|  66B  178B|   0     0 |  42    84
  0   0  50  50   0   0|  16k  136k| 186B  178B|   0     0 |  50    64
  0   0  59  41   0   0|   0   328k| 192B  372B|   0     0 |  71    57
  0   0  22  78   0   0|   0   280k|  66B  178B|   0     0 |  71    88
  0   0  50  50   0   0|   0   256k|  66B  178B|   0     0 |  76    88
  0   0  56  44   0   0|  24k  232k| 186B  178B|   0     0 |  80   102
- ----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system--
usr sys idl wai hiq siq| read  writ| recv  send|  in   out | int   csw
  0   0  52  48   0   0|   0   240k| 132B  372B|   0     0 |  76    84
  0   0  60  40   0   0|   0   416k| 186B  322B|   0     0 |  73    81
  3   2  55  41   0   0|8192B  184k| 384B 1316B|   0     0 |  97    80
  0   0 100   0   0   0|   0     0 | 428B  746B|   0     0 |  40    22
  0   0 100   0   0   0|   0     0 | 246B  462B|   0     0 |  24    13
  0   0  98   2   0   0|   0  2728k|  66B  178B|   0     0 |  22    21
  0   0 100   0   0   0|   0     0 | 308B  462B|   0     0 |  23    11
  0   0 100   0   0   0|   0     0 |  66B  178B|   0     0 |  23    17
  0   0 100   0   0   0|   0     0 | 126B  178B|   0     0 |  12    11
  0   0 100   0   0   0|   0     0 | 246B  462B|   0     0 |  23    15
  0   0  99   1   0   0|   0     0 |  66B  178B|   0     0 |  10    14
  0   0  99   1   0   0|   0   688k| 126B  178B|   0     0 | 110    15
  0   0 100   0   0   0|   0     0 | 248B  178B|   0     0 |  21    15
  0   0 100   0   0   0|   0     0 |  66B  178B|   0     0 |  10    13
  0   0 100   0   0   0|   0     0 | 186B  178B|   0     0 |  21    13
  0   0 100   0   0   0|   0     0 | 126B  178B|   0     0 |  14    17
  0   0 100   0   0   0|   0     0 | 186B  462B|   0     0 |  20    11
  0   0 100   0   0   0|   0     0 | 126B  178B|   0     0 |  22    19
  0   0 100   0   0   0|   0     0 | 186B  178B|   0     0 |  17    13
  0   0 100   0   0   0|   0     0 | 126B  178B|   0     0 |  12    14
  0   0 100   0   0   0|   0     0 | 126B  178B|   0     0 |  13    13
  0   0 100   0   0   0|   0     0 | 186B  178B|   0     0 |  13    12
  0   0 100   0   0   0|   0     0 | 126B  178B|   0     0 |  11    11
- ----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system--
usr sys idl wai hiq siq| read  writ| recv  send|  in   out | int   csw
  0   0  78  23   0   0|   0    32k| 126B  178B|   0     0 |  20    21
  0   0 100   0   0   0|   0     0 | 126B  322B|   0     0 |  12    13
  0   0 100   0   0   0|   0     0 |  66B  178B|   0     0 |   9    12
  0   0 100   0   0   0|   0     0 |  66B  178B|   0     0 |  10    13
  0   0 100   0   0   0|   0     0 | 126B  178B|   0     0 |  15    15
  0   0 100   0   0   0|   0     0 | 126B  178B|   0     0 |  13    11
  0   0 100   0   0   0|   0     0 | 126B  314B|   0     0 |  16    15
  0   0 100   0   0   0|   0     0 | 186B  326B|   0     0 |  16    11
  0   0 100   0   0   0|   0   112k| 126B  178B|   0     0 |  22    15
  0   0 100   0   0   0|   0     0 | 186B  178B|   0     0 |  14    14
  0   0 100   0   0   0|   0     0 |  66B  178B|   0     0 |  12    12
  0   0 100   0   0   0|   0     0 | 126B  178B|   0     0 |  14    13
  0   0 100   0   0   0|   0     0 |  66B  178B|   0     0 |  10    13
  0   0  99   2   0   0|   0  8192B|  66B  178B|   0     0 |  12    15
  0   0 100   0   0   0|   0     0 | 186B  178B|   0     0 |  21    14
  0   0 100   0   0   0|   0     0 | 126B  178B|   0     0 |  15    13
  0   0 100   0   0   0|   0     0 | 186B  178B|   0     0 |  16    15
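
To put a number on "too much wait", the dstat capture can be summarised
with a small script. The following is a minimal Python sketch that
parses the CPU columns and reports the mean ``wai'' value. It embeds
only a few rows copied from the output above; the 40% threshold and the
function name are arbitrary choices for illustration.

```python
def parse_dstat_row(line):
    """Return (idl, wai) from one dstat data row, or None for header lines."""
    cpu_part = line.split("|", 1)[0].split()
    if len(cpu_part) != 6 or not all(field.isdigit() for field in cpu_part):
        return None
    usr, sys_, idl, wai, hiq, siq = map(int, cpu_part)
    return idl, wai


# A few rows copied from the capture; paste the full output for real use.
sample = """\
usr sys idl wai hiq siq| read  writ| recv  send|  in   out | int   csw
  0   0 100   0   0   0|   0     0 | 136B  178B|   0     0 |  11    12
  0   0  50  50   0   0|  56k  320k| 186B  232B|   0     0 | 128   129
  0   1  44  54   0   0| 136k 1312k| 278B  220B|   0     0 | 126   107
  2   0  21  78   0   0|  72k  744k| 846B 2038B|   0     0 | 243   275
"""

rows = [r for r in map(parse_dstat_row, sample.splitlines()) if r is not None]
mean_wai = sum(wai for _, wai in rows) / len(rows)
busy = sum(1 for _, wai in rows if wai >= 40)  # 40% is an ARBITRARY threshold
print(f"{len(rows)} samples, mean wai {mean_wai:.1f}%, {busy} with wai >= 40%")
```

Header lines are skipped because their first six fields are not all
numeric, so the whole capture can be fed in unmodified.
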
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.5 (GNU/Linux)

iD8DBQFLjMy5fg746kcGBOwRAs6KAJ9BeSRkvsIwt3/z/KYQcW6fIKxkHgCgjFaH
GDjMz//WUy7m2EAeD27HYpw=
=oK05
-----END PGP SIGNATURE-----