[CentOS] [CentOS-virt] Very unresponsive, sometimes stalling domU (5.4, x86_64)

Wed Mar 3 09:20:52 UTC 2010
Timo Schoeler <timo.schoeler at riscworks.net>


thus Pasi Kärkkäinen spake:
> On Tue, Mar 02, 2010 at 09:30:50AM +0100, Timo Schoeler wrote:
>>
>> Hi list,
>>
>> please forgive the cross-posting, but I cannot pin the problem down
>> well enough to say which list it fits best, so I'll ask on both.
>>
>> I have some machines with the following specs (see the end of this
>> email).
>>
>> They run CentOS 5.4 x86_64 with the latest patches applied, are
>> Xen-enabled and should host one or more domUs. I put the domUs'
>> storage on LVM, as I learnt to do ages ago; that has never caused any
>> problems and is way faster than using file-based 'images'.
>>
>> However, there's something special about these machines: They have
>> the new WD EARS series drives, which use 4K physical sectors. So I
>> booted a rescue system and used fdisk to start the first partition at
>> sector 64 instead of 63 (long story short: a partition starting at
>> sector 63 is misaligned to the 4K sectors, so the drive has to do
>> extra, inefficient read-modify-write cycles and performance collapses;
>> with the 'normal' geometry (start at sector 63) the drive achieves
>> about 25 MiByte/s of writes, with the partition starting at sector 64
>> it achieves almost 100 MiByte/s):
>>
>> [root@server2 ~]# fdisk -ul /dev/sda
>>
>> Disk /dev/sda: 1000.2 GB, 1000204886016 bytes
>> 255 heads, 63 sectors/track, 121601 cylinders, total 1953525168 sectors
>> Units = sectors of 1 * 512 = 512 bytes
>>
>>    Device Boot      Start         End      Blocks   Id  System
>> /dev/sda1   *          64     2097223     1048580   fd  Linux raid autodetect
>> Partition 1 does not end on cylinder boundary.
>> /dev/sda2         2097224    18876487     8389632   82  Linux swap / Solaris
>> /dev/sda3        18876488  1953525167   967324340   fd  Linux raid autodetect
>>
>> On top of those WD EARS HDs (two per machine) there's ``md'' providing
>> two RAID1 arrays, one for /boot and one for LVM, plus a swap partition
>> per HD (i.e. non-RAIDed). LVM provides the / partition as well as the
>> LVs for the Xen domUs.
>>
>> I have about 60 machines set up in that style and never had any
>> problems; they run like a charm. On these machines, however, the domUs
>> are *very* slow, have a steady (!) load of about two -- with 50% of
>> CPU time sitting in 'wait' -- and all operations take ages, e.g. a
>> ``yum update'' with the recently released updates.
>>
>> Now, can that be due to 4K alignment issues I didn't see, now nested
>> inside LVM?
>>
>> Help is very appreciated.
>>
> 
> Maybe the default LVM alignment is wrong for these drives.. 
> did you check/verify that? 
> 
> See:
> http://thunk.org/tytso/blog/2009/02/20/aligning-filesystems-to-an-ssds-erase-block-size/
> 
> Especially the "--metadatasize" option.

Hi Pasi, hey lists,

thanks for the hint. The following is the 'most important' part of that text:

``So I created a 1 gigabyte /boot partition as /dev/sdb1, and allocated
the rest of the SSD for use by LVM as /dev/sdb2. And that’s where I ran
into my next problem. LVM likes to allocate 192k for its header
information, and 192k is not a multiple of 128k. So if you are creating
file systems as logical volumes, and you want those volumes to be
properly aligned, you have to tell LVM that it should reserve slightly
more space for its meta-data, so that the physical extents that it
allocates for its logical volumes are properly aligned. Unfortunately,
the way this is done is slightly baroque:

# pvcreate --metadatasize 250k /dev/sdb2
Physical volume "/dev/sdb2" successfully created

Why 250k and not 256k? I can’t tell you — sometimes the LVM tools aren’t
terribly intuitive. However, you can test to make sure that physical
extents start at the proper offset by using:

# pvs /dev/sdb2 -o+pe_start
PV         VG   Fmt  Attr PSize  PFree  1st PE
/dev/sdb2       lvm2 --   73.52G 73.52G 256.00K

If you use a metadata size of 256k, the first PE will be at 320k instead
of 256k. There really ought to be a --pe-align option to pvcreate, which
would be far more user-friendly, but we have to work with the tools
that we have. Maybe in the next version of the LVM support tools...''
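
For the record, this is roughly what I did on the CentOS boxes; the PV
sits on top of an md RAID1 device here, so read /dev/md1 merely as a
placeholder for the real device name:

# pvcreate --metadatasize 250k /dev/md1
# pvs /dev/md1 -o+pe_start

The first command pads the metadata area so that the first physical
extent starts at 256k (a multiple of the 4k physical sector size); the
second one checks that '1st PE' really reports 256.00K.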

So, after taking care of starting at sector 64 *and* making sure
``pvcreate'' gets its 'multiple of 128k' (i.e. the first PE starts at
256k), I still have the same problem.
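
To put a number on 'very slow', a crude sequential write test straight
against a scratch LV should show whether the raw LVM path is already
affected (vg0/scratch is a made-up name, and the test overwrites the
LV, so don't point it at a domU disk):

# dd if=/dev/zero of=/dev/vg0/scratch bs=1M count=1024 oflag=direct

oflag=direct bypasses the page cache, so the figure reflects the
disk/RAID path rather than dom0's memory.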

Most interestingly, Debian 'lenny' does *not* show this problem, and
there the LVM PV does *not* have to be created as described above.
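
(If somebody wants to cross-check: the things to compare between the
CentOS and the lenny install are the partition start sectors and the
first PE offset, i.e. something like

# fdisk -ul /dev/sda
# pvs -o+pe_start

on both boxes; the device name is again just an example.)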

So, unfortunately, it seems I'm forced to use Debian for this project,
at least on a few machines. *shiver*

> -- Pasi

Timo