[CentOS] Very unresponsive, sometimes stalling domU (5.4, x86_64)

Tue Mar 2 09:18:02 UTC 2010
Pasi Kärkkäinen <pasik at iki.fi>

On Tue, Mar 02, 2010 at 09:30:50AM +0100, Timo Schoeler wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> Hi list,
> 
> please forgive the cross-posting, but I cannot pin the problem down well
> enough to say which list it fits best, so I'll ask on both.
> 
> I have some machines with the following specs (see the end of this
> email).
> 
> They run CentOS 5.4 x86_64 with the latest patches applied, are
> Xen-enabled and should host one or more domUs. I put the domUs' storage
> on LVM, as I learnt ages ago; this has never caused any problems and is
> way faster than using file-based 'images'.
> 
> However, there's something special about these machines: they have the
> new WD EARS series drives, which use 4K sectors. So I booted a rescue
> system and used fdisk to start the first partition at sector 64 instead
> of 63. Long story short: starting at sector 63 causes overhead that
> makes the drive do many more, inefficient writes, so the performance
> collapses. With 'normal' geometry (start at sector 63) the drive
> achieves about 25 MiByte/sec writes; with the partition starting at
> sector 64 it achieves almost 100 MiByte/sec writes:
> 
> [root@server2 ~]# fdisk -ul /dev/sda
> 
> Disk /dev/sda: 1000.2 GB, 1000204886016 bytes
> 255 heads, 63 sectors/track, 121601 cylinders, total 1953525168 sectors
> Units = sectors of 1 * 512 = 512 bytes
> 
>    Device Boot      Start         End      Blocks   Id  System
> /dev/sda1   *          64     2097223     1048580   fd  Linux raid autodetect
> Partition 1 does not end on cylinder boundary.
> /dev/sda2         2097224    18876487     8389632   82  Linux swap / Solaris
> /dev/sda3        18876488  1953525167   967324340   fd  Linux raid autodetect
> 
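
The arithmetic behind the sector 63 vs. 64 choice can be checked in a
few lines. This is only a minimal sketch in Python, assuming 512-byte
logical sectors (as in the fdisk output) and 4096-byte physical sectors;
the function name is made up for illustration.

```python
LOGICAL_SECTOR = 512     # bytes per sector as reported by fdisk
PHYSICAL_SECTOR = 4096   # assumed physical sector size of the 4K drives


def is_4k_aligned(start_sector, logical=LOGICAL_SECTOR, physical=PHYSICAL_SECTOR):
    """Return True if a start sector falls on a physical sector boundary."""
    return (start_sector * logical) % physical == 0


# Start sectors taken from the fdisk listing, plus the traditional 63.
starts = {"sda1": 64, "sda2": 2097224, "sda3": 18876488, "old default": 63}

for name, start in starts.items():
    state = "aligned" if is_4k_aligned(start) else "NOT aligned"
    print("%-12s start=%-10d %s" % (name, start, state))
```

With these numbers all three partitions in the listing come out aligned,
and only the old default of 63 does not.
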
> On top of those WD EARS HDs (two per machine), ``md'' provides two
> RAID1 arrays, one for /boot and one for LVM; swap is per HD (i.e.
> non-RAIDed). LVM provides the / partition as well as LVs for Xen domUs.
> 
> I have about 60 machines set up this way and never had any problems.
> They run like a charm. On these machines, however, domUs are *very*
> slow, have a steady (!) load of about two -- 50% sitting in 'wait' --
> and all operations take ages, e.g. a ``yum update'' with the recently
> released updates.
> 
> Now, can that be due to 4K issues I didn't see, now nested in LVM?
> 
> Help is much appreciated.
> 

Maybe the default LVM alignment is wrong for these drives.
Did you check/verify that?

See:
http://thunk.org/tytso/blog/2009/02/20/aligning-filesystems-to-an-ssds-erase-block-size/

See especially the "--metadatasize" option of pvcreate.
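
A rough way to verify this is to add up the offsets on the way from the
start of the disk to the first LVM extent and see whether the sum lands
on a 4K boundary. The sketch below (Python) only does that arithmetic.
The offsets in the example calls are assumptions for illustration, not
measured values: take the real pe_start from something like
"pvs -o +pe_start --units s", and note that an md data offset of 0 is
only right if the RAID superblock is stored at the end of the device.

```python
LOGICAL_SECTOR = 512     # bytes per sector as reported by fdisk
PHYSICAL_SECTOR = 4096   # assumed physical sector size of the 4K drives


def data_start_is_aligned(partition_start, md_data_offset, pe_start):
    """Check whether the first LVM extent starts on a 4K boundary.

    All three arguments are in 512-byte sectors:
      partition_start -- start of the partition on the disk
      md_data_offset  -- where md puts the data inside the partition
      pe_start        -- where LVM puts the first extent inside the PV
    """
    total_sectors = partition_start + md_data_offset + pe_start
    return (total_sectors * LOGICAL_SECTOR) % PHYSICAL_SECTOR == 0


# sda3 from the fdisk listing; the md offset and pe_start values are hypothetical.
print(data_start_is_aligned(18876488, 0, 384))  # pe_start of 192 KiB
print(data_start_is_aligned(18876488, 0, 383))  # pe_start one sector short
```

If the first extent is aligned and the extent size is a multiple of
4 KiB, every LV should start on a 4K boundary as well.
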

-- Pasi