[CentOS] home directory server performance issues

Wed Dec 12 06:29:17 UTC 2012
Gordon Messmer <yinyang at eburg.com>

On 12/10/2012 09:37 AM, Matt Garman wrote:
> In particular: (1) how
> to determine hardware requirements

That may be difficult at this point, because you really want to 
start by measuring the number of IOPS your workload generates.  
That's difficult to do if your applications demand more than your 
hardware currently provides.
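
A rough way to get that number is to watch the per-device request 
rates on the current server over a full workday.  For example (the 
60-second interval is just a suggestion):

   # r/s + w/s in each sample is the IOPS the array is actually
   # serving; watch the sda and sdb rows.
   iostat -dxk 60

   # If sysstat's data collection is enabled, you can also review
   # historical samples (DD = day of month):
   sar -d -f /var/log/sa/saDD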

> -the users often experience a fair amount
> of lag (1--5 seconds) when doing anything on their home directories,
> including an “ls” or writing a small text file.

This might not be the result of your NFS server performance.  You might 
actually be seeing bad performance in your directory service.  What are 
you using for that service?  LDAP?  NIS?  Are you running nscd or sssd 
on the clients?
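
A quick way to rule the directory service in or out is to time a 
lookup that hits it against one that doesn't.  A sketch, with 
"someuser" standing in for a real account:

   # If these stall, the lag is in LDAP/NIS, not NFS:
   time getent passwd someuser
   time id someuser

   # "ls -ln" skips the uid-to-name lookups that "ls -l" performs,
   # so a big difference between the two points at the directory:
   time ls -l /home/someuser
   time ls -ln /home/someuser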

> There are eight 15k 2.5” 600 GB
> drives (Seagate ST3600057SS) configured in hardware RAID-6 with a
> single hot spare.  RAID controller is a Dell PERC H700 w/512MB cache
> (Linux sees this as a LSI MegaSAS 9260).

RAID-6 is good for $/GB, but bad for performance.  If you find that 
your performance is bad, RAID-10 will offer you a lot more IOPS.

Mixing 15k drives with RAID-6 is probably unusual.  Typically 15k drives 
are used when the system needs maximum IOPS, and RAID-6 is used when 
storage capacity is more important than performance.
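
Back-of-the-envelope, assuming roughly 180 random IOPS per 15k drive 
and the usual write penalties (6 I/Os per small random write for 
RAID-6, 2 for RAID-10):

   RAID-6, 7 active drives:   7 * 180 / 6 = ~210 write IOPS
   RAID-10, 8 drives:         8 * 180 / 2 = ~720 write IOPS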

It's also unusual to see a RAID-6 array with a hot spare.  You already 
have two disks of parity.  At that point, your available storage 
capacity is only 600GB greater than an eight-disk RAID-10, but your 
performance is MUCH worse.
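
The arithmetic behind that, with eight 600GB drives:

   RAID-6 (7 drives + 1 spare):  (7 - 2) * 600GB = 3000GB usable
   RAID-10 (all 8 drives):       (8 / 2) * 600GB = 2400GB usable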

> OS is CentOS 5.6, home
> directory partition is ext3, with options “rw,data=journal,usrquota”.

data=journal actually offers better performance than the default 
(data=ordered) under some workloads, but not all.  You should try the 
default and see which is better.  With a hardware RAID controller 
that has a battery-backed write cache, data=journal should not 
perform any better than the default, though probably not any worse.
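
Since the data= mode can't be changed with "mount -o remount", you'd 
unmount the filesystem (or reboot) after editing fstab.  Something 
like this for the default mode (device and mount point guessed from 
your message):

   /dev/sdb1  /home  ext3  rw,data=ordered,usrquota  0 2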

> I have the HW RAID configured to present two virtual disks to the OS:
> /dev/sda for the OS (boot, root and swap partitions), and /dev/sdb for
> the home directories.  I’m fairly certain I did not align the
> partitions optimally:

If your drives really use 4k sectors, rather than the reported 512B, 
then misaligned partitions will make writes suffer.  The best policy 
is to start your first partition at a 1MiB offset.  An up-to-date 
parted should align things well, but as long as your partition start 
sectors are divisible by 8, you should be in good shape.
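
Easy ways to check on the running system:

   # Print partition boundaries in sectors; start values divisible
   # by 8 are 4KiB-aligned (2048 = the preferred 1MiB offset):
   parted /dev/sdb unit s print

   # Or read a partition's start sector straight from sysfs:
   cat /sys/block/sdb/sdb1/start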

> Here is one iteration from the iostat process:
>
> Time: 09:37:28 AM
> Device:         rrqm/s   wrqm/s   r/s     w/s    rkB/s     wkB/s  avgrq-sz  avgqu-sz  await  svctm  %util
> sda               0.00    44.09  0.03  107.76     0.13    607.40     11.27      0.89   8.27   7.27  78.35
> sdb               0.00  2616.53  0.67  157.88     2.80  11098.83    140.04      8.57  54.08   4.21  66.68

If that's typical, you need a faster array configuration.  That 
iteration caught both virtual disks at a very high percentage of 
maximum utilization (78% and 67% in the %util column).  Consider 
using RAID-10.

> What I observe, is that whenever sdb (home directory partition)
> becomes loaded, sda (OS) often does as well.  Why is this?

Regardless of what the controller exports to the OS, if it really has 
just one big RAID-6 array underneath, both virtual disks share the 
same spindles, so you'd expect saturating either one to slow the 
other.
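
If you have LSI's MegaCli tool installed (the path varies; often 
under /opt/MegaRAID/MegaCli/), it should show how the virtual disks 
map onto the physical drives:

   # RAID level and span layout of each logical drive:
   MegaCli -LDInfo -Lall -aALL

   # Physical drives and their array assignments:
   MegaCli -PDList -aALL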