[CentOS] home directory server performance issues

Mon Dec 10 17:37:50 UTC 2012
Matt Garman <matthew.garman at gmail.com>

I'm looking for advice and considerations on how to optimally set up
and deploy an NFS-based home directory server.  In particular: (1) how
to determine hardware requirements, and (2) how to best set up and
configure the server.  We actually have a system in place, but the
performance is pretty bad: the users often experience a fair amount
of lag (1--5 seconds) when doing anything on their home directories,
including an "ls" or writing a small text file.

So now I'm trying to step back and determine: is this simply a
configuration issue, or is the hardware inadequate?

Our scenario: we have about 25 users, mostly software developers and
analysts.  The users log in to one or more of about 40 development
servers.  All users' home directories live on a single server (no
logins except root); that server does an NFSv4 export which is mounted
by all the dev servers.  The home directory server hardware is a Dell
R510 with dual E5620 CPUs and 8 GB RAM.  There are eight 15k 2.5" 600
GB drives (Seagate ST3600057SS) configured in hardware RAID-6 with a
single hot spare.  The RAID controller is a Dell PERC H700 w/512MB
cache (Linux sees this as an LSI MegaSAS 9260).  The OS is CentOS 5.6;
the home directory partition is ext3, mounted with options
"rw,data=journal,usrquota".
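
For what it's worth, this is how I've been double-checking the
options actually in effect on that partition (the mount point and LVM
volume name below are placeholders; ours differ slightly):

# confirm ext3 and data=journal are really what's mounted
grep ' /home ' /proc/mounts
# block size and filesystem features on the home directory volume
tune2fs -l /dev/VolGroup01/LogVolHome | egrep -i 'block size|features'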

I have the HW RAID configured to present two virtual disks to the OS:
/dev/sda for the OS (boot, root and swap partitions), and /dev/sdb for
the home directories.  I’m fairly certain I did not align the
partitions optimally:

[root@lnxutil1 ~]# parted -s /dev/sda unit s print

Model: DELL PERC H700 (scsi)
Disk /dev/sda: 134217599s
Sector size (logical/physical): 512B/512B
Partition Table: msdos

Number  Start    End         Size        Type     File system  Flags
 1      63s      465884s     465822s     primary  ext2         boot
 2      465885s  134207009s  133741125s  primary               lvm

[root@lnxutil1 ~]# parted -s /dev/sdb unit s print

Model: DELL PERC H700 (scsi)
Disk /dev/sdb: 5720768639s
Sector size (logical/physical): 512B/512B
Partition Table: gpt

Number  Start  End          Size         File system  Name  Flags
 1      34s    5720768606s  5720768573s                     lvm


Can anyone confirm that the partitions are not aligned correctly, as I
suspect?  If this is true, is there any way to *quantify* the effects
of partition mis-alignment on performance?  In other words, what kind
of improvement could I expect if I rebuilt this server with the
partitions aligned optimally?
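
For what it's worth, here is the quick arithmetic I've been using to
convince myself of this.  It assumes a 64 KiB stripe element size on
the PERC, which I still need to confirm with the controller tools, so
treat it as a sketch:

# start sectors taken from the parted output above: sda1=63, sdb1=34
ELEMENT_KB=64                                       # assumed PERC stripe element size
SECTORS_PER_ELEMENT=$(( ELEMENT_KB * 1024 / 512 ))  # = 128 sectors
for START in 63 34; do
    if [ $(( START % SECTORS_PER_ELEMENT )) -eq 0 ]; then
        echo "start sector $START is element-aligned"
    else
        echo "start sector $START is NOT element-aligned"
    fi
done

Both start sectors fail that test, which is why I suspect a lot of
writes are straddling stripe element boundaries.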

In general, what is the best way to determine the source of our
performance issues?  Right now, I'm running "iostat -dkxt 30"
redirected to a file.  I intend to let this run for a day or so, and
then write a script to produce some statistics (a rough sketch of what
I have in mind is below, after the sample output).

Here is one iteration from the iostat process:

Time: 09:37:28 AM
Device:         rrqm/s   wrqm/s   r/s      w/s     rkB/s     wkB/s  avgrq-sz  avgqu-sz   await  svctm  %util
sda               0.00    44.09  0.03   107.76      0.13    607.40     11.27      0.89    8.27   7.27  78.35
sda1              0.00     0.00  0.00     0.00      0.00      0.00      0.00      0.00    0.00   0.00   0.00
sda2              0.00    44.09  0.03   107.76      0.13    607.40     11.27      0.89    8.27   7.27  78.35
sdb               0.00  2616.53  0.67   157.88      2.80  11098.83    140.04      8.57   54.08   4.21  66.68
sdb1              0.00  2616.53  0.67   157.88      2.80  11098.83    140.04      8.57   54.08   4.21  66.68
dm-0              0.00     0.00  0.03   151.82      0.13    607.26      8.00      1.25    8.23   5.16  78.35
dm-1              0.00     0.00  0.00     0.00      0.00      0.00      0.00      0.00    0.00   0.00   0.00
dm-2              0.00     0.00  0.67  2774.84      2.80  11099.37      8.00    474.30  170.89   0.24  66.84
dm-3              0.00     0.00  0.67  2774.84      2.80  11099.37      8.00    474.30  170.89   0.24  66.84
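
As mentioned above, here is a rough sketch of the summarizing script
I have in mind; it is untested and just averages the "await" and
"%util" columns per device over the whole capture file (the file name
is whatever the iostat output gets redirected to):

awk '
  /^(sd|dm-)/ {
      dev = $1
      n[dev]++
      await[dev] += $10      # "await" column from iostat -dkxt
      util[dev]  += $12      # "%util" column
  }
  END {
      for (d in n)
          printf "%-6s samples=%d avg_await=%.2f avg_util=%.2f\n",
                 d, n[d], await[d]/n[d], util[d]/n[d]
  }
' iostat-capture.log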


What I observe is that whenever sdb (the home directory partition)
becomes loaded, sda (the OS disk) often does as well.  Why is this?  I
would expect sda to be generally idle, or to have minimal utilization.
According to both "free" and "vmstat", this server is not swapping at
all.
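
To figure out what is actually hitting sda during these spikes, my
next step is to trace it directly for a short window, along these
lines (blktrace and blkparse are from the blktrace package):

# trace sda for ~30 seconds, then see which processes the requests come from
blktrace -w 30 -d /dev/sda -o sda.trace
blkparse -i sda.trace | less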

At one point, our problems were due to a random user writing a huge
file to their home directory.  We built a second server specifically
for people to use for writing large temporary files.  Furthermore, for
all the dev servers, I used the following tc commands to rate limit
how quickly any one server can write to the home directory server (8
Mbps or 1 MB/s):

ETH_IFACE=$( route -n | grep "^0.0.0.0" | awk '{ print $8 }' )
IFACE_RATE=1000mbit
LIMIT_RATE=8mbit
TARGET_IP=1.2.3.4 # home directory server IP
tc qdisc add dev $ETH_IFACE root handle 1: htb default 1
tc class add dev $ETH_IFACE parent 1: classid 1:1 htb rate $IFACE_RATE ceil $IFACE_RATE
tc class add dev $ETH_IFACE parent 1: classid 1:2 htb rate $LIMIT_RATE ceil $LIMIT_RATE
tc filter add dev $ETH_IFACE parent 1: protocol ip prio 16 u32 match ip dst $TARGET_IP flowid 1:2
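
To sanity-check that the shaping is actually doing what I think, I
look at the per-class counters on a dev server while it is writing to
the home directory server:

# bytes/packets accounted to each class; traffic to $TARGET_IP should land in 1:2
tc -s class show dev $ETH_IFACE
tc -s filter show dev $ETH_IFACE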

The other interesting thing is that the second server I mentioned
(the one specifically designed for users to "torture") shows very low
IO utilization, practically never going above 10%.  That server is
fairly different though: dual E5-2340 CPUs (more cores, but lower
clock) and 32 GB RAM.  The disk subsystem is a Dell PERC H710 (LSI
MegaRAID SAS 2208), and the drives are 7200 RPM 1 TB (Seagate
ST1000NM0001) in RAID-6.  The OS is CentOS 6.3, and the NFS partition
is ext4 with options "rw,relatime,barrier=1,data=ordered,usrquota".

Ultimately, I plan to rebuild the home directory server with CentOS 6
(instead of 5) and align the partitions properly.  But right now, the
only reason I have for doing that is that the other server, with
roughly this config, doesn't have performance problems.  I'd like to
be able to say specifically (i.e. quantify) where the problems are
and how the upgrade/config change will address them.

I'll add that we want to use the "sec=krb5p" (i.e. encrypt
everything) mount option for the home directories.  We tried that on
the home directory server, and it became virtually unusable.  But we
use that option on the other server with no issue.  For now, as a
stop-gap, we are just using the "sec=krb5" mount option (Kerberos
authentication only, no encryption).  The server is still laggy, but
at least it is usable.
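
In other words, the dev servers currently mount with something like
the following (server name and paths are placeholders):

mount -t nfs4 -o sec=krb5,rw homesrv:/ /home    # stop-gap: Kerberos auth only
# what we ultimately want, once performance allows:
#   mount -t nfs4 -o sec=krb5p,rw homesrv:/ /home   # auth + integrity + encryption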

Here is the output of "nfsstat -v" on the home directory server:
[root@lnxutil1 ~]# nfsstat -v
Server packet stats:
packets    udp        tcp        tcpconn
12560989   0          12544002   17146

Server rpc stats:
calls      badcalls   badclnt    badauth    xdrcall
12516995   922        0          922        0

Server reply cache:
hits       misses     nocache
0          0          12512316

Server file handle cache:
lookup     anon       ncachedir  ncachedir  stale
0          0          0          0          160

Server nfs v4:
null         compound
86        0% 12516096 99%

Server nfs v4 operations:
op0-unused   op1-unused   op2-future   access       close        commit
0         0% 0         0% 0         0% 449630    1% 1131528   2% 191998    0%
create       delegpurge   delegreturn  getattr      getfh        link
2053      0% 0         0% 62931     0% 11210081 29% 1638995   4% 275       0%
lock         lockt        locku        lookup       lookup_root  nverify
196       0% 0         0% 196       0% 557606    1% 0         0% 0         0%
open         openattr     open_conf    open_dgrd    putfh        putpubfh
1274780   3% 0         0% 72561     0% 618       0% 12357089 32% 0         0%
putrootfh    read         readdir      readlink     remove       rename
160       0% 1548999   4% 44760     0% 625       0% 140946    0% 4229      0%
renew        restorefh    savefh       secinfo      setattr      setcltid
134103    0% 1157086   3% 1281276   3% 0         0% 133212    0% 143       0%
setcltidconf verify       write        rellockowner
113       0% 0         0% 4896102  12% 196       0%


Let me know if I can provide any more useful information.  Thanks in
advance for any pointers!