[CentOS] DRBD NFS load issues

Sun Jan 6 16:39:50 UTC 2008
Jed Reynolds <lists at bitratchet.com>

My NFS setup is a heartbeat setup on two servers running Active/Passive
DRBD. Each NFS server has one dual-core Opteron, 8GB of RAM, and 5TB of
space across 16 drives on a 3ware controller. They're connected to an
HP ProCurve switch with bonded ethernet. The sync rates between the two
DRBD nodes seem to safely reach 200Mbps or better. The processors on the
active NFS server run at a load of 0.2, so it seems mighty healthy.
Until I do a serious backup.

I have a few load-balanced web nodes and two database nodes as NFS
clients. When I start backing up my database to a mounted NFS partition,
a plain rsync drives the load on the NFS box through the roof and forces
a failover. I can do my backup using --bwlimit=1500, but then I'm nowhere
close to a fast backup, just 1.5MBps. My backups are probably 40G. (The
database nodes have fast disks, and copies between the database nodes
run at up to 60MBps, close to 500Mbps.) So I obviously do not have a
networking issue.
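
To put those numbers side by side, here is a rough back-of-the-envelope
sketch. It assumes decimal units (1G = 1000MB) and treats --bwlimit=1500
as roughly 1.5MBps, as above. The function names are made up for
illustration and are not part of any tool.

```python
def backup_hours(size_gb, rate_mb_per_s):
    """Hours needed to copy size_gb gigabytes at rate_mb_per_s megabytes/second.

    Assumes decimal units: 1 GB = 1000 MB.
    """
    seconds = size_gb * 1000 / rate_mb_per_s
    return seconds / 3600


def mbytes_to_mbits(rate_mb_per_s):
    """Convert megabytes/second to megabits/second (8 bits per byte)."""
    return rate_mb_per_s * 8


# 40G backup throttled to about 1.5MBps (--bwlimit=1500)
throttled = backup_hours(40, 1.5)

# Same backup at the 60MBps seen between the database nodes
unthrottled = backup_hours(40, 60)

print(f"throttled:   {throttled:.1f} hours")
print(f"unthrottled: {unthrottled * 60:.0f} minutes")
print(f"60MBps is {mbytes_to_mbits(60)}Mbps")
```

So the throttled backup takes a bit over 7 hours, while the same 40G at
the database-to-database rate would take on the order of 11 minutes.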

The processor loads up like this:
bwlimit   load
1500      2.3
2500      3.5
4500      5.5+
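
As a quick check on how the load scales, the sketch below computes the
load increase per extra 1000 of bwlimit between consecutive
measurements. It uses only the three data points above and takes "5.5+"
as 5.5, which is an assumption (the real figure was higher).

```python
# (bwlimit, load) pairs from the measurements above; "5.5+" taken as 5.5
samples = [(1500, 2.3), (2500, 3.5), (4500, 5.5)]


def load_per_1000(points):
    """Load increase per 1000 units of bwlimit between consecutive points."""
    slopes = []
    for (bw1, load1), (bw2, load2) in zip(points, points[1:]):
        slopes.append((load2 - load1) / (bw2 - bw1) * 1000)
    return slopes


slopes = load_per_1000(samples)
for (bw1, _), (bw2, _), slope in zip(samples, samples[1:], slopes):
    print(f"{bw1} -> {bw2}: {slope:.1f} load per 1000 bwlimit")
```

With these three points the load rises by roughly 1.0 to 1.2 for every
additional 1000 of bwlimit, i.e. it looks close to linear in the
transfer rate over this range.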

The DRBD secondary seems to run at about 1/2 the load of the primary.

What I'm wondering is: why is this thing *so* load sensitive? Is it
DRBD? Is it NFS? I'm guessing that since I only have two cores in the
NFS boxes, a prolonged transfer makes NFS dominate one core and DRBD
dominate the other, and so I'm saturating my processor.

Thoughts?

Jed